
Search and Discovery Strategies for Biotechnology: the Paradigm Shift


By: Alan T. Bull, Alan C. Ward, and Michael Goodfellow

Research School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, and Department of Agricultural and Environmental Science, University of Newcastle, Newcastle upon Tyne NE1 7RU, United Kingdom

Profound changes are occurring in the strategies that biotechnology-based industries are deploying in the search for exploitable biology and to discover new products and develop new or improved processes. The advances that have been made in the past decade in areas such as combinatorial chemistry, combinatorial biosynthesis, metabolic pathway engineering, gene shuffling, and directed evolution of proteins have caused some companies to consider withdrawing from natural product screening. In this review we examine the paradigm shift from traditional biology to bioinformatics that is revolutionizing exploitable biology. We conclude that the reinvigorated means of detecting novel organisms, novel chemical structures, and novel biocatalytic activities will ensure that natural products will continue to be a primary resource for biotechnology. The paradigm shift has been driven by a convergence of complementary technologies, exemplified by DNA sequencing and amplification, genome sequencing and annotation, proteome analysis, and phenotypic inventorying, resulting in the establishment of huge databases that can be mined in order to generate useful knowledge such as the identity and characterization of organisms and the identity of biotechnology targets. Concurrently there have been major advances in understanding the extent of microbial diversity, how uncultured organisms might be grown, and how expression of the metabolic potential of microorganisms can be maximized. The integration of information from complementary databases presents a significant challenge. Such integration should facilitate answers to complex questions involving sequence, biochemical, physiological, taxonomic, and ecological information of the sort posed in exploitable biology.
The paradigm shift which we discuss is not absolute in the sense that it will replace established microbiology; rather, it reinforces our view that innovative microbiology is essential for releasing the potential of microbial diversity for biotechnology penetration throughout industry. Various of these issues are considered with reference to deep-sea microbiology and biotechnology.

Microbiol Mol Biol Rev. 2000 September; 64(3): 573–606. © 2000, American Society for Microbiology.

Introduction: the search for exploitable biology

Biotechnology is based on the search for and discovery of exploitable biology. The course of biotechnology search and discovery starts with the assembly of appropriate biological materials, moves through screening for a desired attribute and selecting the best option from among a short list of positive screening hits, and culminates with the development of a commercial product or process. When considering this topic some years ago (60), we argued the case for establishing sound microbial taxonomies and the need for a fuller understanding of microbial ecology as means for revealing novelty at both the organismal and property levels. Such a focus, we opined, coupled to the emergence of innovative targeted screening procedures, would continue to deliver the sought-after novelty required by biotechnology-based industries.

The concept of exploitable biology outlined above remains valid and continues to be the paradigm for industrial practice overall. However, the scientific and technological advances of the past decade are revolutionizing the approaches to exploitable biology such that the process is undergoing a major reevaluation and in many cases is being supplanted by new strategies. The intention of this review is to appraise the paradigm shift that is happening in search and discovery as a consequence of the bioinformatics revolution and to consider some of the opportunities and challenges that it presents for biotechnology. As a prelude to this appraisal, it is timely to take stock briefly of the current position of biotechnology and more comprehensively of biodiversity and of the resource provided by natural products.

The take-up of modern biotechnology over the past 25 years has been typical of any new technology: a slow initial phase followed by a period of rapid growth (selectively in biotechnology, where it has occurred predominantly in the health care sector) and entry into a mature phase of consolidation and penetration. Thus, biotechnology currently can be defined as a robust, reliable, and relatively low risk technology (current debates on genetically modified organisms notwithstanding) and capable of being implemented on a large scale and across the full range of industrial sectors. Recent estimates of biotechnology markets, expressed as the shares of worldwide biotechnology-related sales, and forecasts for 2005 are shown in Table 1 for seven major industrial sectors. The impact of biotechnology to date has been most pronounced in the pharmaceuticals sector, but it is clear that enormous potential exists in all of the other sectors for biotechnology penetration even though short-term forecasts show no change in those sectors in which the market share is very low.
The principal drivers of biotechnology are economic demand, led by industry; national and international policies, often prompted by public pressure; and advances in science and technology. Together they catalyze the development of biotechnology as a means of generating new markets, resolving long-standing and emerging problems, and gaining cost and efficiency improvements in industrial processing. Biotechnology is a prime example of a radical innovation in the sense that it provides completely new technology with which to reinvigorate extant industries and generate new ones. Its versatility is so great that industries that have not previously used biological systems in their operations are now exploring such options. Biotechnology is recognized universally as one of the key enabling technologies for the 21st century, and confidence in this view stems from its position as a radical innovation, the impact that it has had and will have on major global problems (disease, malnutrition, and environmental pollution), the promise it holds for achieving industrial sustainability (optimal use of renewable resources, amelioration of global warming, and introduction of clean or cleaner products and processes), and the increasing realization that it has become a mature technology capable of achieving economic competitiveness, generating new markets, and having wide industrial applicability (61, 438, 439).
With the exception of large animals and plants, knowledge of biological diversity in terms of species richness, local and global distribution, and ecosystem function remains very incomplete. During the decade spanning the publication of Biodiversity (491) and Biodiversity II (384), the number of described species has risen by 34% to 1.87 million, of which approximately 78% are terrestrial organisms (383) and approximately 8% are microorganisms. The accuracy of these figures varies for different taxonomic groups. However, much greater uncertainty accompanies attempts to estimate the numbers of undescribed species. Arguably the best estimates remain those of Hammond (196, 197), who provides a “working figure” total of around 12.5 million species and approximately 1.9 million microorganisms. Hammond comments: “The figures provided for viruses, bacteria and algae are frankly speculative, whereas those for fungi, protozoans ... also remain very insecurely based.” The speculative nature of such estimates also extends to faunal diversity (for example, nematodes [284] and microarthropods [12]), a matter of some significance in estimating microbial diversity in view of the probable symbiotic associations in which they are involved.

Our comprehension of microbial diversity has changed radically as a result of analyzing the DNA present in ecosystems. The most dramatic insight into the scale of this diversity came from Vigdis Torsvik and her colleagues in Bergen, who deployed DNA reassociation kinetics to measure genetic diversity. In two seminal papers published in 1990, Torsvik reported that about 4,000 completely different bacterial genomes could be detected in a beech forest soil, a value some 200 times greater than the corresponding diversity of strains isolated (444, 445). Identification of this newly revealed diversity has largely been achieved by small-subunit (ss) ribosomal DNA (rDNA) sequencing, which can be performed on DNA isolated directly from the environment and which has allowed evolutionary relationships to be inferred. The original circumscription of the domain Bacteria based on rDNA sequences (498) identified 11 divisions (“a lineage consisting of two or more 16S rRNA sequences that are reproducibly monophyletic and unaffiliated with all other division-level related groups” [223]). The pace of discovery has been so extraordinary that in just over a decade the number of recognized and putative divisions of bacteria has risen to 36 (223). An illustration of this explosive discovery is the division Acidobacterium; in the two years following its designation as a division, 250 rDNA sequences have been reported that define at least eight major subdivisions. It has been claimed that the presumptive metabolic and genetic diversity of members of Acidobacterium and their widespread distribution make it as ecologically important as other divisions of bacteria such as the Proteobacteria (30). In a remarkable study of the Obsidian Pool hot spring in Yellowstone National Park, Pace and his colleagues (224) defined six new candidate divisions; sequences of one division (OP11) have also been recovered from soil, sediment, and deep subsurface ecosystems.
Telling facts, however, are that more than a third of the 36 divisions of bacteria contain no organisms that have been cultured and only a third are represented in genome sequence projects (359).

Detection of major taxonomic diversity in the other prokaryotic domain, Archaea, has also been reported. The archaeal phylogenetic tree bifurcates into the principal divisions Crenarchaeota and Euryarchaeota, but recently a third, most deeply located division, the Korarchaeota, was proposed on the basis of rDNA sequence analysis of uncultured organisms also found at the Obsidian Pool site (28). At one time Archaea was thought to comprise mainly extremophilic organisms (hyperthermophiles, extreme halophiles, and strict anaerobes), but archaea are known now to be abundant in aerobic marine and fresh waters (100) and in tidal sediments (335).

It would be misleading to suggest that the discovery and predicted extent of novel microbial diversity are restricted to the prokaryotic domains. The currently accepted figure for the approximate numbers of described species of fungi is 72,000, but in the absence of a world checklist of accepted fungi, as many as 150,000 species may already have been described (202). The “working figure” of 1.5 million estimated species of fungi can be regarded as moderately accurate, i.e., within a factor of 5 (198). Likely major sources of undiscovered species richness are ectoparasitic ascomycetes of the order Laboulbeniales and nonmycorrhizal endophytic species. Of the former group, about 2,000 species are known that have a high level of host specificity. New species and genera continue to be reported from a wide geographic range and from additional families of arthropods. Based upon recent datasets obtained from Sulawesi and elsewhere, Weir and Hammond (476) estimate that the figure for Laboulbeniales species parasitizing coleopteran hosts is between 10,000 and 50,000, while a smaller number (less than half) may be found on other arthropod hosts. Endophytic fungi have been much less intensively researched than arthropod ectoparasites but are being found in the roots, stems, and leaves of a large diversity of plants, including grasses, orchids, shrubs, and trees (2, 33, 76, 331). Many of these fungi have not been identified, and evidence is appearing that, in turn, they may reduce the diversity of plant communities (77). That endophytes might be a source of novel compounds was given considerable credibility through the discovery of taxol synthesis by endophytic fungi and a variety of other antibacterial, antifungal, and anticancer metabolites (427). A recent report of thermophilic and thermotolerant fungi isolated from geothermal soils (386) suggests that such ecosystems may contain further eukaryotic novel microbes with exploitable biotechnology potential.

Algal and protozoal diversities are about 40,000 species in each case, but the working figure estimates of 400,000 and 200,000 species, respectively, are given a very poor accuracy rating, i.e., not within 10-fold (198). Confidence in predicting significantly more algal species than are recognized now is based upon the annual rate of new species descriptions, the large geographical areas that to date have been only poorly explored phycologically, and the morphological similarity that frequently masks genetic diversity, notably among coccoid picoplankton (373). The prokaryotic picoplankton has been researched extensively in terms of its genetic diversity and phylogeny (133, 366). In contrast, taxonomic assessment of the eukaryotic picoplankters is less advanced, and most of those described in the last 10 years represent novel species, genera, orders, and classes (e.g., Pelageophyceae [10] and Bolidophyceae [192]). The protozoa present similar uncertainties regarding species richness and taxonomic effort, but about 360 new species are being described annually. Consideration of the Ciliophora (ciliates) illustrates the situation: the species richness is about 8,000 (86), of which at least 2,000 are soil ciliates (138), but of the latter only about 600 have been described. Recent studies of African soil ciliates revealed over 500 species, of which 47% had not been described (140).

Even the most cursory glance at the literature illustrates the pace and range of new microorganism discovery: completely novel bacteria being found in such commonplace environments as activated sludge (7, 178, 347), caves (191), and the human gut (430); novel rickettsial endosymbionts in common soil and water amebae (145); and high bacterial and genetic diversity in deep-sea sediments (79, 80, 290, 382). However, this brief survey also raises several general issues of importance for the biotechnology search activity.

Where to look, how to look. Very often insufficient thought is given to the design of sampling strategies. Random sampling of ecosystems is preferable to “representative” sampling that is subject to investigator bias (441). Similarly, analysis of randomly selected samples is likely to yield a more complete picture of an ecosystem's microbiota than nested samples from the same environment. Careful observation of the ecosystem and direct examination of environmental samples usually pay dividends in terms of detecting the microbiota that is present. Unquestionably, molecular biological approaches based on sequence libraries from environmental DNA have opened up new vistas on microbial diversity, but it needs to be emphasized that such surveying does not always detect organisms shown to be present by selective isolation procedures (cf. the actinomycete diversity of Pacific Ocean deep-sea sediments revealed by selective isolation [80] with that from 16S rDNA clone libraries [290, 452]). The pitfalls of relying on PCR-based rRNA analysis as a measure of microbial diversity in environmental samples have been emphasized by von Wintzingerode et al. (462). And finally, can we judge how successful the recovery of organisms or ss-rDNA sequence libraries from particular samples or sites has been? One useful approach is to plot the cumulative number of operational taxonomic units (strains or rDNA clones) as a function of their appearance during the sampling of strains or clones, i.e., adopt the rarefaction analysis used by macroecologists. An elegant demonstration of this approach is reported by Polz et al. (372) in a study of epibiotic communities of bacteria on a marine nematode.
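The accumulation curve described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by Polz et al.: the function names and the toy clone library are assumptions made for the example. The second function uses the standard hypergeometric rarefaction formula, which gives the expected number of operational taxonomic units (OTUs) in a random subsample of a given size.

```python
from collections import Counter
from math import comb


def accumulation_curve(otu_labels):
    """Cumulative number of distinct OTUs after each successive strain or clone."""
    seen = set()
    curve = []
    for label in otu_labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


def rarefied_richness(otu_labels, n):
    """Expected number of OTUs in a random subsample of n items."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of sampled items")
    all_subsamples = comb(total, n)
    # For each OTU: 1 minus the probability that it is absent from the subsample.
    return sum(1 - comb(total - c, n) / all_subsamples for c in counts.values())


# Hypothetical clone library; each letter stands for one OTU (illustration only).
clones = list("AABACADBAEAB")
print(accumulation_curve(clones))
print(round(rarefied_richness(clones, 6), 2))
```

A curve that is still climbing at the last clone suggests that the sample has not yet captured the richness present; a curve that has flattened suggests that further sampling of the same library will yield few new OTUs.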

Taxonomy is not a luxury. In particular, α-taxonomy (the earliest stage in the development of a classification) (227), which designates species richness (or α-diversity) within an ecosystem, is not mere “stamp collecting”: such inventorying determines what biodiversity is present and how it can be accessed and becomes an integral part of a database on the functionality of that ecosystem, all of which has a major bearing on the success or otherwise of search and discovery programs. Taxonomy exists in a dynamic state. Thus, classifications that have been based upon limited phenotypic, morphologic, and genetic criteria are changing, often radically, as new phylogenetic data become available. Such revisions are evident not only at lower taxonomic levels but also at division (e.g., pseudomonads [261]) and order (e.g., Chlamydiales [121, 398]) levels. Gene sequencing studies can also be used to resolve the phylogenetic position of so-called enigmatic organisms. In recent years the putative protozoan Epulopiscium fishelsoni has been proved to be an unusually large bacterium (13), the putative alga Prototheca richardsi has been demonstrated to be a member of a newly recognized clade near the animal-fungal divergence point (24), and microsporidia appear to be related to fungi rather than being early-diverging eukaryotes (213).

Microbiology is about organisms. Several authors have commented recently on the use—or misuse—of rDNA sequence data as the sole descriptor for establishing a taxon (336) or for suggesting that a single molecular marker can serve to reveal phylogenetic relationships of bacteria (162). The risks of generating artifacts when analyzing rDNA sequence data obtained from environmental samples have been highlighted (298). It is timely, therefore, to reaffirm the value of polyphasic taxonomies in which molecular biological data complement but do not supplant other (phenotypic) information (176, 454). It is debatable even whether genome sequencing projects will enable us to adduce organism behavior, physiology, or functions in an ecosystem or culture, which is just the sort of information required by the biotechnologist. Recall that over a third of the currently defined divisions of bacteria do not have representatives in laboratory culture.

Microbiology research is focused on too few species. In a recent survey of publication patterns, Galvez et al. (153) reported that little or nothing had been published on 17.5% of formally described bacteria between 1991 and 1997 and that the publication rate on another 56% was very low. It is a reasonable assumption that the position is even more extreme with regard to other groups of microorganisms. Clearly there are benefits to be gained from this very focused approach to microbiological research, but one adverse effect is the distorted picture it presents of microbial diversity. We reiterate that one serious effect of this selective activity is the marginal effort being put on the cultivation of representatives of the new candidate divisions of bacteria so that their physiologies can be determined with a view to exploiting them for biotechnological purposes.

Data integration is a desideratum. Although there are more than 200 microbiology-related databases (441), it is difficult if not impossible to find answers to questions that rely on the use of integrated information from even a few databases. The situation is made more unsatisfactory by the variable quality and completeness of certain data. An integrated microbial database (IMD) containing taxonomic, phylogenetic, sequence, metabolic, physiological, and ecological data would enable fundamental questions to be posed—interrogation of such an IMD should yield understanding (knowledge), not simply factual material (data). A prototype IMD project was launched by the Center for Microbial Ecology at Michigan State University in 1997 (286). An excellent exposition of data and information management is given by Olivieri et al. (354).
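The kind of cross-database question envisaged here can be illustrated with a toy example. The sketch below is not the design of the IMD prototype: the table names, columns, strain labels, and values are all hypothetical, and an in-memory SQLite database stands in for what would in practice be several separately maintained resources. It shows one question answered by joining taxonomic, physiological, and ecological records on a shared strain identifier.

```python
import sqlite3

# In-memory stand-in for three separate databases (all contents hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomy   (strain TEXT PRIMARY KEY, division TEXT);
CREATE TABLE physiology (strain TEXT, optimum_temp_c REAL);
CREATE TABLE ecology    (strain TEXT, habitat TEXT);
""")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)",
                 [("S1", "division X"), ("S2", "division Y"), ("S3", "division X")])
conn.executemany("INSERT INTO physiology VALUES (?, ?)",
                 [("S1", 75.0), ("S2", 30.0), ("S3", 65.0)])
conn.executemany("INSERT INTO ecology VALUES (?, ?)",
                 [("S1", "hot spring"), ("S2", "soil"), ("S3", "hot spring")])


def strains_matching(connection, habitat, min_temp_c):
    """Strains from a given habitat whose optimum growth temperature is at least min_temp_c."""
    query = """
        SELECT t.strain, t.division
        FROM taxonomy t
        JOIN physiology p ON p.strain = t.strain
        JOIN ecology e ON e.strain = t.strain
        WHERE e.habitat = ? AND p.optimum_temp_c >= ?
        ORDER BY t.strain
    """
    return connection.execute(query, (habitat, min_temp_c)).fetchall()


print(strains_matching(conn, "hot spring", 70))
```

The join only works because every table uses the same strain identifier. Reconciling names and identifiers across real databases is a large part of the integration problem.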

This synoptic view of microbial diversity, however selective and incomplete, does demonstrate emphatically that present knowledge is astoundingly poor and that the extent and importance of microbial diversity are only now starting to be appreciated by biotechnologists. Ironically, however, it reinforces the opinion which we and others hold that natural organisms continue to provide a treasure house of innovation for the biotechnology industries: we examine the basis for this optimism in the following section.

Natural-Product Diversity
The search for and exploitation of natural products and properties have been the mainstay of the biotechnology industries. Natural-product search and discovery, however, is not synonymous with drug discovery, albeit the latter holds pole position. All the available evidence points to natural-product discovery continuing strongly and accelerating as a consequence of new search strategies and innovative microbiology (75, 105, 349). In drug discovery, for example, novel natural-product chemotypes with interesting structures and biological activities continue to be reported. Without such discoveries “there would be a significant therapeutic deficit in several important clinical areas, such as neurodegenerative disease, cardiovascular disease, most solid tumors, and immune-inflammatory diseases” (349).

Newman and Laird (345) have analyzed the 10 top-selling drugs of the world's top 14 companies for the latest available sales figures (1997) and categorized them as biologicals (isolated directly from source), natural products (chemically identical to the pure natural product), and derived natural products (chemically modified). Biologicals accounted for 5.8% (0 to 19.7% spread between companies) of sales, and natural plus derived natural products accounted for 28.2% (8.6 to 73.9% spread) of sales. Of the 25 top-selling drugs, 42% were natural and derived natural products. Antibiotics remain the largest market of naturally derived drugs (67% of sales). Significantly, however, the reported discovery of microbial metabolites with nonantibiotic activities has increased progressively over the past 30 years and now exceeds that of antibiotic compounds (212).

One prerequisite to natural-product discovery that remains paramount is the range and novelty of molecular diversity. This diversity surpasses that of combinatorial chemical libraries and consequently provides unique lead compounds for drug and other developments. Newly discovered bioactive products do not usually become drugs per se (345, 449) but may enter a chemical transformation program in which the bioactivity and pharmacodynamic properties are modified to suit particular therapeutic needs. Several reviews are available that detail important recent developments in this field (75, 212, 271).

Once a biotechnological target has been identified, two questions follow. First, what might be the best-producing organism or group of organisms to investigate? Second, what screening procedure should be used in order to elicit the desired activity or property? The following approaches are among those used for organism selection: (i) play the percentage game, e.g., actinomycetes for biopharmaceutins (35); (ii) make reference to taxon-chemistry and taxon-property databases (for example, bacteria-antibiotics [G. Garrity, personal communication] and polyunsaturated fatty acids-algae [460a]) and creativity indices, i.e., the ratio of known metabolites to species richness of a particular taxon (112); (iii) focus on novel and neglected taxa, examples of which are evident in the previous section of this review; (iv) highlight isolates from unusual or little-explored ecosystems, e.g., mycoparasites (485); and (v) match the target with members of previously unscreened but known taxa, e.g., the human immunodeficiency virus (HIV)-inactivating protein cyanovirin-N as a result of screening cyanobacteria (49, 50).
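The creativity index mentioned in approach (ii) is a simple ratio, and ranking taxa by it can be sketched as follows. The taxon names and counts are hypothetical placeholders chosen for the arithmetic. They are not values from reference 112 or from this review.

```python
def creativity_index(known_metabolites, species_richness):
    """Ratio of known metabolites to species richness for one taxon."""
    if species_richness <= 0:
        raise ValueError("species richness must be positive")
    return known_metabolites / species_richness


# Hypothetical (known metabolites, described species) pairs, for illustration only.
taxa = {
    "taxon_A": (1200, 400),
    "taxon_B": (150, 600),
    "taxon_C": (90, 45),
}

# Rank taxa from most to least "creative" by this measure.
ranked = sorted(taxa, key=lambda name: creativity_index(*taxa[name]), reverse=True)
for name in ranked:
    print(name, creativity_index(*taxa[name]))
```

The index depends on how thoroughly each taxon has been screened and described, so a low value may reflect neglect, not a lack of metabolic potential.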

This is not the place to discuss the extensive subject of screening other than in a superficial way. However, it can be noted that considerable effort has been and is being expended in the development of screening assays, particularly as a response to the need to evaluate large numbers of samples in high-throughput screens and the expectation that many new targets will be identified in the wake of genome sequencing projects (see below). High-throughput screening involves the robotic handling of very large numbers of candidate samples, the registering of appropriate signals from the assay system, and data management and interpretation. However, the advent of high-throughput screening, whereby lead discoveries may be identified in a matter of days from libraries of 10³ to 10⁵ compounds (416), may be limited by the provision of sufficient quantities of the assay components. The development of surrogate hosts provides one possible means of alleviating such bottlenecks. Hill et al. (212) recently reviewed the range of screens used in the search for biopharmaceutins and the success achieved with enzyme inhibition, receptor binding, and cell function assays.

There is a strong view that biopharmaceutin leads are more likely to be detected in cell function assays than in in vitro assays (210). In this context, construction of surrogate host cells for in vivo drug screening is an interesting development. For example, the ability of Saccharomyces cerevisiae to express heterologous proteins makes it an attractive option; its use in screens based on substitution assays, differential expression assays, and transactivation assays is proving to be an effective route to drug discovery. The procedures involved and the future for S. cerevisiae as a tool for targeted screening have been discussed recently by Munder and Hinnen (334).

As we have pointed out, exploitable biology goes well beyond drugs: novel crop protection agents, food and feed ingredients, biocatalysts, and biomaterials are among the many important industrial targets (61). Industrial biocatalysis, in particular, has developed as a major sector, with applications ranging from biotreatment of wastes and toxic chemicals, detergent additives, and processing of materials such as pulp, paper, and leather to the provision of a plethora of stereo- and regioselective transformations. Moreover, a decisive advantage of developing enzymes as industrial catalysts is their cleanliness compared to most chemical catalysts (59). The further penetration of biocatalysis into industry will depend on the discovery of novel natural enzymes and the modification or de novo design of catalysts from known activities (59, 306). Among the armamentarium of new biocatalysts are the so-called extremozymes, such as thermozymes. The latter have evolved in archaeal and bacterial thermophiles and hyperthermophiles and display high resistance to thermal and chemical denaturation; they can be expected to become the biocatalysts of choice in a variety of new bioprocesses and to be used in upgrading existing ones, such as sugar production from starch (88). The archaeal and bacterial extremophiles present an exciting biotechnological resource which, to a large extent, has been appreciated only during the past decade. The most recent account of extremophile taxonomy (276) records 23 genera and 56 species of hyperthermophilic archaea, 35 genera and 83 species of thermophilic bacteria, 12 genera and 35 species of extreme halophilic archaea, 44 genera and 68 species of halophilic bacteria, and 19 genera and 41 species of alkaliphilic bacteria.

It will be obvious from the foregoing discussions that definitive characterization of organisms is a crucial act in the search for natural products, and the ability to dereplicate strains avoids duplication of effort (see below). “Dereplication” is defined as the ability to prevent isolations of identical species or strains of microorganisms and the repeated recovery of identical natural products. Moreover, it is important to discriminate strains at the infraspecific level. The genetic diversity within a species frequently determines the capacity to produce secondary metabolites and enzymes, and hence it needs to be identified in collections of candidate organisms. Finally, of course, dereplication of natural products per se also is extremely important, and the discussion by VanMiddlesworth and Cannell (455) is a useful starting point for the interested reader.
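Strain dereplication can be pictured as grouping isolates by whatever characterization data are available and carrying one representative per group forward. The sketch below assumes hypothetical isolate records with a species assignment and an infraspecific fingerprint label; the field names and values are invented for illustration and do not correspond to any particular typing method.

```python
from collections import defaultdict


def dereplicate(isolates, key):
    """Group isolates by key(isolate) and keep the first member of each group."""
    groups = defaultdict(list)
    for isolate in isolates:
        groups[key(isolate)].append(isolate)
    representatives = [members[0] for members in groups.values()]
    return representatives, dict(groups)


# Hypothetical isolate records (illustration only).
isolates = [
    {"id": "iso1", "species": "sp. 1", "fingerprint": "f-a"},
    {"id": "iso2", "species": "sp. 1", "fingerprint": "f-a"},
    {"id": "iso3", "species": "sp. 1", "fingerprint": "f-b"},
    {"id": "iso4", "species": "sp. 2", "fingerprint": "f-c"},
]

# Species-level dereplication discards iso3, a distinct strain of sp. 1.
by_species, _ = dereplicate(isolates, key=lambda i: i["species"])
# Adding the fingerprint to the key retains infraspecific diversity.
by_strain, _ = dereplicate(isolates, key=lambda i: (i["species"], i["fingerprint"]))

print([i["id"] for i in by_species])
print([i["id"] for i in by_strain])
```

The choice of key decides how much infraspecific diversity survives: a species-only key removes more duplicates but can also remove strains that differ in metabolite or enzyme production.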

The Paradigm Shift
Currently we are witnessing a major change in the way in which we do search-and-discovery research in biotechnology. This change is so profound that it merits description as a paradigm shift. The term paradigm is used increasingly, and often indiscriminately, in a multiplicity of contexts. Thomas Kuhn's conception was of “an entire constellation of beliefs, values, techniques and so on shared by members of a given community” (277) that defines an intellectual discipline and distinguishes it from all other disciplines. Over the succeeding years, the term paradigm has been assigned an additional meaning: “the set of axioms, assumptions, or fundamentals that enable us to create a ‘meaningful’ order. It is very much like a map of reality ... not reality itself, but the directions we use to find our way. Thus, the term indicates on the one hand the experiments, or set of procedures, that every member of the scientific discipline learns to appreciate as a necessary methodology to sustain the quality of scientific research; on the other hand, [it] has the broader meaning ... associated with a fundamental belief system or map of reality: the lenses through which one sees everything” (322). In more practical terms, it can be defined as “a set of firm theoretical foundations, successful comparisons with past empirical observations [and] triumphal applications to solve important problems” (475). Thus, a paradigm shift demands a major reorientation of methodology so that old questions may be approached anew.

The paradigm in exploitable biology has shifted from what we refer to as traditional biology to bioinformatics.

Traditional biology. In traditional biology the search strategy is based upon specimen collection, system observation, and laboratory experimentation in order to organize knowledge in a systematic way and to formulate concepts. Outcomes of this approach might be illustrated by the serendipitous discovery of antibiosis or the later targeted development of enzyme inhibitor screens (450).

Bioinformatics. In bioinformatics the search strategy is based upon data collection and storage and the mining (retrieval and integration) of the databases in order to generate knowledge (the understanding of what is important about a situation) from information or data (the sum of everything we know about that situation) (23). Outcomes from this approach will include the identification of new drug targets via functional genomics.

The paradigm shift is being actuated by a number of key factors: (i) the phenomenal pace of technological advances, e.g., bioinformatics, combinatorial syntheses, high-throughput screening, and laboratories on a chip; (ii) the need for significant breakthrough discoveries; (iii) pressure to reduce costs; (iv) the requirement to reduce cycle times; and (v) biotechnology acquisitions and mergers, i.e., survival in global markets (283). Bioinformatics databases include DNA (genomes), RNA, and protein sequences, proteomes, macromolecular structures, chemical diversity, biotransformations, metabolic pathways (metabolomes), biodiversity, and systematics. Thus, innovative “experiments” can be made in silico rather than in vivo or in vitro, so that only essential experiments need be undertaken. Kuhn argued that “Paradigms gain their status because they are more successful than their competitors in solving ... problems that the group of practitioners has come to recognize as acute” (277). A major objective of this review is to examine the bioinformatics paradigm with respect to its success in search and discovery, focusing on four components of the paradigm: systematics, genomics, proteomics, and ecology.

The bioinformatics paradigm

Selective Isolation and Characterization of Novel Microorganisms
Analysis of DNA extracted from environmental samples has shown that molecular genetic diversity is much greater in natural habitats than was previously recognized (117, 118, 205, 355, 360, 468a, 471). Such studies show that there are many microbial taxa to be discovered and isolated in pure culture. Despite the inherent problems faced in selectively isolating and characterizing microbes from environmental samples, steady progress continues to be made, as exemplified by advances made in unravelling the systematics of extremophiles (169, 237, 276), lactic acid bacteria (21), legume nodule nitrogen-fixing bacteria (87), rhodococci (172), sphingomonads (116), microbial pathogens of insects (225, 377), and protozoa (85). Nevertheless, substantial difficulties remain in sampling and characterizing representative members of the microbial populations found in natural habitats.

The spatial distribution of microorganisms in soil (200) and the need to overcome a range of microbe-soil interactions (426) are serious limitations to quantitatively and representatively sampling soil microorganisms (352). Procedures used to promote the dissociation of microorganisms from particulate matter include the use of buffered diluents (348), chelating agents (300), elutriation (219), mild ultrasonication (379), and repeated homogenization of soil in several buffers followed by separation of extract from residue (122); these procedures address the problems outlined above to varying degrees. Several of these physicochemical procedures were incorporated into a multistage dispersion and differential centrifugation procedure (220) that was shown to be effective for representative sampling of bacteria, including actinomycetes, from soils with different textures (220, 300).

The dispersion and differential centrifugation (DDC) method has been shown to be 3 to 12 times more effective in extracting actinomycete propagules from a range of soils than the standard procedure of shaking soil in diluent (17). There was also evidence that representatives of different streptomycete taxa were isolated at different stages of the extraction procedure and that certain organisms were only found on isolation media seeded with inocula obtained by using the DDC procedure. These observations suggest that persistent associations between soil particles and actinomycete propagules may be one of the major limitations to quantitative and representative sampling of actinomycete communities in soils and that the DDC method can be used to effectively break down such interactions.

The technique of extinction (or dilution) culture also warrants greater attention from microbiologists wishing to isolate microorganisms from, in particular, oligotrophic habitats. The theory and practical procedures of extinction culture were developed by Don Button and his colleagues (65) in attempts to recover numerically abundant but difficult to culture marine picobacteria. Cultures are produced by diluting the original environmental sample to near extinction of the ability to grow; sterilized seawater provided both the diluent and the culture medium in Button's experiments, but organic amendments may be added, or other appropriately dilute media may be used. The technique has two important advantages: it provides a means of studying organisms that may be abundant in a particular habitat but, because of their oligotrophic nature, are outcompeted by kinetically more versatile organisms in conventional enrichment methods, and dilution to extinction offers the prospect of isolating pure cultures of organisms. In the latter regard, extinction isolation culture is a valuable method for obtaining pure cultures of marine bacteria that frequently grow poorly on solid media and of oligotrophic microorganisms. For recovering marine oligobacteria, Button et al. (65) recommended the use of unamended sterilized seawater and monitoring the developing populations at least three times a week over a 9-week period. Growth should be evaluated with sensitive techniques such as epifluorescence microscopy and flow cytometry. Examples of the successful use of extinction culture are few, but the work of Schut et al. (408) on the marine ultramicrobacterium Sphingomonas sp. strain RB2256 and Button et al. (64) on Cycloclasticus oligotrophus (see later section) are model investigations of this type.
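The logic of diluting a sample "to near extinction" can be illustrated with a simple probability sketch. It assumes, as an idealization not stated in the source, that viable cells are distributed independently among tubes (a Poisson distribution) and that a tube founded by exactly one cell yields a pure culture; the function names are illustrative only.

```python
import math

def p_growth(mean_cells: float) -> float:
    """Probability that a tube receives at least one viable cell (Poisson assumption)."""
    return 1.0 - math.exp(-mean_cells)

def p_pure_given_growth(mean_cells: float) -> float:
    """Probability that a tube showing growth was founded by exactly one cell."""
    return mean_cells * math.exp(-mean_cells) / p_growth(mean_cells)

# Mean number of viable cells delivered per tube at successive dilutions (illustrative values).
for mean_cells in (3.0, 1.0, 0.5, 0.1):
    print(f"{mean_cells:>4} cells/tube: "
          f"growth in {p_growth(mean_cells):.2f} of tubes, "
          f"pure if growing {p_pure_given_growth(mean_cells):.2f}")
```

Under these assumptions, the more dilute the inoculum, the fewer tubes show growth but the more likely each growing tube is to contain a single population, which is the trade-off exploited by the method.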

Another constraint on quantitative and representative sampling of microorganisms from natural habitats is the lack of suitable selective isolation procedures. The selectivity of isolation media is influenced by nutrient composition, pH, and the presence of selective inhibitors, as well as by other incubation conditions. Innumerable medium formulations have been recommended for the selective isolation of microorganisms, but the ingredients have been chosen empirically, and hence the basis of selectivity is not clear (281, 489). It is now possible, using computer-assisted procedures, to objectively formulate and evaluate selective isolation media (60). Indeed, numerical taxonomic databases, which contain extensive information on the nutritional, physiological, and inhibitory sensitivity profiles of the constituent taxa, are ideal resources for the formulation of new selective media designed to isolate rare and novel organisms of biotechnological importance.

The streptomycete database generated by Williams et al. (488) has been used to formulate isolation media designed either to favor the growth of members of uncommon Streptomyces species known to be promising sources of new bioactive compounds or to inhibit the growth of the ubiquitous Streptomyces albidoflavus, which tends to predominate on standard media used for the selective isolation of streptomycetes (460, 490). It was apparent from these studies that a medium based on raffinose and histidine as the major carbon and nitrogen source, respectively, led to the predictable reduction in the numbers of S. albidoflavus strains on isolation plates, thereby facilitating the growth of rare and novel streptomycetes. In a continuation of these studies, large numbers of two putatively novel streptomycete species were isolated from hay meadow plots at Cockle Park Experimental Farm, Northumberland, United Kingdom (17).
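How a numerical taxonomic database might be used to choose a selective medium can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: the taxon names, substrate names, and growth frequencies are placeholders invented for the example, and the scoring rule (which treats use of the carbon and nitrogen sources as independent traits) is an assumption.

```python
# Hypothetical fraction of strains in each taxon that grow on each substrate.
# Placeholder values for illustration; not taken from any published database.
PROFILES = {
    "target_taxon":   {"carbon_A": 0.90, "carbon_B": 0.95, "nitrogen_X": 0.85, "nitrogen_Y": 0.90},
    "unwanted_taxon": {"carbon_A": 0.10, "carbon_B": 0.98, "nitrogen_X": 0.20, "nitrogen_Y": 0.95},
}

def expected_growth(taxon, carbon, nitrogen):
    """Fraction of strains expected to grow, assuming the two traits are independent."""
    return PROFILES[taxon][carbon] * PROFILES[taxon][nitrogen]

def selectivity(carbon, nitrogen):
    """Expected growth of the target taxon minus that of the unwanted taxon."""
    return (expected_growth("target_taxon", carbon, nitrogen)
            - expected_growth("unwanted_taxon", carbon, nitrogen))

def best_medium(carbons, nitrogens):
    """Return the (carbon, nitrogen) pair with the highest selectivity score."""
    pairs = [(c, n) for c in carbons for n in nitrogens]
    return max(pairs, key=lambda pair: selectivity(*pair))

print(best_medium(["carbon_A", "carbon_B"], ["nitrogen_X", "nitrogen_Y"]))
```

The point of the sketch is only that the basis of selectivity becomes explicit: a medium is preferred because the database predicts growth of the wanted taxa and poor growth of the predominant one.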

Another way of optimizing the search and discovery of new bioactive compounds is to ensure that organisms growing on selective isolation plates represent novel or previously uninvestigated centers of taxonomic variation (177). The choice of organisms for pharmacological screening programs, especially those with a low throughput, is primarily a problem of distinguishing among known organisms and recognizing new ones. It is now relatively easy to detect rare and novel microorganisms due to the increasing availability of sound classifications based on the integrated use of genotypic and phenotypic data (85, 176, 239, 454). This approach, which is known as polyphasic taxonomy, was introduced by Colwell (82) to signify successive or simultaneous studies on groups of organisms using a combination of taxonomic methods designed to yield good-quality genotypic and phenotypic data. A range of powerful methods are available for the acquisition of taxonomic data (Table 2).

The polyphasic approach to the detection of rare and novel taxa of biotechnological importance only became practicable with the availability of rapid data acquisition procedures, improved data handling systems, and associated microbiological databases (66, 67). The application of polyphasic taxonomy has led to profound changes in bacterial systematics, especially with respect to industrially significant groups, such as the actinomycetes, for which traditional taxonomies based on form and function made it impossible to select a balanced set of strains for industrial screens (172a, 175). The reclassification of several actinomycete taxa, notably the genera Microtetraspora (508), Mycobacterium (472), Nocardia (175), Rhodococcus (172a), and Streptomyces (262), and the delineation of new actinomycete genera, such as Beutenbergia (191), Ornithinicoccus (190), Tessaracoccus (312), and Williamsia (246), are all products of the polyphasic approach. Similarly, a host of new actinomycete species, for instance, Amycolatopsis thermoflava (74), Gordonia desulphuricans (264), Nocardioides nitrophenolicus (506), and Streptomyces thermocoprophilus (262), have been described using a combination of genotypic and phenotypic data. Corresponding integrated approaches are increasingly being used to circumscribe protozoal (139) and fungal (53, 239, 326) taxa, notably yeasts (393, 435).

Polyphasic taxonomy is now well established, though little attempt has been made to recommend which methods are the most appropriate for generating consensus classifications. At present, polyphasic taxonomic studies tend to reflect the interests of the individual research groups and the equipment and procedures they have at their disposal. It is not possible to be too prescriptive about the methods which should be used, as those selected need to reflect the taxonomic ranks under consideration (Table 2). However, it is clear that small-subunit rRNA is a powerful tool for highlighting new centers of taxonomic variation (56, 85, 195, 498), though the technique does not always allow the separation of members of closely related species. In contrast, DNA-DNA relatedness, molecular fingerprinting, and phenotypic studies provide valuable data for the detection of groups at and below the species level (418, 473).

The polyphasic approach to circumscribing microbial taxa can be expected to meet several of the primary challenges facing microbial systematists, notably the need to generate well-defined taxa, a stable nomenclature, and improved identification procedures. However, most of the methods used in such studies are demanding in terms of time, labor, and materials and hence fail to meet the requirements for the rapid and unambiguous characterization of large numbers of isolates. These requirements are crucial steps in screening for natural products or biocatalytic activities of industrial interest. In this context, the ability to exclude previously screened organisms and to recognize microbial colonies on primary isolation plates that have developed from identical environmental propagules (dereplication) (60) greatly assists the selection of biological material for large commercial screening operations.

It is also important for screening programs to discriminate between microorganisms at the infraspecies level, that is, to examine the genetic diversity within a defined species, as it is well known that the capacity to produce primary and secondary metabolites is frequently a property expressed by members of infraspecific taxa rather than species per se (60). Some widely used molecular techniques, such as small-subunit rRNA gene sequencing, lack the power to distinguish between strains below the species level or between members of recently diverged species (79, 141), while others that have this resolving power (amplified and restriction fragment length polymorphisms and single-strand conformation polymorphism) are laborious and time-consuming.

Given the objectives and constraints outlined above, the ideal procedure for microbial characterization should be universally applicable, require small, easily prepared samples, provide rapid and highly reproducible data, be capable of automation, and handle high throughputs. All of these requirements are provided by physicochemical whole-organism fingerprinting methods (173, 303), the most widely employed being Curie point pyrolysis mass spectrometry (PyMS). Other methods of this type are Fourier-transform infrared spectroscopy (FT-IR) and dispersive Raman spectroscopy; the three procedures have been compared recently for the phenotypic discrimination of urinary tract pathogens (172).

Curie point PyMS has been shown to be of value in rapidly grouping microorganisms isolated from environmental samples (92), for defining pyrogroups (clusters) of commercially significant actinomycetes (132, 399), and for recognizing subtle phenotypic differences between strains of the same species (171). Good congruence has been found between numerical phenetic, molecular fingerprinting, and PyMS data, as exemplified by a polyphasic study on clinically significant actinomadurae (446). Similarly, it has been shown that the taxonomic integrity of three putatively novel species of Streptomyces highlighted in a polyphasic study was supported by PyMS data (17). These observations make it possible to develop an objective strategy to determine the species richness of cultivable streptomycetes isolated from natural habitats. Thus, putatively novel streptomycetes can be grouped together on the basis of their easily determined pigmentation characteristics, and the taxonomic status of the resultant color groups can then be determined by characterizing selected strains by PyMS and comparing the pyrogroups with the original color groups. If required, more exacting taxonomic studies can be carried out on representative strains using more sophisticated procedures, notably small-subunit rRNA sequencing.
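The comparison of color groups with pyrogroups described above amounts to asking how far two independent groupings of the same strains agree. One simple way to quantify this, offered here only as a sketch and not as the method used in the cited studies, is the Rand index: the fraction of strain pairs that both groupings treat alike. The strain labels and group assignments below are hypothetical.

```python
from itertools import combinations

def rand_index(groups_a, groups_b):
    """Fraction of strain pairs on which two groupings agree, i.e. both place
    the pair in the same group or both place it in different groups.
    Both dictionaries must cover the same strains."""
    agree = total = 0
    for s, t in combinations(sorted(groups_a), 2):
        same_a = groups_a[s] == groups_a[t]
        same_b = groups_b[s] == groups_b[t]
        agree += (same_a == same_b)
        total += 1
    return agree / total

# Hypothetical assignments of five isolates (illustrative labels only).
color_groups = {"s1": "grey", "s2": "grey", "s3": "red", "s4": "red", "s5": "red"}
pyrogroups   = {"s1": "P1",   "s2": "P1",   "s3": "P2",  "s4": "P2",  "s5": "P3"}

print(rand_index(color_groups, pyrogroups))
```

A value of 1 would mean that every color group corresponds exactly to one pyrogroup; lower values flag color groups that PyMS splits or merges and that may therefore merit closer taxonomic study.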

A strategy similar to the one outlined above was used to circumscribe novel, industrially significant rhodococci selectively isolated from deep-sea sediments in the northern Pacific Ocean close to Japan (79, 80). Subsequently, excellent congruence was found in double-blind numerical phenetic and PyMS analyses of representative rhodococcal isolates, indicating that the delineated pyrogroups were directly ascribable to the observed phenotypic variation and, in consequence, of real value in screening programs (81). The results of this study affirmed the value of PyMS in characterizing microorganisms, discriminating organisms at the infraspecies level, and enabling rapid and effective dereplication of strains prior to screening. This approach can be applied directly to target strains growing on isolation plates, thereby obviating the requirement for time-consuming laboratory testing to distinguish duplicate colonies and permitting the rational collection of colonies from such plates for subsequent screening. These attributes, coupled with the speed of analysis (approximately 2 min per sample), the very small sample size required (50 to 100 μg), the high reproducibility, and the high automated throughput, commend PyMS as a method of choice for industrial screening programs based on microorganisms.

Detection of Uncultured Prokaryotes: Molecular Approaches
Traditionally, members of established and novel microbial taxa isolated from natural habitats were recognized using phenetic methods which drew upon available genotypic and phenotypic data. An alternative approach to the estimation of prokaryotic diversity in natural habitats was initiated by the application of molecular methods (355, 360), most of which allowed the recognition of uncultured organisms based on the use of 16S rRNA sequences. It was apparent even from the initial studies that spectacular patterns of prokaryotic diversity had gone undetected using standard cultural and characterization procedures. The molecular approaches also confirmed observations from direct microscopy that the number of prokaryotes which can be readily cultivated from environmental samples is only a small and skewed fraction of the diversity present (471). The inability to cultivate even the most numerous microorganisms from natural habitats has been referred to as the “great plate count anomaly” (423).

Several procedures have been used to estimate prokaryotic diversity based on the examination of DNA extracted from environmental samples (118, 205, 352). Environmental DNA samples have been analyzed using reassociation kinetics to estimate community complexity and the number of constituent genomes (444, 445), but the procedure lacks the precision to identify individual genomes or to place them within a hierarchical taxonomic framework. In contrast, analyses of 16S rRNA sequences can be applied to specific uncultured prokaryotes and the position of the resultant phylotypes can be interpreted in terms of inferred common ancestry.

In the bulk DNA cloning approach (360, 406), total DNA extracted from environmental samples is partially digested using a restriction enzyme and cloned with a lambda vector. Genomic libraries generated in this way supposedly do not impose any selective bias on the recovery of rRNA genes from members of different taxa. The major practical disadvantage of this approach is that most clones in the DNA library will not contain rRNA genes; the predicted value is 0.5% (406).
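The practical cost of this low yield can be made concrete with a short calculation. The sketch below takes the predicted 0.5% figure from the text and assumes, as a simplification, that each clone independently has that chance of carrying an rRNA gene; the function names are illustrative.

```python
import math

RRNA_CLONE_FRACTION = 0.005  # predicted value of 0.5% quoted in the text (406)

def expected_rrna_clones(n_clones, fraction=RRNA_CLONE_FRACTION):
    """Expected number of rRNA-gene-containing clones among n_clones screened."""
    return n_clones * fraction

def clones_needed(confidence=0.95, fraction=RRNA_CLONE_FRACTION):
    """Smallest library sample giving at least `confidence` probability of
    recovering one or more rRNA-gene-containing clones (independence assumed)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))

print(expected_rrna_clones(1000))  # clones expected per 1,000 screened
print(clones_needed(0.95))         # clones to screen for a 95% chance of one hit
```

On these assumptions, several hundred clones must be screened to be reasonably sure of finding even a single rRNA gene, which is why the PCR-based route is usually preferred.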

A quicker and more effective way of unravelling the composition of prokaryotic communities is based upon PCR-mediated amplification of 16S rRNA genes or gene fragments (using either rRNA or rDNA isolated from environmental samples) with 16S rRNA gene-specific primers followed by segregation of individual gene copies by cloning into Escherichia coli (165). This procedure generates a library of community 16S rRNA genes, the composition of which can be estimated by sampling clones and comparing their sequences by restriction endonuclease digestion, their reaction to specific probes, or by full or partial sequencing (468a). The resultant information can be analyzed to infer abundance and representation in the library. Unique clones can be completely sequenced and their relationship to corresponding sequences from cultured taxa in a taxonomic hierarchy based on 16S rRNA can be determined. As with other molecular approaches, the success of this procedure depends on the quality of the extracted DNA and whether it is representative of natural prokaryotic diversity in the environmental sample.
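The step of sampling clones, grouping them by restriction pattern, and inferring representation in the library can be sketched in a few lines. The clone identifiers and pattern labels are hypothetical, and the sketch deliberately ignores the biases discussed in the surrounding text; it shows only the bookkeeping.

```python
from collections import Counter

def library_composition(clone_patterns):
    """Relative frequency of each restriction pattern among the sampled clones."""
    counts = Counter(clone_patterns.values())
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.most_common()}

def representative_clones(clone_patterns):
    """Pick one clone per unique pattern as a candidate for full sequencing."""
    representatives = {}
    for clone, pattern in sorted(clone_patterns.items()):
        representatives.setdefault(pattern, clone)
    return representatives

# Hypothetical sample of five clones and their restriction patterns.
sampled = {"clone01": "A", "clone02": "A", "clone03": "B", "clone04": "A", "clone05": "C"}

print(library_composition(sampled))
print(representative_clones(sampled))
```

Frequencies obtained in this way describe the library rather than the community itself, since they inherit whatever distortion was introduced during DNA extraction, amplification, and cloning.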

A number of potential sources of bias exist in DNA-based analyses of natural microbial communities. These have been extensively reviewed elsewhere (184, 205, 352, 464, 468a) and include preferential amplification of specific templates due to PCR primer choice (432), differential cell lysis (147, 327), the GC content of DNA sequences (387), the formation of chimeric PCR products (293, 467), genome size and rRNA gene copy number (123), and the presence of free DNA or DNA in spores (447). It is because of factors such as these that studies based on PCR amplification of small-subunit rDNA genes should be compared with the results derived from the application of contemporary selective isolation and characterization methods. However, it is very encouraging that in comparable analyses of soil-derived 16S rRNA sequences (42, 279, 289, 293, 419) the same groups of prokaryotes were detected despite the use of different DNA extraction, cloning, and PCR techniques.

The analysis of uncultured prokaryotic communities in natural habitats based on 16S rRNA sequences has been extensively reviewed (8, 118, 119, 205, 358, 468a). A number of general conclusions can be drawn from surveys of uncultured prokaryotic communities in marine sediments (107, 149, 184, 254, 382, 392, 452, 459), seawater (34, 166, 381, 495), Yellowstone hot springs (28, 29, 223, 224), rhizosphere (301) and nonrhizosphere (279, 292, 314a, 351, 419) soil, termite guts (367), the rumen (484), and the human gut (430), notably the enormous wealth of microbial diversity, the fact that many of the novel sequences are only distantly related to those known for cultivable species, and the limitations of traditional cultural techniques in retrieving this diversity. It is possible that some of the new phylotypes may be artifacts of the PCR procedure, but most appear to be genuine; for example, Barnes et al. (29) reported that 4 of 98 clones were chimeras, whereas Choi et al. (73) found 7 chimeras out of 81 clones analyzed.
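The two chimera counts quoted above correspond to roughly 4% and 9% of the clones examined. Because both libraries are small, these proportions carry appreciable uncertainty; the sketch below reports each rate together with an approximate 95% Wilson score interval. The interval is an addition made here for illustration and was not reported in the cited studies.

```python
import math

def proportion(hits, total):
    """Observed fraction of chimeric clones."""
    return hits / total

def wilson_interval(hits, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = hits / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Chimera counts as quoted in the text: (chimeras, clones analyzed).
SURVEYS = {"Barnes et al. (29)": (4, 98), "Choi et al. (73)": (7, 81)}

for name, (chimeras, clones) in SURVEYS.items():
    low, high = wilson_interval(chimeras, clones)
    print(f"{name}: {100 * proportion(chimeras, clones):.1f}% chimeric "
          f"(approx. 95% interval {100 * low:.1f} to {100 * high:.1f}%)")
```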

rDNA sequence analyses of uncultured prokaryotic communities are also casting light on the geographical distribution of specific phylotypes. There is evidence that samples taken from the oceans tend to contain sequences of monophyletic groups, for example, archaeal groups I and II and SAR 7 and SAR 11 bacterial clusters (104, 318, 333). Similarly, sequence-based studies from different geographical locations show considerable overlap of sequence types (42, 279, 289, 292, 419). In addition, the perceived ecological boundaries between archaeal habitats (extreme environments such as hot springs and hypersaline waters) and bacterial habitats (temperate soils and waters) are becoming increasingly blurred. Members of the Archaea previously considered to be restricted to high temperatures (division Crenarchaeota) are now known to be abundant in many temperate environments (40, 104, 209), whereas members of the Bacteria appear to play an important role in extreme environments, such as hot springs, commonly considered the province of Archaea (224).

The relative abundance of a sequence in an environmental sample can be estimated by using oligonucleotide probes to analyze total rRNA extracts (104, 165, 382). This approach has some limitations, not least being the fact that different prokaryotes may contain different numbers of ribosomes and hence variable amounts of probe target (468a). A more direct measure of cell abundance can be obtained using fluorescent probes to identify microorganisms in situ (103). This approach can be used to link sequences with morphotypes and to highlight samples that contain cells from which a sequence of particular taxonomic interest originates, thereby providing a tool for use in isolation strategies (222).

Easier and much faster alternatives to the cloning procedures involve the examination of complex microbial populations by either denaturing gradient gel electrophoresis (DGGE) (340) or temperature gradient gel electrophoresis (TGGE) (394) of PCR-amplified genes coding for 16S rRNA. These methods have been used to analyze 16S rRNA genes from environmental samples (129, 134, 340, 341) and allow the separation of PCR-amplified genes on polyacrylamide gels. Separation is based on the decreased electrophoretic mobility of partially melted double-stranded DNA molecules in polyacrylamide gels containing a linear gradient of DNA denaturants (a mixture of urea and formamide) or a linear temperature gradient. Individual bands may be excised, reamplified, and sequenced (134, 339) or challenged with a battery of oligonucleotide probes (340) to give an indication of the composition and diversity of the microbial community.

DGGE and TGGE are relatively easy to perform and allow many samples to be run simultaneously. They are particularly well suited for examining time series and population dynamics. Once the identity of an organism associated with a particular band has been determined, fluctuations of individual components of microbial communities due to seasonal variations or environmental perturbations can be assessed. Heuer et al. (212) used DGGE and TGGE to determine the genetic diversity of actinomycetes in different soils and to monitor shifts in their abundance in the potato rhizosphere. Sequencing of the individual DGGE bands demonstrated the presence of organisms closely related to members of the genera Clostridium, Frankia, and Halomonas. A comprehensive account of the theoretical basis, strengths, and weaknesses of the two methods is given by Muyzer and Smalla (338). The successful application of DGGE has revived interest in genetic fingerprinting of microbial communities. Lee et al. (288) described the use of single-strand conformation polymorphism (357) of PCR-amplified 16S rRNA genes for examining the diversity of natural bacterial communities. Amplified rDNA restriction analysis (ARDRA) has been used to determine the genetic diversity of mixed microbial populations (310, 311) and to monitor community shifts after environmental perturbation, such as copper contamination (413).

Comparison of Molecular and Cultural Techniques
Culture-independent molecular approaches are tending to replace culture-based methods for comparing the composition, diversity, and structure of microbial communities. Investigations based on these approaches have led to the conclusion that traditional methods of culturing natural populations have seriously underestimated archaeal and bacterial diversity. Samples of DNA extracted from seawater, soil, and cyanobacterial mats of hot springs appear to represent predominant populations in these ecosystems, while the species that grow on culture plates are numerically unimportant in intact natural communities. These findings are not surprising, since the vast majority of organisms counted microscopically in samples from these environments have not been grown. One reason for this inadequacy is that cultivation conditions used to isolate organisms do not reflect the natural conditions in the environment examined and thereby select fast-growing prokaryotes that are best adapted to the growth medium (189, 291, 469, 470). However, greater success in bacterial isolation can be achieved by using culture conditions that more closely approximate natural environments (407) or by using novel tools, such as optical tweezers, to physically isolate bacterial propagules (222). There is also molecular evidence that some readily cultivable bacteria are abundant in the environment from which they are isolated (388). These trends suggest that innovative isolation procedures combined with the identification of phylotypes provide a powerful means of addressing the great plate count anomaly.

Relatively few studies have involved a twin-track approach whereby both cultivation and direct recovery of bacterial 16S rRNA gene sequences have been used to gain insight into the microbial diversity of natural bacterial communities (114, 207, 430). Comparative studies such as these are needed not least because both plating and 16S rDNA cloning (147) suffer from biases that can distort community composition, richness, and structure. The molecular approaches provide a new perspective on the diversity of prokaryotes in nature but do not yield the organisms themselves. This means that potentially valuable biotechnological traits can, at best, only be inferred from phylogenetic affinities (8, 102, 207). The need to cultivate representatives of phyletic lines of uncultivable prokaryotes for biotechnological purposes poses a major challenge for microbiologists.

A somewhat mixed picture emerges from comparative studies of natural microbial ecosystems. Chandler et al. (70) found close correlation at the genus level between the cultivable portion of aerobic, heterotrophic bacteria and data derived from the 16S rDNA approach when examining deep subsurface sediment. However, these correlations were detected after aerobic treatment of sediment samples at the in situ temperature but not with the untreated sediment core. It is possible that the treatments caused a selective shift towards enrichment of specific bacterial groups in the samples analyzed compared with the original sediment core. Studies of hot spring microbial mats highlighted several close matches between the 16S rDNA of organisms obtained by culture methods and directly recovered 16S rDNA, but only after several liquid dilutions of the inoculum were used for cultivation instead of direct enrichment based on undiluted inoculum (469, 470). Two major conclusions were drawn from these studies. (i) For the most part, direct enrichment techniques select for populations which are more fit under the chosen enrichment conditions and may not be numerically significant, and (ii) the growth of numerically dominant populations may be favored by using an inoculum diluted to extinction, especially in growth medium which reflects the conditions in the habitat under study. The conclusions drawn by Ward and his colleagues are consistent with the results of a comparative analysis in which bacterial isolates and environmental 16S rDNA clones were recovered from the same sediment sample (433). The corresponding data sets showed little overlap, possibly due to direct plating of the undiluted inoculum onto solidified medium with the subsequent isolation of community members that were not numerically significant. 
In contrast, a close correlation was found between most-probable-number estimates of isolates and environmental 16S rDNA clones taken from the bacterial community of rice paddy soil (207). In a comparative study of the bacterial community diversity of four arid soils, similar relationships were found between 16S rDNA results and cultivation, though significant differences were also observed (114).

The human intestinal tract microbiota presents a somewhat different situation, as extensive past investigations have characterized this ecosystem in more detail than most other natural communities (134, 215, 324). This means that optimal cultural methods are available for comparative studies of the complex microbial communities that reside in the human gut. Wilson and Blitchington (492) analyzed the composition of the microbiota of human fecal samples and concluded that the bacterial species detected by nonselective culture, when anaerobic bacteriological methods were of high quality, gave a good representation of the bacterial types present relative to that revealed by 16S rDNA sequence analyses. The main discrepancy between the two methods was in the detection of gram-positive groups. In a similar study, 95% of rDNA amplicons generated directly from a single human fecal sample were assigned to three major phylogenetic lineages, namely the Bacteroides, Clostridium coccoides, and Clostridium leptum groups (430). However, an in-depth phylogenetic analysis showed that the great majority of the observed rDNA diversity was attributable to unknown dominant microorganisms within the human gut.

It can be concluded that both innovative cultural procedures and culture-independent methods have a role to play in unravelling the full extent of prokaryotic diversity in natural habitats, especially since there are a number of instances where taxa have only been detected using cultural methods (430, 492). Although the two approaches sometimes provide different assessments of relative community diversity, the discrepancies may be attributed to sampling different subsets of the microbial community and to limitations inherent in each of the two approaches. In addition, highlighting consistent relationships between environments based on the dual approach may be highly habitat dependent due to the limited ability of a single cultural method to survey the full extent of the bacterial communities and the influence of bacterial physiology in situ on the success of cultivation in the laboratory.

Genomics is the activity of sequencing genomes and leads to the derivation of theoretical information from the analysis of such sequences with computational tools. In contrast, functional genomics defines the transcriptome and proteome status of a cell, tissue, or organism under a prescribed set of conditions. The term transcriptome describes the transcription (mRNA) profile, whereas proteome describes the translation (protein) complement derived from a genome, including posttranslational modifications of proteins, and provides information on the distribution of proteins within a cell or organism in time, space, and response to the environment. Together, genomics and functional genomics provide a precise molecular blueprint of a cell or organism, and in this and the following section we examine how they can reveal novel targets for search-and-discovery developments.

Introduction. Improvements in sequencing technology have enabled large-scale whole-genome sequencing (136). The general strategy is to fragment the whole chromosomal DNA into large clones, e.g., bacterial, plasmid, and yeast artificial chromosomes, cosmids, λ phage clones, or long-range PCR products (414), followed by a selection strategy from a large, highly redundant library, usually using a mix of random and directed selection (11, 142). For well-studied bacteria, such as Bacillus subtilis and Streptomyces coelicolor, ordered yeast artificial chromosomes (22), ordered overlapping cosmids (385), and physical and genetic maps may enable directed selection. However, for many whole-genome sequencing projects, high-throughput random shotgun sequencing produces new sequence data most efficiently, at least initially, though the accumulation of new data decreases exponentially with the number of clones sequenced (285). Selection strategies such as seeding or parking (275, 411), followed by walking, gap closing, and finishing (180), are used to fill in the gaps. The choice of initial strategies has consequences for the costs involved in these later stages (391), but the costs of selection strategies themselves are also significant. Nevertheless, sequencing at rates of 23 Mb per month in the human genome project (391) indicates the capacity to overwhelm some of these efficiency considerations by brute-force sequencing and computational power. This latter strategy, advocated by Venter (458), has been used in successively larger projects, i.e., Haemophilus influenzae (136) and Drosophila melanogaster (397), and has been proposed and implemented for the human genome (187, 458, 474). In the case of bacteria, 22 complete genomes have been published and 87 are in progress (of which 12 were complete as of 11 May 2000) (TIGR Microbial Database, www.tigr.org/tdb/mdb/mdb.html), thereby demonstrating the rapid deployment of sequencing technology.
Using a combination of sequencing technology and strategy, whole-genome sequencing can even be a single-laboratory exercise, as in the sequencing of Lactococcus lactis (41), though at a coverage of only two it would barely be considered draft quality in the human genome project. The numbers of prokaryotic whole-genome sequences can be expected to rise rapidly as funding for additional genome sequencing (e.g., http://www.beowulf.ac.uk/) increases.
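
The diminishing return from random shotgun sequencing described above, and the meaning of "a coverage of only two", can be illustrated with the standard Poisson (Lander-Waterman) approximation. This approximation is not taken from the review itself: it assumes reads land uniformly at random, in which case the expected fraction of bases left unsequenced at average coverage c is roughly exp(-c). The following is a minimal sketch under that idealized assumption; the function names, read length, and genome size are hypothetical values chosen for illustration only.

```python
import math


def coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage c = N * L / G for N reads of length L on a genome of size G."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(c: float) -> float:
    """Poisson approximation: a given base is missed with probability exp(-c)."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    # Hypothetical project: 500-base reads on a 2.0-Mb bacterial genome.
    # Each doubling of effort adds less new sequence than the one before.
    for n_reads in (4_000, 8_000, 16_000, 32_000):
        c = coverage(n_reads, 500, 2_000_000)
        frac = expected_fraction_covered(c)
        print(f"{n_reads:>6} reads  coverage {c:.0f}x  fraction covered {frac:.3f}")
```

Under this model a coverage of two still leaves roughly 14% of bases unsequenced, which is why directed gap-closing strategies, or much deeper random sequencing, are needed to finish a genome.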

Searching for drug targets. Clearly the Human Genome Project (115) will have a major impact on the identification of potential drug targets, and these targets will influence the design of specific screens for therapeutic drugs. Potential therapeutic targets such as Alzheimer's disease, angiogenesis, asthma, stroke, and cystic fibrosis, which are human genome specific, multifactorial, and often involve complex signal cascades, may continue to dominate technology development. Specific and sensitive molecular screens are readily derived using the same molecular biology technologies that are driving the genome programs and using the sequence data from those studies to give high-throughput robotic screening. Initial success in the rational design for targets such as HIV-1 protease (243, 461, 482, 496, 497) leads to strategies for rational design involving gene identification (78, 280), metabolic pathway analysis (252), or determination of protein-protein interactions using affinity methods such as the yeast two-hybrid system, phage display (363), or fluorescent-protein biosensors (167), structure prediction (CASP http://PredictionCenter.llnl.gov/) (161, 242, 305, 503, 507), and modelling (63).

Rational design strategies have not been as rapidly successful as predicted, but other current strategies that involve semirational design and high-throughput screening of massive libraries (26) owe much to rational design strategies. Recently, the move has been away from combinatorial chemical libraries to biological libraries, such as those based on peptides and antibodies, again directed by the role of such molecules in human disease processes. Leads identified by direct selection from initial libraries, by high throughput screening or biopanning, are usually not optimal for the selected properties and hence are subject to further rounds of modification or mutation to generate derivative libraries. Even then the rational selection of, for example, peptides which bind at the highest affinity to thrombopoietin receptors, which are readily selectable, may not guarantee the highest biological activity, which is the required property (91, 296). Also, many human diseases of interest to the pharmaceutical industry involve multiple gene pathways, environmental interactions, and genetic predisposition rather than simply direct causal effects (269). These factors also mediate adverse drug reactions and dictate the effectiveness of drug treatments. These considerations are resulting in extensive comparative genome studies of ethnic populations and human disease states (269) and expectations of personal genetic profiles. “By 2035 we will have the ability to sequence the genome of every individual on the planet…” (classified advertisement for SmithKline Beecham published in Nature in 1999).

Whole-genome sequencing provides data for such rational strategies (108, 152, 403) and has become the chosen approach of many large pharmaceutical companies. The annotation of genes and their functional identification provide a list of all potential targets (78). These targets need to be essential for some vital function in the microbial pathogen, conserved across a clinically relevant range of organisms, and significantly different or absent in humans (5). The combination of whole-genome sequences and tools for bioinformatics allows rapid searches for specific genes with these characteristics. Potential targets can be identified even for functions not previously identified in specific pathogens, on the basis of DNA and protein sequence identification of gene function, and the required essential nature of genes or their products can be established through gene knockouts (294) or gene expression studies in host-pathogen interactions (72, 304). With whole-genome sequencing making possible DNA microarrays of (i) whole-genome ORFmers (complete arrays of DNA oligonucleotides representing all the open reading frames [ORFs] identified in the whole genome) (380, 404, 493) or (ii) specific signature oligomers, and their controls, for whole classes of genes (295, 297), the generation of expression data from such studies (98, 135) is likely to be on a scale to compete with and overtake sequencing. Genomics has contributed to this rational search for drug targets by providing a large set of almost complete catalogues of genes, across a wide range of organisms, which can be compared at many levels. Conservation of genes across a wide range of organisms may prove to be a good indication of an essential function (15), and a minimal set of essential genes for life can be identified (337). Transposon mutagenesis and PCR can be used to directly screen for essential genes (3), and signature-tagged mutagenesis can be used to analyze multiple pools of mutants for loss of function (208).
Identification of probable targets in silico allows these experimental molecular techniques to be used to search a smaller set of target genes, making them more directed.

These search strategies can be applied to characterized or uncharacterized genes (14), and the chance of identifying a novel target may well be higher for uncharacterized genes. Uncharacterized gene targets may be identified in databases such as COG (274) and PROSITE (214) as those that are conserved across groups such as microbial pathogens. Such targets still need to be identified as nonessential or absent from humans, and since the human genome sequencing is not yet complete, that involves an extensive search through other, surrogate, eukaryotic genomes (e.g., Saccharomyces cerevisiae and Caenorhabditis elegans) and human-expressed sequence tags. The alternative approach is to characterize the target after its identification as a novel target. Undecaprenyl pyrophosphate synthetase (14), for example, was identified first as an unknown potential drug target and then characterized and identified as part of a specifically bacterial pathway.

Characterized gene targets can be sought using strategies to identify taxon-specific genes employing subtractive techniques, most directly between a specific pathogen and the human genome, although until the complete human genome is available this is likely to be a complex and incomplete strategy. Other criteria can nevertheless be used to define subsets of genes to search using subtractive techniques. In concordance analysis, the sequences present in one set of genomes and absent from others are determined, for example, bacterial genomes compared to eukaryotic genomes (57). Similarly, in differential genome analysis (229), a different algorithm has been used to compare the genomes of pathogens and their free-living relatives in order to identify the genes present only in the pathogen. In a comparison of Haemophilus influenzae with Escherichia coli (229), 40 potential drug targets were identified. Similarly, in a comparison of Helicobacter pylori with E. coli and H. influenzae, 594 genes were found specifically in H. pylori; only 196 of these were of known function, and 123 of these were responsible for known host-pathogen interactions, leaving 73 potential novel targets (228).
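
The logic of these subtractive comparisons can be sketched as simple set operations. The example below is an illustrative assumption, not the algorithm used in the cited studies (57, 228, 229): it presumes that genes have already been grouped into shared families by some homology search, and the family identifiers and function names are hypothetical.

```python
from __future__ import annotations


def pathogen_specific(pathogen: set[str], *references: set[str]) -> set[str]:
    """Gene families present in the pathogen and absent from every reference genome."""
    specific = set(pathogen)
    for reference in references:
        specific -= reference
    return specific


def triage(specific: set[str], known_function: set[str],
           host_interaction: set[str]) -> set[str]:
    """Keep pathogen-specific families of known function that are not
    already accounted for by known host-pathogen interactions."""
    characterized = specific & known_function
    return characterized - host_interaction


if __name__ == "__main__":
    # Toy data with hypothetical family identifiers.
    pathogen = {"fam1", "fam2", "fam3", "fam4", "fam5"}
    relative_a = {"fam1"}
    relative_b = {"fam2"}
    specific = pathogen_specific(pathogen, relative_a, relative_b)
    candidates = triage(specific, {"fam3", "fam4"}, {"fam3"})
    print(sorted(specific), sorted(candidates))
```

The same bookkeeping underlies the H. pylori figures quoted above: of the 196 pathogen-specific genes of known function, removing the 123 with known host-pathogen roles leaves 196 - 123 = 73 candidates.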

The combination of past knowledge of the biochemistry and physiology of microorganisms and new insights into biological function derived from genome and functional genomic studies can guide more specific search strategies. Metabolic databases such as EcoCyc (252) and KEGG (http://www.genome.ad.jp/kegg/kegg2.html) may enable the identification of pathways specific for microbial pathogens; the genes contributing to these pathways can then be used as potential drug targets (251). As well as these taxon-specific pathways, different phylogenetic lineages may contain nonhomologous enzymes catalyzing common reactions (272, 273). Typically differences are found between prokaryotes and eukaryotes, though specific enzymic variants are found in more specific lineages, e.g., the ure locus in mycobacteria (4) and targets in Chlamydia (245, 424). These nonhomologous enzymes provide attractive potential targets, as they can encode essential functions catalyzed by different mechanisms that can be inhibited without the risk of inhibiting analogous functions in humans. Missing genes from known pathways can be indicative of such targets, while the presence of genes of unknown function in gene clusters can help identify these nonhomologous counterparts. Other strategies can direct specific searches in areas of expected drug targets such as virulence genes (315), membrane transporters (500), or homologues of known drug targets in other organisms (111).

Genome studies both confirm the concept of pathogenicity islands (193) and reveal the rapid divergence of these genes in the evolution of pathogens (369), making them attractive but difficult targets. Similarly, an essential function of pathogens is evasion of the host defense mechanisms: pathogens such as Haemophilus influenzae, Helicobacter pylori, Escherichia coli, and Plasmodium falciparum (99, 465) all show extreme variation in the targets of the immune system. The presence of simple repeats in prokaryotic DNA sequences has been associated (217, 218) with the concept of contingency genes linked to phase variation of gene expression in pathogens (328). Strategies which combine search algorithms for detecting such repeats with the ability to display genome annotation, and specifically locating them relative to ORFs of known function, can identify targets that are critical to virulence (403).
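
A repeat search of this kind can be sketched in a few lines. The example below is a minimal illustration, not the method of the cited work (403): it uses a regular expression to find short tandem repeats (units of 1 to 4 bases, at least five copies) and then reports whether each tract overlaps an ORF or lies just upstream of one. The thresholds, the `Repeat` record, the `locate` helper, and the forward-strand-only ORF coordinates are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Repeat(NamedTuple):
    start: int   # 0-based start of the repeat tract
    end: int     # 0-based end of the tract (exclusive)
    unit: str    # repeated unit, e.g. "CAAT"
    copies: int  # number of tandem copies


def find_simple_repeats(seq: str, max_unit: int = 4,
                        min_copies: int = 5) -> List[Repeat]:
    """Find tandem repeats of short units; the lazy quantifier prefers
    the shortest unit that explains the tract."""
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        hits.append(Repeat(m.start(), m.end(), unit, copies))
    return hits


def locate(repeat: Repeat, orfs: List[Tuple[str, int, int]],
           window: int = 100) -> List[Tuple[str, str]]:
    """Label forward-strand ORFs (name, start, end) that the repeat
    overlaps ("within") or precedes by at most `window` bases ("upstream")."""
    labels = []
    for name, start, end in orfs:
        if repeat.start < end and repeat.end > start:
            labels.append((name, "within"))
        elif 0 <= start - repeat.end <= window:
            labels.append((name, "upstream"))
    return labels


if __name__ == "__main__":
    sequence = "GG" + "CAAT" * 5 + "GGTACGTTAGC"   # toy sequence
    orfs = [("orfA", 30, 90)]                      # hypothetical annotation
    for rep in find_simple_repeats(sequence):
        print(rep, locate(rep, orfs))
```

A practical tool would also need to handle both strands, ambiguous bases, and imperfect repeats; the sketch only shows how repeat detection and annotation lookup fit together.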

Plasmodium falciparum is an example of a major human pathogen for which new insights and strategies for drug development are emerging. The full genome sequence of P. falciparum, 30 Mb in 14 chromosomes (http://www.sanger.ac.uk/Projects/P_falciparum/who&what.shtml), is being completed (48, 155). Searching DNA sequence databases for targets homologous to known drug targets in other organisms has revealed an aspartic protease (93), cyclophilin (38), and calcineurin (111), explaining the antimalarial activity of cyclosporin A. The full genome can be expected to provide many more potential targets (479).

Treponema pallidum, the causative agent of syphilis, is difficult to culture, and little is known of the molecular biology of its virulence mechanisms. Its complete genome has been sequenced (143) and analyzed for virulence factors, revealing several classes of predicted protein-coding sequences that are potential virulence factors (478). Whole-genome studies are resulting in significant progress in understanding these and other infectious agents.

Natural products. Nevertheless, it is unlikely that some of the most successful drugs could have been discovered by any process of rational or semirational design. The mode of action of the immunosuppressants cyclosporin A, FK506, and rapamycin, which bind to cis-trans prolyl isomerase and FKBP12 but then inhibit further steps in critical signal transduction cascades (69, 206), e.g., through calcineurin in the case of cyclosporin A and FK506, would be too complex to design. Not only is the mode of action indirect, but these molecules are complex. The drug targets may have been identified by comparative genomics, since they are conserved from unicellular eukaryotes to humans, but the drugs themselves have required the massive library generation and screening activity of natural selection to evolve. Similarly, two of the most successful antimalarial drugs, quinine and chloroquine, exert their effect by inhibiting host-encoded functions (389) rather than activities encoded by P. falciparum itself. Chloroquine resistance in P. falciparum resides in a 36-kb nucleotide sequence containing genes that are all of unknown function (429), along with 40% of the P. falciparum genome (155).

However, in the search for new classes of antibiotics over the last 20 years, traditional approaches have also failed to deliver new drugs fast enough to keep up with the loss of effectiveness of existing drugs against increasingly resistant pathogens (95% of Staphylococcus aureus are penicillin resistant and 60% are methicillin resistant, and there are cases in China, Japan, Europe, and the United States of vancomycin resistance [http://www.promedmail.org]). The development of resistance may be followed by compensatory mechanisms to adjust for reduced fitness, which may then lock in the resistance mechanism (96). Although there are 150 antibiotics approved in the United States and 27 in clinical development (http://www.phrma.org/), only 1 antibiotic was approved in 1993, none in 1994, and only a few since (51, 428). Thus, random-screening search strategies are being abandoned in favor of rational, target-based approaches.

Molecular biology, robotics, miniaturization, massively parallel preparation and detection systems, and automatic data analysis dominate the search for drug discovery leads. Natural-product extracts and bacterial culture collections are not easy partners in this drug discovery paradigm. The separation, identification, characterization, scale-up, and purification of natural products for large-scale libraries suitable for these high-throughput screens are daunting, and rational arguments for the selection of organisms and/or natural-product molecules are often absent, especially given the poor taxonomic characterization of strains in natural-product bacterial strain collections (A. C. Horan, M. Beyazova, T. Hosted, B. Brodsky, and M. G. Waddington, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:89, 1999).

Many of these screening systems are not sufficiently robust to handle complex mixtures of natural products from ill-defined biological systems (Horan et al., ibid.) and may be inhibited by interactions with uncontrolled physicochemical conditions, simple toxic chemicals, and known bioactive compounds. This has led to significant efforts in rational drug design, combinatorial chemistry, peptide libraries, antibody libraries, and combinatorial biosynthesis (27, 89) and other synthetic and semisynthetic methods to provide clean inputs to screens. However, natural products are still unsurpassed in their ability to provide novelty and complexity. In chemical screening of natural products (216), complex mixtures of metabolites from growth and fermentation are separated, purified, and identified using high-pressure liquid chromatography, diode array UV/visible spectra, and mass spectrometry. Novel chemical structures are passed on for screening, now uncontaminated with background interference from the original complex mixture, and built up into high-quality, characterized natural-product libraries. This strategy suffers from poorly characterized culture collections, which make the choice of organisms to screen difficult, and the inability to control the expression of metabolic potential. These issues are specific examples of the requirement for better systematics, physiology, conservation of microbial diversity, and data integration. For example, typical commercial collections of actinomycetes might consist of 20,000 to 40,000 organisms classified at genus level on the basis of morphology and simple phenotypic characters. This identification may guide the choice of media and conditions for growth but will not aid the selection of strains, predict metabolites, or optimize expression for drug discovery. These issues can be tackled using the same tools and technologies that are driving the search for new drug targets.

Searching for new drugs. The advent of the complete Streptomyces coelicolor genome (http://www.sanger.ac.uk/Projects/S_coelicolor/) provides the opportunity to explore the evolutionary and functional relationships of one of the best studied and industrially and medically significant groups of organisms, the genus Streptomyces. This advance will provide new information to aid search and discovery of novel organisms and new bioactive natural products (R. Brown, H. C. Choke, S. B. Kim, A. C. Ward, and M. Goodfellow, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:149, 1999), identify roles in ecosystems (493), and lead to improvements in bioprocess control (20, 231, 248) for existing products. The extent to which the information from the S. coelicolor genome can be utilized across such a broad spectrum depends upon how representative it is of other streptomycetes.

The streptomycetes form a distinct clade within the radiation encompassed by the high-GC gram-positive bacteria in the 16S rDNA tree. This taxonomic group is identified as a major source of bioactive natural products (60). As a result, major collections of poorly characterized actinomycete strains are held by most large pharmaceutical companies. However, the relationship between metabolic potential and taxonomic or phylogenetic relationships is poorly understood. Within the streptomycete clade there are well-characterized groups at all levels of taxonomic variation from suprageneric (Streptomyces should probably be more than one genus [S. B. Kim, C. N. Seong, and M. Goodfellow, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:55, 1999]) to infraspecific. At the genus and species levels, fundamental questions arise about biodiversity in the prokaryotic world (314, 362, 468, 499). At the molecular level, this diversity is poorly represented. Estimates suggest that only a tiny fraction of prokaryotes have been isolated, and representatives of only about 10 to 15% of described species are held in service culture collections. Selective isolation of streptomycetes from the rhizosphere of a common tropical tree, Paraserianthes falcataria, revealed extensive diversity around the Streptomyces violaceusniger clade (L. Sembiring, M. Goodfellow, and A. C. Ward, Abstr. 11th Int. Symp. Biol. Actinomycetes 11:69, 1999).

Full 16S rDNA sequences are available in the ribosomal database (http://www.cme.msu.edu/RDP/html/) for fewer than 100 of the 513 validly described streptomycete species. There is evidence that specific metabolites, such as clavulanic acid, may be synthesized by strains in a specific clade (unpublished data), whereas the ability to synthesize, for example, streptomycin and related metabolites appears to be randomly distributed across the whole genus. However, these conclusions are tentative given the poor taxonomy, random screening, and limited chemical characterization of metabolites. Compounding this uncertainty is the complexity of regulatory controls; genetic (71) and genomic methods have the potential to unravel some of this complexity.

Currently genomics has little to say at these levels (species and subspecies strains); most whole-genome sequencing studies have taken representatives of the major groups (TIGR website cited above) or compared very closely related strains, such as Helicobacter pylori (6). The specific/infraspecific relationships in the streptomycetes and the way they are reflected in the biosynthetic potential to produce novel, bioactive compounds could significantly influence strategies for search and discovery, screening, and bioprocess development. Extending whole-genome studies to more streptomycetes would reveal these relationships in a comprehensive way, which would enable validation of current methodologies (from 16S rDNA phylogenies to DNA-DNA pairing) and lead to new understanding of speciation, phylogenetic relationships, and genome function in secondary metabolism. Whether to sequence the whole genomes of more streptomycetes remains an open question involving difficult choices, since any small number of strains would only begin to address the questions above. Nevertheless, it must be possible to begin to address these problems using the S. coelicolor genome as a template for whole-genome comparisons across the streptomycete clade.

The functional analysis of the S. coelicolor genome would gain significant benefit from a greater understanding of the ecological niche and role of Streptomyces violaceoruber-like streptomycetes [S. coelicolor A3(2) is formally a synonym of S. violaceoruber]. There is considerable current interest in the role of actinomycetes and streptomycetes in particular in natural ecosystems, especially grassland. Their role in carbon turnover and their response to land management practices could be important in maintaining soil fertility and productivity during a shift to sustainable land management. However, the ecological role of S. coelicolor (S. violaceoruber) is poorly understood, and in identifying the function of unknown genes, knowledge of its ecological role would enable answers (B. D. Kell, personal communication, 1999) to be attempted. Clearly the whole genome has a significant role in identifying the metabolic potential for activity and interactions in the soil ecosystem (493). The identification of strains related to S. coelicolor A3(2) and their detection using molecular ecological methods and selective isolation would complement functional analysis.

Although the discrepancy between organisms isolated and those identified by molecular methods is often striking, careful studies identify biases in both approaches and, with appropriate techniques, the ability to culture many organisms from specific habitats (170, 470). The importance of cultivation conditions has been emphasized, and the use of techniques such as extinction culture for abundant oligotrophic fractions of the microbial community points the way forward (65) without the need for the concept of uncultivatable bacteria. Nevertheless, the description of a specific bacterial cytokine required for resuscitation of Micrococcus luteus (329) illustrates a case in which neither medium development nor extinction dilution would be expected to resuscitate dormant cells (although M. luteus itself is not difficult to isolate). The discovery of M. luteus resuscitation protein factor (rpf) was the result of careful microbiology (463). Its rapid identification across the whole of the actinomycete clade, including mycobacteria and streptomycetes, with implications for clinical and ecological isolation (250), was the result of genomic studies, and the identification of multiple rpf genes in Mycobacterium tuberculosis and Streptomyces coelicolor was the direct result of the availability of whole-genome sequences. The whole area of stress response, signaling, and global regulatory mechanisms is now being dissected in organisms like S. coelicolor (309, 344, 356, 486) and has important implications for growth and antibiotic expression, affecting isolation and screening strategies for natural products.

The ecology of streptomycetes is of considerable interest for search and discovery of natural products. Currently novel products are sought from organisms isolated from extreme or novel environments (79, 106). However, the variation within the known streptomycetes is itself diverse and complex (Sembiring et al., abstr.); understanding it is tied up with problems of isolation and cultivation of the full diversity, speciation, and expression of the full metabolic potential. Understanding the extent of genes of known function in streptomycete genomes, identifying the role of genes of unknown function, and understanding regulatory and stress response networks will enable rational design of isolation methods and screening strategies.

Bioprocess control. Controlling gene expression is essential in exploiting new drugs, in research and development, and in production in fermentation processes. In bioprocess control, whole ORFmer arrays of Streptomyces coelicolor could be used to monitor gene expression and physiological responses of streptomycetes like S. fradiae and S. clavuligerus in large-scale fermentations. Sequence similarities across the streptomycete clade may mean that virtual-expression arrays (156) may be used to monitor gene expression for research and development and for optimization and control of antibiotic production. Many switches in metabolism are reflected by networks of signaling and response genes so that incomplete and qualitative coverage of the gene response of these industrially significant organisms would still enable their identification and interpretation from knowledge of the S. coelicolor genome. These data will enable software sensing (323) of important physiological shifts in bioprocess operation, identified by transcriptome analysis of representative fermentations, by estimating them from secondary measurements using current on-line and off-line process measurements. However, current measurements (substrate feeds, physicochemical measurements like pH, dissolved O2, substrate concentrations, carbon dioxide evolution rate, and oxygen uptake rate) would need to be supplemented with multivariate measurements which are sensitive to biological state, such as FT-IR (437), dielectric spectroscopy (501), and PyMS (317). The problem with many of the multivariate methods is that, for the complex samples from fermentation processes, they are black box techniques; by combining them with transcriptome analysis, patterns detected by these techniques could be interpreted using the power of genomics.
One application of comparative genome analysis would be to identify specific DNA sequences which could be assembled into either specific (for individual strains) or generic arrays to monitor gene expression in streptomycete bioprocesses.

Whole-genome sequencing and rapid biotechnological developments in the field of molecular biology mean that the gene is seen as the drug lead and rational design as the route to drug development. However, natural products are the result of a massively parallel experiment in combinatorial gene shuffling, mutagenesis, and screening for the generation of bioactive metabolites. And genomics and new technology (160) can promote the search for new natural products by increasing understanding of biodiversity and the factors that regulate microbial growth and expression, complementing the synthetic and semisynthetic routes to drug development.

Proteome analysis comprises three sequential steps: sample preparation, protein separation and mapping, and protein characterization. Sample preparation may entail cell fractionation and preliminary removal of more abundant proteins in order to detect those present in low concentration. Analysis is very dependent upon effective protein separation, and two-dimensional (2-D) gel electrophoresis (most usually immobilized pH gradient followed by molecular weight separation) is the present method of choice. Between 2,500 and 10,000 proteins are claimed to be resolvable on such gels (204, 244), while, in addition to determining protein inventories, the analyses can be made quantitative with respect to individual proteins (detection is possible at 1 ng with silver staining and at less than 1 pg with fluorescent dyes). It is important to note that posttranslational modifications (PTM) will significantly increase the number of separate proteins expressed from a genome and will not be revealed by genome annotation; the estimates are 1.2- to 1.3-fold for bacteria and 3-fold for eukaryotic microorganisms like S. cerevisiae (368). Protein characterization is achieved by mass spectrometric amino acid sequencing and identification of PTMs, followed by interrogation of protein databases. In turn, this reverse genetics enables the identification of genes that are responsible for producing a particular protein expression profile (see below).

The usual approach to proteome analysis is first to produce a 2-D map of all the proteins expressed under so-called normal conditions in order to define the constitutive proteome of a cell or organism. Thereafter, qualitative and/or quantitative changes in the proteome can be charted as responses to different conditions (or reflections of different physiological states) induced by stress, growth environment, pathological state, and so on. Thus, reference maps and databases of identified and unidentified proteins are established.

In this new, fast-moving field, acceptance of an agreed terminology is crucial, and the recent proposals made by VanBogelen et al. (453) are very helpful in this respect. Protein expression profile is the quantitative catalogue of proteins synthesized by a cell or organism under defined circumstances; protein phenotype defines the character or state of a specific protein under defined conditions (e.g., quantity, rates of synthesis and turnover, and extent of PTM); a regulon is a set of proteins whose synthesis is regulated by the same regulatory protein; a stimulon is a set of proteins whose synthesis responds to a single stimulus; and the protein signature is a subset of proteins whose altered expression is characteristic of a response to a defined condition or genetic change—they may relate to specific metabolic pathways or cell functions. The last cannot be distinguished simply by comparing two protein expression profiles; rather, signatures are recognized only after reviewing numerous profiles obtained under similar or different conditions. Various signatures have been identified that are associated with microbial growth rate, ribosome function, and protein secretion (453). These authors conclude that phenotypes and signatures will develop as tools for addressing the functions of unknown proteins and for evaluating the mode of action of physical and chemical agents. Put another way, proteomics provides a very powerful means of revealing epigenetic effects, i.e., effects that involve multiple genes.

At present proteomics is being applied most actively in pharmaceutical research and development (16, 90) in two principal areas: drug discovery and target selection (e.g., via proteome difference analysis of pathogenic versus nonpathogenic organisms, normal versus dysfunctional states, and disruption of stress-induced protein synthesis) and drug mode of action, toxicological screening, and the monitoring of disease progression during clinical trials. The latter group of clinical features, which are directed at gaining a fuller understanding of pharmacological mechanisms of drugs, is driving the new field of pharmacoproteomics. On the one hand natural-product discovery and combinatorial synthesis can generate an enormous repertoire of candidate drugs; on the other hand the demonstration of their mode of action, efficacy, and safety is hugely demanding in resources and time. The advent of pharmacoproteomics is set to transform these aspects of pharmaceutical development.

Although to date proteomics has attracted the greatest interest from the pharmaceutical industry, its potential for application in other areas of biotechnology is being recognized. Moreover, the application of proteomics is not restricted to well-characterized—in terms of genome sequencing—groups of microorganisms. Exploration of the biochemistry and physiology of extremophilic and extremotolerant organisms by proteome analysis, for example, could reveal much that has relevance for biotechnology exploitation. Already proteome expression profiling has begun for some hyperthermophiles (164, 259), and other studies such as these open the way for discovering stable enzymes and other proteins. For example, the unusual group of tungstoenzymes are found largely, though not uniquely, in thermo- and hyperthermophilic microorganisms, and it has been suggested (267) that they have evolved to catalyze very low redox reactions at extreme temperatures, and these same organisms contain an unusually high abundance of chaperonins (241). Equally exciting opportunities may present from the discovery of proteome signatures in extremophiles as a means of detecting novel metabolism. In this context our interest is in the growth of marine bacteria under deep-sea conditions (high pressure, low temperature, medium to high salinities, and oligotrophic nutrient status) and applying proteomics to detect novel epigenetic phenomena of potential exploitability. One final illustration of the power of proteomics is in the area of decontamination and sanitization within the food, pharmaceutical, and other hygiene-sensitive bioindustries. Proteome analysis of stress responses is important here because it reveals global regulation of gene expression under different stress conditions. Thus, a recent analysis of the psychrotolerant food spoilage organism Pseudomonas fragi revealed overexpression of 91 stress proteins in response to challenge from cleaning-disinfection treatments in food plants (457). 
Such information is highly germane to the development of effective treatment procedures where organisms are known to counteract simultaneous adverse conditions by coordinated changes in gene expression.

The development and application of proteomics constitute a very recent field of technology. Present limitations and areas in need of improvement include the resolution and characterization of hydrophobic proteins which include major targets for pharmaceutical intervention (membrane enzymes and receptors) (90); quality of protein separation (368); ability to detect very low copy number proteins (226, 451); and improved throughput and automation (90, 226).

Biogeography is the branch of biology that deals with the geographic distribution of organisms and has developed almost exclusively with reference to animal and plant ecology. We speak of endemic species as those that are restricted to a particular geographic region and “hot spots” that are characterized by their high proportion of endemic species (342, 375). In contrast, species that have a worldwide distribution are termed cosmopolitan. Is biogeography of relevance in the microbial world? In their seminal article on the biogeography of sea ice bacteria, Staley and Gosink (422) proffer three reasons why microbial geography is a critical topic for enquiry. Knowledge of biogeography will assist in (i) determining the extent of microbial diversity, (ii) identifying threatened microbial taxa, and (iii) identifying the ecological function of a particular species. We will add two other reasons, those of assisting search and discovery (knowing where to look) and helping to resolve the dilemma of how to conserve microbial gene pools (see later). However, the first question to address is whether biogeography applies to microorganisms.

Microbial ecologists have tended to accept somewhat uncritically the pronouncements of Beijerinck and Baas-Becking (see reference 422 for references) that bacteria (and by extension all microorganisms) are cosmopolitan: in Beijerinck's terms, “everything is everywhere,” to which Baas-Becking added “the environment selects.” A number of microbiologists are challenging this assertion of cosmopolitan geographic distribution. Tiedje (441) has questioned what genotypic level corresponds to everything—is it the species, as in the case of animals and plants, or the variety, or the DNA sequence? And what geographic scale corresponds to everywhere—a sand grain, soil aggregate, square meter, or catena? Questions such as these can now be addressed very critically using the range of molecular biology and high-resolution chemometric approaches that are available. We would argue that microbial biogeographic studies should be focused on the infraspecies genotypic level because of the intimate relationship between environmental/geographic factors and the speciation of microorganisms. Consequently we will adopt the term geovar (422) for a geographic variety of a microorganism that is endemic to a specific area or host. Moreover, definition at the varietal or infraspecific level is crucial in the context of biotechnology discovery because many sought-after properties are known to be strain as opposed to species determined.

In the remainder of this section, we examine the case supporting microbial endemism, while acknowledging that the cosmopolitan hypothesis has its strong adherents. In our opinion, application of rigorous analytical methods has paramount importance in coming to decisions on this issue. For example, solely on the basis of microscopic recording of cryptic ciliates in a freshwater lake and a shallow marine sediment, it was concluded that a substantial fraction of all known free-living ciliates were represented in these two habitat types and that such ciliates had a cosmopolitan distribution (130). From these observations, it was extrapolated that “in the case of microorganisms ‘everything is everywhere’.” Such a statement may be valid for the particular taxon studied and the limited regional/environmental range examined, but the restricted analytical approach (see below for discussion) presents difficulties for interpretation, and in our opinion the extrapolation to microorganisms in general is quite unjustified. Other protozoologists consider that many soil ciliate species show restricted geographic or ecological ranges (138), albeit the percentage of endemics is low. Data on infraspecific variation within ciliates are very sparse but presently indicate limited genetic diversity (44). In contrast, a study of the diversity of Vibrio anguillarum isolates employed a large battery of different typing methods, including ribotyping, serotyping, lipopolysaccharide profiling, plasmid typing, and biotyping (API, BIOLOG, and BioSys) (278). This study revealed a high genetic diversity within the species that correlated with geographic distribution and host species. The authors remarked that such relationships could be obtained only by analyzing a large number of isolates and deploying a multityping approach.
Similar geographic distinction is known to occur within phytopathogenic organisms, one of the best documented being that of Ralstonia solanacearum infection of crops such as potato and banana. The most recent assessment of the genetic diversity within this bacterium has been made using PCR-restriction fragment length polymorphism analysis of the hrp (hypersensitive reaction and pathogenicity) gene region (374). The analysis confirmed separation of the species into two major groups, the Americanum and Asiaticum divisions, and revealed finer geographic distinctiveness, e.g., southern African (VII) and northern African (I and II) clusters and Reunion Island cluster (VIb).

An interesting case of restricted geographic range has been reported for bacteria capable of degrading the xenobiotic chemical 3-chlorobenzoate (3CBA) (150). 3CBA degraders were isolated from soils in Australia, California, Canada, Chile, South Africa, and Russia by gross enrichment culture. Isolates were characterized on the basis of repetitive extragenic palindromic PCR genome fingerprinting and by ARDRA. All of the genotypes were referable to the Alcaligenes-Burkholderia group of β-Proteobacteria; 91% of the genotypes were unique to the geographic location from which they were isolated, and 98% of the ARDRA types were found at only one location. These data strongly indicate that 3CBA-degrading genotypes are endemic to the geographic regions examined. At a finer geographic scale, endemism has been claimed within natural communities of Achromatium oxaliferum (185): sediments from three freshwater sites in northern England contain genetically distinct populations of A. oxaliferum based on sequence analysis of PCR-amplified 16S rRNA genes, and identical sequences were not recovered from the different sites. The sequence evidence for distinct populations has been corroborated by differences in nutritional and energy conservation characteristics (186).
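The kind of tally behind a statement such as "91% of the genotypes were unique to one location" can be sketched in a few lines. This is a minimal illustration only: the function name, the record layout (one genotype/location pair per isolate), and the sample data are all hypothetical and are not taken from reference 150.

```python
from collections import defaultdict


def fraction_unique_to_one_site(records):
    """Return the fraction of genotypes observed at exactly one location.

    records: iterable of (genotype_id, location) pairs, one per isolate.
    """
    sites_by_genotype = defaultdict(set)
    for genotype, location in records:
        sites_by_genotype[genotype].add(location)
    if not sites_by_genotype:
        raise ValueError("no isolates supplied")
    # A genotype counts as site-specific if it was seen at a single location.
    unique = sum(1 for sites in sites_by_genotype.values() if len(sites) == 1)
    return unique / len(sites_by_genotype)


# Hypothetical fingerprint types, for illustration only.
isolates = [
    ("G1", "Australia"), ("G1", "Australia"),
    ("G2", "Chile"),
    ("G3", "Canada"), ("G3", "Russia"),  # shared between two regions
    ("G4", "South Africa"),
]
print(fraction_unique_to_one_site(isolates))
```

A high fraction is only as convincing as the sampling behind it; a genotype seen once is trivially "unique", which is one reason why multiple isolates per site matter.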

Extremophiles might be expected to be salient organisms with which to test the endemic versus cosmopolitan hypothesis. Kristjansson et al. (276) have commented that it is not known to what extent geographically distinct extreme sites may differ and to what extent such sites harbor endemic and cosmopolitan taxa. These authors reiterate, however, that without robust and refined taxonomic databases, this question will not be resolved. Supporting evidence for the cosmopolitan distribution of thermophiles and hyperthermophiles has come from work on bacteria (e.g., Thermobrachium celere [119]), cyanobacteria (e.g., Microcoleus chthonoplastes [154]), and archaea. Stetter and his colleagues (425) used DNA-DNA pairing to show that Alaskan and European hyperthermophilic archaea were cosmopolitan, and such evidence is far more convincing than that produced from partial 16S rDNA sequences (154). However, the evidence for endemism among this group of extremophiles is particularly strong. There are examples of unique isolations of prokaryotes (e.g., Methanothermus sociabilis from Iceland [287]), and others where they are geographically restricted (e.g., Thermus aquaticus/USA and T. filiformis/New Zealand [402] and thermophilic fermentative anaerobes/New Zealand [378]). A final example, also from Stetter's laboratory (409), again demonstrates the value, and desirability, of using DNA-DNA pairing analysis in this type of research. The archaeon Thermoplasma volcanicum can be differentiated into three geographically distinct DNA groups restricted to Vulcano Island, Italy, to Indonesia, and Iceland together with Yellowstone.

The foregoing evidence regarding microbial biogeography is in large measure anecdotal. It is imperative, therefore, that a framework be established for determining whether or not an organism is endemic or cosmopolitan. The Staley-Gosink postulates (422) offer a major stimulus for conducting further research in this field. Fulfillment of the following postulates would be necessary to categorize an organism as cosmopolitan: (i) at least four strains of the organism should be isolated from different samples of the ecosystem under consideration; (ii) the strains must be demonstrably indigenous to the ecosystem or host; (iii) at least four strains of a putatively identical organism must be recovered from one or more geographic locations separate from that from which the first strains were obtained; and (iv) the two or more groups of strains from such separate geographic locations must be subjected to phylogenetic analyses by sequencing two or more appropriate genes. If the strains show no evidence of forming clades, they can be considered cosmopolitan; otherwise they can be designated endemics or geovars. Staley and Gosink (422) also proposed a fifth but optional postulate in order to establish species identity of putative geovars. Polyphasic taxonomic analysis, in which DNA-DNA pairing is de rigueur, must be employed in such a test: thus, if two or more groups of strains show geographic clustering and fulfill the criteria for being different species, they should be named and described as separate, endemic species.
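The four mandatory postulates amount to a short decision procedure, which can be sketched as follows. This is an illustrative reading and not a published implementation: the `Strain` record, the representation of gene trees as rooted nested tuples, and the rule used for "forming clades" (in every gene tree, the strains of at least one location make up an exclusive subtree) are all simplifying assumptions introduced here. The optional fifth postulate (species identity by polyphasic taxonomy) is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    location: str
    indigenous: bool  # postulate (ii), established experimentally


def clades(tree):
    """Collect the leaf set of every subtree of a rooted tree (nested tuples of names)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def classify(strains, gene_trees, min_strains=4, min_genes=2):
    by_location = {}
    for s in strains:
        by_location.setdefault(s.location, set()).add(s.name)
    # Postulates (i) and (iii): four or more strains from each of two or more locations.
    if len(by_location) < 2 or any(len(n) < min_strains for n in by_location.values()):
        return "insufficient sampling"
    # Postulate (ii): every strain must be indigenous to its ecosystem or host.
    if not all(s.indigenous for s in strains):
        return "indigenous status not shown"
    # Postulate (iv): phylogenies from two or more genes.
    if len(gene_trees) < min_genes:
        return "too few genes sequenced"
    # Assumed clade rule: each tree contains some location as an exclusive subtree.
    geographic = all(
        any(frozenset(names) in clades(tree) for names in by_location.values())
        for tree in gene_trees
    )
    return "endemic (geovar)" if geographic else "cosmopolitan"


# Hypothetical strains and trees, for illustration only.
strains = [Strain(f"N{i}", "Arctic", True) for i in range(1, 5)] + \
          [Strain(f"S{i}", "Antarctic", True) for i in range(1, 5)]
separated = ((("N1", "N2"), ("N3", "N4")), (("S1", "S2"), ("S3", "S4")))
mixed = ((("N1", "S1"), ("N2", "S2")), (("N3", "S3"), ("N4", "S4")))
print(classify(strains, [separated, separated]))
print(classify(strains, [mixed, mixed]))
```

In practice the clade test would rest on bootstrap-supported phylogenies rather than exact subtree matching, so the rule above should be read as a placeholder for that analysis.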

Research on the sea ice microbial community is yielding further strong evidence for microbial endemism and has been the subject of an excellent review by Jim Staley and John Gosink (422). Here we highlight a few features of this work that are especially germane to our overall critique of microbial biogeography. Sea ice covers at least 7% of the earth's surface, provides a range of microenvironments, and sustains a diverse microbial community. Among the sea ice bacteria, for example, are some of the most psychrophilic organisms so far described. The attention of research groups in Australia and the United States on sea ice communities in recent years has led to many new bacterial genera and species descriptions: Polaromonas (233), Gelidobacter and Psychroserpens (46), Octadecobacter (181), Colwellia spp. (45), Polaribacter (182), Psychroflexus (47), and “Iceobacter” (422). Strains of Octadecobacter, Polaribacter, and “Iceobacter” were isolated from both Arctic and Antarctic sea ice, and species identities for Octadecobacter and Polaribacter have been verified by DNA-DNA pairing. The data indicate that none of the species had a bipolar distribution. The strains of “Iceobacter” have not been circumscribed by DNA-DNA pairing, but on the basis of major phenotypic differences, distinct north and south polar species that again lack a bipolar distribution have been proposed. Nevertheless, the authors prudently advise that “Not finding cosmopolitan (sea ice) species does not mean that they do not exist.” In this context, it will be interesting to test the recently described Antarcticobacter heliothermus gen. nov., sp. nov. (282) for bipolar distribution. It remains but to emphasize that 16S rDNA sequences are too highly conserved to permit rigorous detection of endemic microbial taxa and that other phylogenetic markers and high-resolution discriminatory procedures need to be applied to such questions.

The deep sea: a suitable case for study

Why the Deep Sea?
The oceans constitute more than 70% of the earth's surface, of which about 60% is covered by water more than 2,000 m deep. Paradoxically, the oceans represent the earth's last environment to be explored for its microbiology. The abyssal and hadal oceans (depths below 2,000 and 6,000 m, respectively [56]) were regarded as biological deserts—Forbes' azoic zone theory (151). The analogy now, however, is of the deep seas as rainforests, not least in terms of their microbial diversity. In a landmark paper, Grassle and Maciolek (183) attempted to estimate the macrofaunal species richness of the deep sea by extrapolations from a large data set obtained from the continental slope and rise of the eastern United States. They concluded that, conservatively, the diversity could exceed 10 million species and observed that about 60% of the species they recovered had not been described previously. Although the bases for this estimate have been criticized, the implicit message coming out of the study, as May has emphasized, “is good reason for more taxonomists to turn their attention to the oceans” (313). This position is further reinforced if we take account of the high degree of endemism recorded in the deep sea (50 to 90% for trench fauna [188]). Thus, the marine environment, and the deep seas in particular, should commend itself to microbiologists and biotechnologists alike as a source of novel organisms and exploitable properties.

In a recent article, Deming (106) makes a compelling case for deep-sea biotechnology: the deep sea encompasses the extremes of most environmental conditions found on earth, and the links between these and the implications for biotechnology search and discovery are summarized in Table 3. Consequently the question we address in this section is the extent to which the paradigm shift in exploitable biology is affecting the bioprospecting of the deep seas.

Diversity and Adaptation
A totally unexpected degree of diversity has been uncovered in marine microorganisms representing all domains and viruses and recovered from all depths down to the Challenger Deep (10,897 m). Most of these discoveries have been made in the last decade, many as a result of applying molecular surveying techniques. While it is not our intention to make a comprehensive review of the now large and rapidly growing literature on this topic, we highlight a few points that have most relevance for biotechnology. Useful starting points for detailed discussions are the recent editions of Cooksey (84) and Horikoshi and Tsujii (221) and the articles by Yayanos (505), DeLong (101), and Fuhrman (148).

The choice of detection and cultivation methods is especially critical when studying marine microorganisms. The introduction of ss rDNA sequencing has had a dramatic effect on detecting marine microbial diversity, but a certain amount of circumspection is proper if reliance is placed only on this and other molecular techniques. First, congruence between molecular and cultivation detection methods can be poor. Our work on deep-sea actinomycetes illustrates this inconsistency: in one series of experiments, the number of culturable actinomycetes from bathyal, abyssal, and hadal sites in the northern Pacific Ocean close to Japan ranged from 1.6 × 10⁴ to 3.4 × 10² CFU per g of wet sediment (80). The 16S rDNA clone libraries obtained from the same sites, and in some cases the same sediment samples (290, 452), failed to reveal the presence of markers of the actinomycete taxa which we isolated or indeed any actinomycete signatures. It is generally accepted that, because of the limitations of culture techniques, sequence-based techniques provide a less biased picture of microbial community composition; we opine that this view may not always be valid and warrants rigorous testing. Second, biotechnology in a great many and probably most instances has a requirement for real, not virtual, organisms, and thus research on ways and means of bringing as yet uncultured organisms into culture should be given much greater prominence. The taxon-selective medium approach for isolating actinomycetes (see above) has been used to advantage in our laboratories to recover marine strains.

Classical gross enrichment methods for isolation will not be universally appropriate for marine microorganisms. For example, the highly oligotrophic nature of many marine habitats indicates that chemostat and dilution to extinction culture procedures should be used, especially for isolating picoplanktonic organisms. The dilution culture technique (see above) has been deployed successfully to isolate marine ultramicrobacteria. Seawater samples (up to 10⁶-fold dilutions) inoculated into filtered-autoclaved seawater produced growth after prolonged incubation; 15 of 37 bacterial strains recovered could only be cultured on low-nutrient media and represented obligate oligotrophs (407). Among the facultative ultramicrobacteria that were isolated was Sphingomonas sp. strain RB2256 (408). Cells of this organism are not miniaturized on starvation but have a consistently small volume that is independent of growth conditions. The DNA content is very low (1.5 Mbp; 1 to 1.5 fg cell⁻¹, equivalent to 25% of the Escherichia coli genome), and the bacterium contains only one copy of the rRNA operon. The only other known extinction culture marine isolate, Cycloclasticus oligotrophus (64), isolated from Resurrection Bay, Alaska, has a larger DNA content than Sphingomonas sp. RB2256 (61% of the E. coli genome) and again a single rRNA operon. This organism utilizes a few aromatic hydrocarbon substrates, and its specific affinity for toluene is the greatest yet reported for any organism-substrate combination. It is noteworthy that other strains of Cycloclasticus appear to be important polycyclic aromatic hydrocarbon degraders in the marine environment (159). Similarly, for the isolation of strictly barophilic microorganisms, it might be useful to compare the results obtained from samples collected and manipulated with and without decompression. This point is well illustrated by Yanagibayashi et al. (504), who showed that decompression of Japan Trench (6,292 m) sediment samples resulted in a shift in the dominant bacterial communities from barophilic Shewanella and Moritella strains at 65 MPa to Pseudomonas strains at atmospheric pressure. High-pressure chemostat systems have been developed at the Woods Hole Oceanographic Institution (236) and the Japan Marine Science and Technology Center (Yasuhiko Komatsu, personal communication). The Woods Hole group (494) have reported recently on copiotrophic, barophilic bacteria that can adapt to and grow at a wide range of substrate concentrations, including oligotrophic concentrations; thus, numerically important oligotrophic bacteria may be difficult to isolate unless techniques such as extinction culture are employed.
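The logic of dilution to extinction culture, referred to above, can be made concrete with Poisson statistics: the more a sample is diluted, the more likely it is that a tube showing growth was founded by a single cell. The sketch below is a standard textbook calculation, not the protocol of references 407 or 408. The function name is hypothetical, the starting density of 10⁶ cells per ml is an assumed round figure, and the model assumes that cells are randomly distributed and that every cell is capable of growth.

```python
import math


def extinction_culture_odds(cells_per_ml, dilution_factor, inoculum_ml=1.0):
    """Poisson estimate for one dilution-culture tube.

    Returns (mean cells per tube, P(growth), P(single founder cell | growth)).
    """
    if cells_per_ml <= 0 or dilution_factor <= 0 or inoculum_ml <= 0:
        raise ValueError("all arguments must be positive")
    mean_cells = cells_per_ml * inoculum_ml / dilution_factor
    p_growth = -math.expm1(-mean_cells)             # P(one or more cells)
    p_single = mean_cells * math.exp(-mean_cells)   # P(exactly one cell)
    return mean_cells, p_growth, p_single / p_growth


# Assumed starting density of 1e6 cells per ml, for illustration only.
for dilution in (1e5, 1e6, 1e7):
    mean, growth, pure = extinction_culture_odds(1e6, dilution)
    print(f"{dilution:.0e}-fold: mean={mean:g}, P(growth)={growth:.3f}, "
          f"P(single founder | growth)={pure:.3f}")
```

At a mean of one cell per tube, a little under 60% of the tubes that show growth would be expected to have started from a single cell; higher dilutions raise that share at the cost of more empty tubes.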

The mechanisms by which marine bacteria adapt to high pressures are very inadequately understood, but pressure-regulated gene expression and its relationship to barophily and barotolerance is gradually being determined. Pressure-regulated genes are believed to aid pressure acclimatization in marine bacteria that are exposed to large vertical changes in the water column, but they are also found in bacteria that are not subject to pressure changes as a result of overlapping effects of pressure and other environmental stresses (32). To date most work on deep-sea barophilic bacteria has concerned taxa within the γ-Proteobacteria—Colwellia, Moritella, Photobacterium, and Shewanella and an unidentified genus (100, 101). Among these bacteria are some that are extremely barophilic, such as the newly described species Moritella yayanosii isolated from the Challenger Deep of the Mariana Trench, which grows at 60 to 100 MPa and has an optimum of 70 MPa (350). Encouraging progress has been made on molecular mechanisms by Bartlett and his colleagues at the Scripps Institution of Oceanography, working principally with deep-sea Photobacterium, and Horikoshi and Kato at the Japan Marine Science and Technology Center (JAMSTEC), whose main focus has been on deep-sea Shewanella strains. Reverse-pressure regulation of outer membrane proteins has been shown in the moderate barophile Photobacterium profundum SS9. A 10- to 100-fold increase in the expression of the OmpH protein occurs at high pressures (28 MPa), while at 0.1 MPa the OmpL protein is produced in greatest quantity; a third pressure-regulated protein, OmpI, is expressed at 40 MPa. The OmpH protein is believed to be a relatively nonspecific porin (477) that may facilitate nutrient uptake under increasingly oligotrophic conditions of the deep sea. More recently it has been demonstrated that RecD function is required for high-pressure growth and maintenance of plasmid stability in P. profundum SS9 (39). 
The JAMSTEC group have distinguished a “barophilic branch” of Shewanella benthica strains that is distinct from moderately barophilic and barotolerant strains. In the barophilic S. benthica DB6705 a pressure-regulated operon consisting of two small, unidentified ORFs (ORF1 and ORF2) is under the control of a promoter (256) that has been cloned into Escherichia coli (255) and shown to have a sequence similar to that of the ompH promoter of P. profundum. A second pressure-regulated operon (ORF3 and ORF4) is located downstream from the first operon; ORF3 encodes the CydD protein (258), which is required for the assembly of the cytochrome bd complex. A truncated respiratory chain has been proposed for Shewanella benthica at high pressure in which quinol oxidase acts as the terminal oxidase (253). The relationship between such pressure-regulated bioenergetics and barophily remains to be elucidated.

Apart from these studies on bacteria, there is evidence for barophily (or barotolerance) in deep-sea protozoa. Turley et al. (448) isolated a Bodo sp. from a North Atlantic sediment (4,500 m) that grew exponentially at the in situ pressure (45 MPa) but produced no growth at atmospheric pressure. This flagellate was tolerant of decompression during sediment collection and subsampling in the laboratory but required high pressure for its growth. More recently it has been proposed that shallow-water flagellates may have adapted to the high pressures of hydrothermal vents and the deep sea; several kinetoplastid and chrysomonad species have been grown at equivalent in situ pressures up to 30 MPa (18), while a deep-sea choanoflagellate isolate encysted at pressures greater than 5 MPa.
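The in situ pressures quoted in this section track depth closely, at roughly 10 MPa per 1,000 m of seawater. A minimal sketch of the conversion follows; the constant seawater density and gravitational acceleration are assumed round values, compressibility is ignored, and the function name is hypothetical, so the results are approximations rather than oceanographic-grade figures.

```python
SEAWATER_DENSITY = 1025.0   # kg per cubic metre, assumed constant with depth
GRAVITY = 9.81              # m per second squared
ATMOSPHERE_PA = 101_325.0   # surface pressure in pascals


def hydrostatic_pressure_mpa(depth_m):
    """Approximate absolute pressure (MPa) at a given seawater depth (m)."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    return (ATMOSPHERE_PA + SEAWATER_DENSITY * GRAVITY * depth_m) / 1e6


# Depths mentioned in the text: a North Atlantic sediment, the Japan Trench,
# and the Challenger Deep.
for depth in (4500, 6292, 10897):
    print(f"{depth:>6} m: about {hydrostatic_pressure_mpa(depth):.1f} MPa")
```

The estimate for 4,500 m comes out at about 45 MPa, in line with the in situ pressure cited for the Bodo isolate.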

The recent discoveries of additional deep-sea environments (sub-sea floor sediments, cold fluid seeps, brine lakes, carbonate mounds, mud volcanoes, hydrocarbon seeps, and gas hydrates) open up entirely new opportunities for bioprospecting. The sub-sea floor sediments, the average depth of which is 500 m and which may extend down to several kilometers, are estimated to contain a bacterial biomass that is equivalent to about 10% of the total terrestrial biosphere (364)! Viable bacterial communities have been found in sediments at a depth of 500 m (364), and the linear rate of decline in population sizes indicates that bacteria are present to even greater depths. Sulfate-reducing bacteria recovered from deep sediments have been shown to be barophilic, with maximum growth occurring at the in situ pressure (365), a finding that confirms their deep sea origin. Novel species such as Desulfovibrio profundus (25) have been described from Japan Sea deep sediments, and it is increasingly clear that bacteria of this type are widespread in deep ocean sediments (31).

Symbioses of various types are a distinctive feature of the marine environment. Numerous examples have been described in which archaea, bacteria, and eukaryotic microorganisms have established stable associations with metazoan hosts. Current interest in marine symbiosis includes the question of coevolution of metazoan hosts and their microbial partners, such as hydrothermal vent bivalve-chemolithotrophic bacteria relationships (101). The elegant research of Distel et al. (109), for example, postulates that separate engulfment events involving γ-Proteobacteria led to the entrapment of methylotrophs in methane seep mussels and the association of sulfide oxidizers with other bivalves; in the latter case, the exclusive association of bacterial clades with particular families of bivalves has been uncovered. Subsequently, the coexistence of methylotrophic and sulfide-oxidizing bacteria has been confirmed within single cells of a hydrothermal vent mussel (110). The range of metazoan hosts is extensive (at least seven phyla) and, apart from Bivalvia (mussels and clams), it includes Calcarea (sponges, with archaea, bacteria, cyanobacteria, and microalgae), Anthozoa (scleractinian corals with dinoflagellates and bacteria), Annelida (oligochaetes with bacteria), Polychaeta (vestimentiferan tube worms with bacteria), Crustacea (shrimps with bacteria), and Holothuroidea (sea cucumbers with archaea and bacteria).

The bacterial diversity and biomass of sponges can be considerable, e.g., the sclerosponge Ceratoporella nicholsoni is reported to harbor up to 80 bacterial symbionts (401) and to have nearly 60% of its mesohyl as bacterial biomass (487). Bacteria isolated from C. nicholsoni were not found in the surrounding seawater. Sea squirts (Ascidiaceae) may carry diverse bacterial communities, e.g., 60 strains including 17 actinomycetes associated with Polysyncraton lithostrotum, although details of the tissue distribution are unknown (36). Information is emerging to show that the phylogenetic diversity of endosymbiotic bacteria of specific marine invertebrate hosts can be very wide. The gutless oligochaete Olavius loisae associates with one γ-Proteobacterium, one α-Proteobacterium (a novel finding), and a spirochaete (113). The endosymbionts of Pacific vent worms and bivalves include heterotrophs and chemolithotrophs, among which have been described culturable multiple-heavy-metal-resistant strains (238) and uncultured filamentous ε-Proteobacteria (194). In contrast, the main producer at Mid-Atlantic Ridge vents is an epibiotic monoculture of an ε-Proteobacterium associated with the shrimp Rimicaris exoculata (371). The full diversity of vent faunas is also not yet established, as the discovery of new species of shrimp at the Mid-Atlantic Ridge confirms (308). Monocultures of symbionts also have been reported from certain sponges; the bacterium in question has not been cultured and was not closely related to any major group of the Eubacteria (400). Archaeal symbioses are known to be established with deep-sea holothurians (318) and with a temperate sponge. Cenarchaeum symbiosum represents a new genus of nonthermophilic crenarchaeote which forms a monoculture with the sponge (376).

In this brief selection of marine microbial diversity, we turn finally to actinomycetes. Very few surveys have been directed specifically at marine actinomycetes, but the available evidence points to a wide taxonomic diversity and distributions throughout marine habitats (80, 174, 436, 483). Whether actinomycetes that are recovered from or detected in marine habitats are indigenous remains an open question. To date two species, Dietzia maris and Rhodococcus marinonascens, are regarded as bona fide indigenous marine actinomycetes, but evidence is growing to support the view that others also might be categorized as indigenous. Moran et al. (325) found that Streptomyces species contributed an average of nearly 4% to the bacterial community of in-shore sediments and concluded, moreover, that the wash-in of spores of terrestrial species was not the source of these populations. The activity of these Streptomyces populations in situ was adduced from increases in population sizes and genus-specific rRNA following amendment of sediment cores. In our studies of Pacific Ocean sites, culturable actinomycete numbers were low and usually represented less than 1% of the total culturable bacteria. At some sites (Okinawa Trough, Izu Bonin, and Japan Trenches at depths between 1,393 and 6,455 m), actinomycetes were recovered in the absence of the terrestrial wash-in marker Thermoactinomyces (80), again providing presumptive evidence for indigenous organisms. The question of indigenous or terrestrial wash-in origins of marine actinomycetes is not crucial in the context of searching for novelty. Terrestrial or shallow-sea actinomycetes could have adapted to the high pressure and other selective conditions of the deep seas and undergone considerable speciation: such speciation, or genetic diversity at the infraspecies level, is a potent reason for evaluating these organisms for their biotechnological potential.

Comprehensive surveys of bacteria with high G+C contents in marine environments have not been made, but to date it appears that members of the order Actinomycetales are particularly abundant and widespread. Members of the mycolate taxa Corynebacterium, Dietzia, Gordonia, Mycobacterium, and Rhodococcus have been recovered from various depths in the Pacific Ocean (80), while Micromonospora and Streptomyces species tend to be more abundant in coastal and bathyal sediments (320, 436; S. C. Heald, J. Mexson, M. Goodfellow, and A. T. Bull, unpublished data). Analysis of 16S rDNA sequences indicates the occurrence of novel taxa among these isolates, while PyMS studies have revealed considerable infraspecific diversity within rhodococci and micromonosporae (80, 320). The congruence of PyMS and numerical taxon-derived characterization of a collection of deep-sea Rhodococcus isolates was shown to be very high (ca. 98%) (81). The significance of this result lies in the fact that PyMS characterization of environmental strains can be ascribed directly to the phenotypic variation sought for biotechnology screens. The most recent report from the JAMSTEC group also confirms the presence of culturable actinomycetes in deep-sea trenches (2,759 and 10,897 m) (434). At least one of these isolates is closely related to Dietzia maris, but others are putative new species; a high proportion of the isolates were designated as alkaliphiles (isolated at pH 9.7).

Genomics and Proteomics
Since 1996 the genomes of six archaea and two bacteria isolated from marine habitats have been completely sequenced and published, while those of a further three archaea and two bacteria are in progress. These organisms represent hyperthermophilic and methanogenic Euryarchaeota, thermophilic bacteria, cyanobacteria, and prochlorophytes. Among them are the first archaeon to be sequenced (Methanococcus jannaschii [62]) and the first sulfur-metabolizing organism (Archaeoglobus fulgidus [266]); Prochlorococcus marinus will be the first picoplanktonic bacterium to be sequenced. The information presently available is largely related to general features of the sequences, such as the identification of ORFs, their average length, and the annotation of coding sequences (Table 4). Predicted functional homology of putative gene sequences has been verified only for a small number of proteins (e.g., acylamino acid-releasing enzyme from Pyrococcus horikoshii [234]; a highly heat-stable protein repair enzyme from Thermotoga maritima [230]; and a novel nucleotide triphosphatase from Methanococcus jannaschii [227]). The interrogation of sequence databases as a means of identifying functional proteins needs to be done with care (see reference 396 for a critique and 361 for a cautionary tale). Aurora and Rose (19) recommend that comparisons be based on predictions of secondary structure rather than on primary amino acid sequence, and they demonstrate the approach with reference to the thymidylate synthase of M. jannaschii; using primary structure alone, the corresponding ORF could not be assigned. In summary, genomics of marine microorganisms is at a very early stage, and it will be some time before biotechnology will exploit these databases effectively.
It is clear from Table 4 that in all marine microbial genomes to date, the number of genes found but not matched in the databases is high and in some cases very high, a fact that stimulates the search for novel physiology and biochemistry in these organisms. Comparison of the M. jannaschii genome with that of Methanobacterium thermoautotrophicum (412) reveals considerable divergence between these methanogens; only 352 (19%) of M. thermoautotrophicum ORFs encoded sequences that are greater than 50% identical to M. jannaschii proteins. Quite often genome sequencing has proceeded in advance of the development of adequate cultivation systems (e.g., Methanobacterium thermoautotrophicum [52] and Methanococcus jannaschii [330]), with the result that biochemical investigations have been limited by the inability to produce biomass.
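The kind of threshold comparison quoted above (the share of ORFs whose products are more than 50% identical to a protein in another genome) can be illustrated with a minimal sketch. The sequences, the function names, and the assumption that each pair is already aligned to equal length (with "-" marking gaps) are hypothetical and chosen only for illustration; genome-scale comparisons rely on dedicated alignment and search tools rather than on position-by-position matching.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned, non-gap positions of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gap positions (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def fraction_above(pairs, threshold=50.0):
    """Fraction of aligned pairs whose identity exceeds the threshold."""
    hits = sum(1 for a, b in pairs if percent_identity(a, b) > threshold)
    return hits / len(pairs)


# Hypothetical toy alignments, not real protein sequences.
toy_pairs = [
    ("MKVLA", "MKVIA"),  # 4 of 5 positions match
    ("MKVLA", "TRSLA"),  # 2 of 5 positions match
    ("MK-LA", "MKVLA"),  # gap position ignored, 4 of 4 match
]
print(fraction_above(toy_pairs))
```

Applied to a full set of ORFs, the same fraction corresponds to the proportion reported in the comparison of the two methanogen genomes, although the published figure was obtained with the authors' own methods.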
The impact of proteomics on marine microbiology and biotechnology has been negligible. The responses of Pyrococcus abyssi to conditions equivalent to in situ pressure and temperature and to oxygen have been monitored by one-dimensional sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE) (307). Although several changes in whole-cell protein profile were observed, 2-D gel resolution of protein profiles must be regarded as essential for revealing adaptive changes and for adequately separating proteins in order to do microsequencing analyses. Recently 2-D PAGE has been used to prepare reference maps of the ultramicrobacterium Sphingomonas sp. strain RB2256 (125a). These maps of exponentially growing batch and chemostat populations will provide benchmarks for investigating the regulation of gene expression under different physiological conditions imposed by the marine environment. It is well known that the exposure of microorganisms to “marine factors” (see below) can elicit major metabolic changes and the synthesis of biotechnologically exploitable metabolites. Consequently the analysis of protein expression profiles and protein signatures under simulated marine conditions offers the possibility of detecting novel metabolism.
A number of recent reviews attest to the enormous diversity of chemical structures and of bioactive and biocatalytic properties found in marine microorganisms and invertebrates (37, 94, 124, 125, 240, 502). The majority of studies have been on bioactive chemicals for use in the health care field, but even here the number of reported structures is seriously underestimated because synthesis is frequently induced by a combination of so-called marine factors. These are known to include salinity, micronutrients, copiotrophic and oligotrophic nutrient conditions, pressure, temperature, and extracts of marine animals and algae. Such a range of conditions for provoking wide gene expression is rarely tested during screening operations. Also, we reemphasize the importance of screening infraspecific diversity in this context, a highly topical illustration of which concerns the potent anticancer drug candidate bryostatin 1. Different populations of the cosmopolitan species Bugula neritina produce different bryostatins, and two distinct chemotypes of this bryozoan have been identified (95), only one of which produces bryostatin 1. The question of the novelty of marine metabolites has been raised. That marine invertebrates such as corals, ascidians, and sponges are sources of completely novel chemical structures is unquestionable, and several of these are the subjects of clinical evaluation (131). Evidence also is accumulating that marine bacteria synthesize novel compounds, among which antibiotic, antiviral, anticancer, and pharmacological activities have been described (240), and that marine archaea may be sources of new secondary metabolites (390).

Considerable success in discovering new marine natural products has come from biological and chemical screening procedures. However, an appreciation of the chemical ecology of the marine biota (91, 316) is important in making significant discoveries. We have referred previously to the extent of symbiotic associations in the oceans; other ecological traits such as defense mechanisms, niche protection, signaling, feeding strategies, and the ability to prevent colonization by epibionts can provide clues for the detection of novel natural products. However, several major problems confront research in this field: the concentration of the active compound is often extremely low; chemical synthesis of the compound may be difficult and costly due to its structural complexity; harvesting marine biomass for direct extraction of the compound is almost certainly unsustainable; and the biosynthetic origin of the compound may be equivocal under circumstances in which a symbiosis is involved. Solutions to these problems are found by bringing the producer organism into laboratory culture and optimizing the appropriate fermentation process; identifying the producer organism (by molecular methods if it proves to be unculturable) and then screening members of related taxa and/or using taxon-formulated media and conditions to isolate related taxa; synthesizing chemical analogues; and testing the synthetic capability of the separated symbiont partners.

The monoculture of the symbionts is not invariably a facile task, but it has been achieved. Flowers et al. (137) used density gradient centrifugation to separate Oscillatoria spongeliae from cell populations of the tropical sponge Dysidea herbacea, thereby showing that it was the cyanobacterial partner that produced novel chlorodiketopiperazines. Using the same technique, the Queensland group also reported that it was cells of the sponge (Haliclona sp.) rather than a dinoflagellate symbiont (Symbiodinium microadriaticum) that produced cytotoxic alkaloids, the haliclonacyclamines (157). Bacterial symbionts, identified on the basis of 16S rDNA sequences as Antarcticum vesiculatum and Psychroserpens burtonensis, have been shown to be responsible for the neuroactivity of another sponge, Halichondria panicea (370). Progress is being made in the cultivation of sponge cells that maintain the desired physiological state (332), an advance that will also encourage the production of bioactive compounds. In the case of the bryostatins, the endosymbiotic γ-Proteobacterium “Endobugula sertula” has not yet been cultivated and probably requires special conditions for its isolation (203). The importance of studying isolated symbiont partners is reinforced by work on sea hare metabolites. Sea hares (Gastropoda) produce a large diversity of bioactive secondary metabolites, but it appears that many such compounds are probably of cyanobacterial origin (e.g., dolastatin-13 analogue [199]).

Chemical synthesis of some of the novel marine chemotherapeutic candidates is being achieved. Most notable, perhaps, is work on the anticancer bryostatins. Thus, the total synthesis of bryostatin 2 was reported recently (120), while simplified analogues of bryostatins 1 and 10 have been synthesized that retain their protein kinase C-inhibitory activities (480, 481) and that present real opportunities for developing chemotherapeutic agents.

Biocatalysts with novel or unusual properties are regularly reported from marine microorganisms, principally bacteria and archaea (221). Much of the interest in marine enzymes is related to their activity and stability under extreme reaction conditions. For example, the first report of high-pressure enhancement of deep-sea bacterial enzyme activity was published only 5 years ago (257). Alkaline serine protease activity of a Sporosarcina sp. (isolated from the Japan Trench at 6,500 m) was nearly doubled at 60 MPa compared with atmospheric pressure, whereas other proteases were stable but not activated by elevated pressures. There is evidence also that some enzyme production by deep-sea bacteria can be increased by high pressures (268). The interaction between high pressure and high temperature on enzyme activity and stability has been examined by Michels and Clark (321). A protease activity of Methanococcus jannaschii increased up to 116°C and could still be measured at 130°C, and activity and thermostability increased with pressure such that at 50 MPa the reaction rate and stability at 125°C were enhanced 3.4- and 2.7-fold, respectively. Similarly, pressure stabilization of DNA polymerases has been reported for deep-sea hyperthermophiles (431). Another property of biocatalysts that has biotechnological importance is solvent tolerance; several highly tolerant bacteria and yeasts that can degrade crude oil, polyaromatic hydrocarbons, and cholesterol have been recovered from deep-sea sediments (232).

The deep and polar seas are environments from which cold-active enzymes can be isolated; such enzymes find applications in low-temperature operations, for example, food and leather processing, cleaning agents, and bioremediation (128, 346). The cold-active α-amylase secreted by the Antarctic marine species Alteromonas haloplanctis has been subject to detailed biophysical study (1, 126) in order to determine which factors confer conformational flexibility and hence efficient catalysis at low temperatures. The wild-type enzyme, which is produced at 0 ± 2°C, has been overexpressed in Escherichia coli (127), where it folds correctly if the temperature is maintained below that causing irreversible denaturation. It has been proposed that this bacterium be reclassified as Pseudoalteromonas haloplanktis (158).

It is pertinent to ask what impact the paradigm shift has yet had on marine biotechnology search and discovery. In terms of the discovery of novel organisms, it is clearly the case that molecular taxonomy and ecology have revealed unsuspected levels of diversity. Nevertheless, the practice of innovative microbiology, as illustrated by the exploration of the “deep biosphere” (364) and the cultivation of fastidiously oligotrophic picoplankters (65), impresses the necessity of not neglecting microbiology per se in favor of molecular approaches. The expression of symbiotic associations is extraordinarily large and diverse in marine environments, and their intensive study is likely to be very productive for biotechnology. The impact of genomics and proteomics on the biotechnological exploitation of marine microbiota has hardly been felt yet, and the first cases of genome sequencing are mostly oriented to hydrothermal vent organisms. Given the preponderance of the marine environment on earth and the importance of its microorganisms in effecting global homeostasis, it is crucial that representatives of the dominant or abundant planktonic taxa be brought into genome programs. The sequencing of bacteria like Sphingomonas sp. strain RB2256 or Cycloclasticus oligotrophus could reveal new understanding of oligotrophy, viability, and how to tackle the problem of difficult-to-culture bacteria. The continual discovery of novel chemistry in marine microorganisms amply justifies the claims for natural-product research developed earlier in this review. While the requirements of the health care sector are likely to remain the principal driver of search and discovery, the marine microbiota presents an enormous diversity of options (e.g., hydrocolloids, polyunsaturated fatty acids, and antifouling compounds) for wider biotechnological development.

Conserving microorganisms

The value of microorganisms in both direct and indirect terms has been stressed on many occasions during the debate on biodiversity and its conservation (58, 83, 438). Because of their direct value as a major resource for biotechnology development, the conservation of microbial gene pools is a crucial issue. In the past this issue has been addressed almost entirely from the standpoint of ex situ conservation. However, it has become increasingly obvious that this strategy on its own is quite inadequate for ensuring conservation in anything approaching a meaningful way. In this section, therefore, we argue for a complementary ex situ-in situ strategy for microbial conservation and urge that a concerted program for in situ conservation be a priority.

How Do We Know What To Conserve?
An answer to this question is entirely dependent on our knowledge of microbial diversity and the threats to its existence. If we do not know the extent of microbial diversity, it becomes axiomatic that we will not know what to conserve. The situation has been stated unequivocally by Jim Staley: “Until microbiologists can provide meaningful estimates of global diversity from studies of selected habitats and a better understanding of the importance of biogeography, it will be fruitless to estimate the degree to which microbial species on Earth are threatened” (420). But as Foissner (138) affirms, “microorganisms present the greatest challenge to any serious attempt at assessing the overall scale of global species richness.”

Some sense of the magnitude of the problem facing microbiologists can be appreciated by focusing on fungi. Accepting the working figures of 72,000 and 1.5 million for known and total estimated species, respectively, Hawksworth (201) concluded that, at the present rate of description, it will take another 888 years for the global inventory of fungi to be completed. Even if we accept Hammond's “moderate” accuracy rating, i.e., within a factor of 5, for fungi (198), the inventorying task would continue until 2188. The situation is likely to be similar with respect to other microbial groups, particularly where the number of taxonomists is known to be low (see, for example, Foissner's comments on soil ciliate diversity [138, 140]). The foregoing, of course, takes no account of infraspecific or genetic diversity, the importance of which for biotechnology has been stressed already.
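The arithmetic behind projections of this kind can be sketched as follows. This is a minimal illustration, not Hawksworth's own calculation: the function name, the assumption of a constant description rate, and the back-calculation of that rate from the 72,000, 1.5 million, and 888-year figures are all assumptions made here. The lower-total scenario simply divides the estimated total by 5; it is not intended to reproduce the completion year quoted above, which depends on inputs (such as the base year and the rate actually used) that the text does not state.

```python
def years_to_complete(known, total, rate_per_year):
    """Years needed to describe the remaining species at a constant rate."""
    remaining = total - known
    return remaining / rate_per_year


KNOWN = 72_000         # described fungal species (working figure from the text)
TOTAL = 1_500_000      # estimated total fungal species (working figure from the text)
YEARS_QUOTED = 888     # time to completion quoted in the text

# Description rate implied by the three figures above (an inference, not a quoted value).
implied_rate = (TOTAL - KNOWN) / YEARS_QUOTED
print(round(implied_rate))                                      # 1608 species per year

# Sanity check: the implied rate returns the quoted time span.
print(round(years_to_complete(KNOWN, TOTAL, implied_rate)))     # 888

# Hypothetical scenario: the true total is smaller by a factor of 5.
lower_total = TOTAL / 5
print(round(years_to_complete(KNOWN, lower_total, implied_rate)))
```

Whatever inputs are chosen, the estimated total dominates the result, which is why the uncertainty in global species richness matters so much for planning inventories.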

Is ATBI a Realistic Objective for Microorganisms?
All-taxa biodiversity inventories (ATBIs) have been proposed such that a selection of habitats are subject to intensive investigation in order to make as nearly complete an inventory of species as possible. ATBIs aim to describe all taxa at the species level and the locations where they can be found on subsequent sampling of the habitat site. Such an accounting system may be a reality for the best-known groups of macroorganisms, but is it feasible for hyperdiverse taxa and microorganisms? Even for the former, ATBIs likely will necessitate interpolation between sample sites within an ecosystem.