Christopher T Campbell and Kevin J Yarema
Kevin J Yarema: firstname.lastname@example.org
Glycobiology - the study of carbohydrates in biology - combines expertise in synthetic and analytical chemistry and carbohydrate biochemistry, as well as molecular and cellular biology, to unravel the structural complexity, chemistry, biosynthesis, and biological functions of sugar-bearing biomolecules. Over the past three decades, complex carbohydrates have become widely recognized as more than just an energy source. Indeed, glycosylation has been established as a ubiquitous post-translational modification in higher organisms that enables one protein (or lipid) to function as many, and provides structural diversity that offers an explanation for the unexpectedly low number of genes in the human genome. Complex sugars are major players in numerous biological processes, including development, the immune response and inflammatory disease, cell proliferation and apoptosis, the pathogenesis of infectious agents including prions, viruses, and bacteria, and diseases ranging from rare congenital disorders to diabetes and cancer.
The incredible complexity of a cell's glycosylation machinery and its final products, a vast array of oligosaccharides (Figure 1), provides a research challenge in urgent need of high-throughput, large-scale technologies. Unfortunately, methods for studying and manipulating complex carbohydrates lag behind the tremendous advances made for nucleic acids and proteins. Progress has been sluggish, in part because many biologists were slow to recognize the importance of sugars. But even when prescient researchers sought to uncover the role of glycosylation they were often frustrated by the difficulty of characterizing carbohydrates and the near impossibility of manipulating them with precision in living cells. In this article, we give a brief overview of the overriding factor hindering glycobiology - the incredible complexity of carbohydrates - before describing current technologies available for studying glycosylation and concluding with a guarded, but optimistic, prediction that glycobiology will catch up with other areas of biochemistry and molecular biology largely by virtue of promising large-scale technologies that are now on the horizon.
Although many recent developments in 'glycomics' focus on structural and functional analysis of surface-displayed sugars, the biosynthetic machinery that builds these complex molecules also greatly interests the glycobiologist. We briefly discuss carbohydrate biosynthesis here, both to acknowledge the heroic researchers who laid an impressive foundation without benefit of large-scale technologies and to illustrate the need for high-throughput strategies to accelerate progress. We use the term glycosylation machinery to describe biochemical pathways that convert monosaccharides (for example, dietary glucosamine) into nine different high-energy sugar-nucleotide building blocks (for example, UDP-N-acetylglucosamine (UDP-GlcNAc)) and assemble them into the complex oligosaccharides found on proteins and lipids (Figure 1). Basic components of this metabolic factory were discovered in a painstakingly slow, one-at-a-time process over many decades (for a detailed perspective, see the fascinating historical overview by Saul Roseman). Traditional biochemical studies from the 1950s to the 1970s identified many small-molecule metabolites and characterized the enzymatic activities that link them into metabolic pathways. Once metabolites were arranged into putative pathways, the next requirement was to match genes with enzymatic activities; this formidable task was tackled, primarily one gene at a time, by elegant but time-consuming methods such as the forward genetic screens developed in the 1970s, and by the DNA cloning and recombinant gene expression strategies that became routine in the 1980s. More recently, RNA-inhibition techniques have begun to yield insights into glycosylation by downregulating individual genes.
Around 2% of human genes are involved in glycosylation, as judged from the most recent developments in large-scale biology, primarily the sequencing of the human genome coupled with predictive algorithms for gene function. This information, along with 'metabolomic' methods for large-scale characterization of small-molecule metabolites, has helped put the finishing touches on the framework of the glycosylation machinery. Almost all its metabolic components are known and have been assembled into well defined pathways, as can be seen by following the links for 'Carbohydrate metabolism' and 'Glycan biosynthesis and metabolism' in the Kyoto Encyclopedia of Genes and Genomes, KEGG. A static picture of glycosylation does not, however, reflect dynamic moment-by-moment, developmental, and disease-related metabolic fluctuations, nor does it provide much insight into subcellular organization and organelle topography, which are critical factors in shaping final oligosaccharide structures. In the future, computational 'systems biology' promises to bring the glycosylation machinery to life and thereby offers insights into repairing glycosylation abnormalities associated with widespread diseases, including diabetes and cancer.
Structures of sugars have long fascinated chemists and biologists, beginning with Emil Fischer's landmark efforts to decipher the isoforms of hexoses more than a century ago. Since then, even with modern techniques, biologists have been outpaced by the difficulty of obtaining a glycosylation profile - the specific complement of glycoconjugates present - of even a single cell. To illustrate that there is no simple task in carbohydrate analysis, Figure 1 shows a few biologically significant glycoconjugates. Even the addition of a single N-acetylglucosamine moiety to a protein to give the O-GlcNAc modification, which regulates numerous biochemical pathways by acting in a yin-yang manner with phosphorylation (Figure 1c), is complicated by its occurrence on hundreds of different cytosolic and nuclear proteins, and on multiple sites within a single protein. The various biological activities of glycosphingolipids, relatively simple sugar-bearing biomolecules exemplified by the ganglioside GM3 (Figure 1d), demonstrate that very subtle changes to sialic acid (N-acetylneuraminic acid or Sia), an unusual nine-carbon sugar found in more than 50 different chemically distinct forms, can regulate apoptosis, senescence, and proliferation, thereby highlighting the need for careful analysis of fine structural details.
Moving to larger glycoconjugates, prions are glycosylated proteins that possess only two sites where oligosaccharides attach (Figure 1a). Even so, any one of several dozen different sugar chains can reside at either site; consequently, prions exist as hundreds of distinct entities. The discovery of the influence of carbohydrates on prion infectivity and on the development of spongiform encephalopathies [14,15] underlines the importance of fully defining structural heterogeneity of this kind. As a final example, the heavily glycosylated cell-surface glycoprotein CD34 (Figure 1a), found on hematopoietic cells and epithelial cells, serves as a developmental marker for hematopoietic cells, mediates leukocyte homing, and contributes to cancer metastasis. It bears 20 or more separate oligosaccharide chains, implying that, if ten different oligosaccharide structures randomly occur at each site (a conservative estimate), 10^20 different forms of CD34 can exist and each of the approximately 10^4 to 10^5 copies of this protein found in a typical cell has a reasonable probability of being unique.
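The arithmetic behind these estimates can be checked with a short calculation. The sketch below is illustrative only: it assumes that every site is occupied independently and that every structure is equally likely, and it takes 'several dozen' chains per prion site to mean 30, a figure chosen purely for the example. The function names are hypothetical. The uniqueness estimate uses the standard 'birthday problem' approximation.

```python
from math import exp


def glycoform_count(structures_per_site: int, sites: int) -> int:
    """Number of distinct glycoforms if each site independently
    carries any one of `structures_per_site` oligosaccharides."""
    return structures_per_site ** sites


def prob_all_copies_distinct(copies: int, forms: int) -> float:
    """Birthday-problem approximation exp(-n(n-1)/(2N)) for the chance
    that n molecules drawn uniformly from N glycoforms are all different.
    Uniform, independent sampling is an assumption of this sketch."""
    return exp(-copies * (copies - 1) / (2 * forms))


# Prion protein: two sites, assumed 30 possible chains per site.
prion_forms = glycoform_count(30, 2)

# CD34: 20 sites, ten structures per site (the conservative estimate in the text).
cd34_forms = glycoform_count(10, 20)

# Upper end of the copy-number range quoted for a typical cell.
p_distinct = prob_all_copies_distinct(10**5, cd34_forms)

print(prion_forms)          # glycoforms of a two-site protein
print(cd34_forms == 10**20)
print(p_distinct)           # very close to 1 under these assumptions
```

Under these simplifying assumptions, the number of possible CD34 glycoforms exceeds the number of copies per cell by so many orders of magnitude that two identical copies in one cell would be rare. Real glycosylation is neither uniform nor independent between sites, so the calculation indicates scale rather than predicting an actual distribution.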
Only recently has methodology advanced sufficiently to obtain complete glycosylation profiles of glycoconjugates such as prions or CD34 (Figure 2). To briefly summarize today's technology, a plethora of mass spectrometry (MS) methods are becoming affordable and user-friendly [17,18], pulsed-amperometric detection methodology is making the separation of carbohydrates by high-pressure liquid chromatography (HPLC) attractive, increasingly sensitive nuclear magnetic resonance (NMR) technology is allowing this powerful technique of structure determination and identification to be applied to glycoconjugates isolated from natural sources, and lectins are finding new uses as detection agents for carbohydrates in chromatography and protein arrays [19-21]. Excellent reviews provide a detailed picture of how different methodologies are coalescing into a powerful set of tools for sophisticated and highly sensitive investigation of glycoconjugates [22,23].
While the isolation and characterization of highly complex glycoproteins are impressive feats, the sobering reality is that only a handful of the thousands of different glycoconjugates in the human body have been analyzed so far, which leaves the enormous carbohydrate diversity of even a single cell unknown in molecular detail. To further complicate matters, glycosylation profiles are not static, but rapidly change as cells differentiate, undergo apoptosis, or become diseased. Today's technologies are inadequate for determining the dynamic glycosylation profile of a cell and fall well short of the ultimate goal of glycomics - the evaluation of an entire organism. To dispel the gloom, however, underlying technologies for innovative, large-scale glycomic techniques are developing rapidly - both by bringing new techniques to carbohydrate analysis and by refining established methods to increase throughput. These two approaches, exemplified by array-based technologies and the automation of mass spectrometry, respectively, are discussed below.
The success of DNA microarrays, on which thousands of discrete interactions are observed at once, has spawned array-based methods for confronting almost every problem. Carbohydrate analysis is no exception, and two array-based strategies are now being pursued. The more mature approach - which has reached the point of using robotic microspotting - involves attaching hundreds of different oligosaccharides of known composition to a surface, and is used to identify binding partners (Figure 3) [24-26]. This approach reproduces the 'glycocode' found on the cell surface and helps determine how biological systems decode the vast information-carrying capacity of carbohydrates. In a second type of array, carbohydrate-binding proteins such as lectins are arrayed on the surface. This technique, made possible by protein-array printing techniques that avoid altering the recognition capacity of proteins, has recently been demonstrated in concept for a modestly sized lectin array. In the future, when the hundreds of lectins now available, as well as the growing number of antibodies that bind specific glycan structures, are incorporated, such arrays will facilitate the rapid profiling of cellular glycosylation states.
Conventional methods, including chromatography or two-dimensional gel electrophoresis, used in proteomics to separate proteins isolated from a cell or tissue (Figure 2), are rapidly and effectively being adapted for oligosaccharide characterization. In contrast to microarrays, identification is not inherent in these techniques, necessitating a reliance on mass spectrometry for identification of glycoconjugates after separation; mass spectrometry is extremely sensitive, allowing minute amounts of material isolated from biological samples or purified by capillary electrophoresis or two-dimensional gels to be identified successfully. Unfortunately, the need to isolate individual oligosaccharides by chromatography or electrophoresis prior to mass spectrometry, and the lack of automated identification algorithms, limit the throughput of these methods, leading to techniques such as fluorescence differential gel electrophoresis (DIGE) that do not characterize all products and settle for the less ambitious goal of identifying a limited number of molecules that differ between two samples (for example, healthy versus diseased tissue). To overcome the bottleneck of identification, much effort is being put into developing automated, high-throughput computational tools for the interpretation of glycoconjugate mass spectra [23,32].
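To give a flavor of what automated interpretation involves at its simplest, the sketch below matches an observed neutral mass against candidate monosaccharide compositions. It is not the method of the tools cited above. It is a minimal illustration that assumes a free, underivatized glycan, considers only four residue classes, and uses standard monoisotopic residue masses. The function names, the mass tolerance, and the search limit are all hypothetical choices made for the example.

```python
from itertools import product

# Monoisotopic masses (Da) of monosaccharide residues, i.e. sugar minus water.
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid (a sialic acid)
}
WATER = 18.0106  # one water is added back for the free reducing-end glycan


def composition_mass(composition: dict) -> float:
    """Neutral monoisotopic mass of a free glycan with the given residue counts."""
    return WATER + sum(RESIDUE_MASS[name] * n for name, n in composition.items())


def candidate_compositions(observed_mass: float, tol: float = 0.02,
                           max_count: int = 10) -> list:
    """Brute-force search for residue counts (0..max_count each) whose
    mass lies within `tol` Da of the observed neutral mass."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        composition = dict(zip(names, counts))
        if abs(composition_mass(composition) - observed_mass) <= tol:
            # Report only the residues that are actually present.
            hits.append({k: v for k, v in composition.items() if v})
    return hits


# Example: a composition of five hexoses and two N-acetylhexosamines.
man5 = {"Hex": 5, "HexNAc": 2}
mass = composition_mass(man5)
print(round(mass, 4))
print(candidate_compositions(mass))
```

A composition match is only the first step. Many isomeric structures with different linkages and branching share one composition and therefore one mass, which is why fragmentation data and curated structure databases are needed before an oligosaccharide can be identified with confidence.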
Chemical tools have been vitally important for the development of large-scale glycomics. These range from automated synthesis to development of chemoselective coupling reactions that facilitate attachment of oligosaccharides to arrays [35,36] and underlie high-sensitivity methods for isolating sugars from biological extracts [29,37]. Another increasingly important contribution of chemists is the synthesis of abiotic monosaccharide analogs that are used in oligosaccharide-engineering strategies based on metabolic substrates. This approach exploits the unusual permissiveness of certain biochemical pathways involved in carbohydrate biosynthesis to accommodate non-natural metabolic intermediates. By intercepting a targeted pathway with an analog, it is possible to install abiotic, chemically distinct sugars into mature glycoconjugates. The incorporation of azide-modified analogs of sialic acid into the B-lymphocyte surface glycoprotein CD22, an important modulator of B-lymphocyte activity, provided a recent example of this technique's ability to uncover new insights into biological roles of glycosylation: photoaffinity cross-linking of the azide-modified sialic acid allowed in situ identification of a potentially important modulator of B-cell activity - previously unappreciated homomeric binding among neighboring CD22 molecules.
An adaptation of the tagging-via-substrate (TAS) proteomics approach is now transforming metabolic oligosaccharide engineering into a high-throughput technology. TAS technology involves the biosynthetic incorporation of an azide functional group into the design of a basic building block such as an amino acid or monosaccharide, followed by isolation of labeled biomolecules via this chemical tag. In a pioneering study, N-azidoacetylglucosamine, an analog of GlcNAc, was used to tag O-GlcNAc-labeled proteins. The subsequent identification of around 25 O-GlcNAc-modified proteins in the brain established a biochemical link between O-GlcNAc modification and neuronal signaling, synaptic plasticity, and gene expression. Of equal importance, this study provides a precedent for expanding the TAS strategy to other tissues and for applying it to uncover subtle metabolic differences between healthy and diseased cells.
In conclusion, the hope for an increased pace of discovery in glycobiology, where progress has lagged because "carbohydrates are complex", lies in several large-scale technologies now in the early stages of development. Continued progress is not without its problems. For example, the current versions of arrays contain only a very small fraction of all the carbohydrates found in nature. A second issue is that the exact presentation of oligosaccharides is often important to achieve the 'cluster glycoside effect', whereby carbohydrate-binding interactions are specified by multiple simultaneous interactions that achieve both specificity and avidity [44,45]. Today's methods of attaching carbohydrates to an array, whereby they are spotted onto inflexible flat surfaces that have very different biophysical properties from the flexible peptide backbone of, say, CD34 (Figure 1a) or the spherical geometry of highly branched dendrimers, are unlikely to faithfully reproduce physiological binding.
Other nascent high-throughput methods, such as the automation of mass spectrometry, must also overcome significant barriers. The use of mass spectrometry in glycomics, for instance, is hampered in various ways: glycan databases are incomplete, that is, many of the oligosaccharides found in nature have not yet been isolated and characterized by mass spectrometry; the structural complexity of oligosaccharides limits current identification algorithms to structures of fewer than ten monosaccharides; and the identification of the correct oligosaccharide from many isomeric options remains a challenge. Mass spectrometry must also overcome its aversion to sialic acids. In the past, this structurally diverse, negatively charged sugar has typically been removed to simplify analysis; the critical role of sialic acid in modulating the bioactivity of GM3 (Figure 1d) is but one of numerous examples that insist that this sugar cannot continue to be ignored. To end optimistically, these challenges, although appearing daunting today, will be overcome in the near future - within two to three years in one prediction - if scientific curiosity and the potential multibillion dollar market for therapeutic glycoproteins continue to accelerate the current pace of technological development.
Figure 1. Systems and molecular complexity in glycobiology. (a) The glycosylation machinery consists of an intricate network of metabolic pathways that interconvert monosaccharides and produce high-energy sugar nucleotides (full details of the pathways are available in ). The hexosamine pathway that converts glucosamine (1) to UDP-N-acetylglucosamine (UDP-GlcNAc) (2) is highlighted in blue. The versatility of the glycosylation machinery is epitomized by the conversion of UDP-GlcNAc into N-acetylmannosamine (ManNAc) (3), a sugar that is metabolically converted to CMP-sialic acid (CMP-Sia; 4) by the pathway highlighted in red. UDP-GlcNAc and CMP-Sia, together with seven other sugar nucleotides, are transported into the endoplasmic reticulum (ER) and Golgi apparatus (5), where they are used for the production of complex oligosaccharides (6) that comprise the glycosylation profile of the cell surface. This profile is made up of proteins (such as the prion protein and CD34, shown here) and glycolipids such as ganglioside GM3, a glycosphingolipid. Sialic acid (Sia) is a ubiquitous terminal modification. (b) The chemical structures of glucosamine, UDP-GlcNAc, ManNAc, and CMP-Sia. (c) As well as being used to build complex oligosaccharides, UDP-GlcNAc is a high-energy building block that provides the GlcNAc residue required for O-GlcNAc protein modification in the cytosol. (d) Slight modifications to the chemical structure of CMP-Sia elicit profound changes in biological activity. The membrane glycosphingolipid ganglioside GM3 (center) is converted to the pro-apoptotic ganglioside GD3 by addition of Sia (top), whereas deacetylation of GM3 yields de-N-acetyl GM3, which has a growth stimulatory effect.
Figure 2. Conventional low-throughput glycoconjugate characterization and steps that will improve throughput. Current strategies for oligosaccharide identification involve multiple time-consuming steps including, but not limited to, (1) isolation of individual glycoconjugates, such as prions or CD34 (see Figure 1), from a cell or tissue; (2) the detachment and purification of each oligosaccharide from a particular glycoconjugate; and (3) a one-at-a-time structural characterization and identification. Each of these steps currently requires multiple procedures and methods of analysis, as illustrated in the boxes for steps (1) and (3). Streamlined methods now under development, such as (4) the coupling of isolation by glycoblotting with identification by mass spectrometry (MS), and automated interpretation of spectra, are also shown. These methods, along with array-based technologies (see Figure 3), offer hope for high-throughput glycan characterization in the near future.
Figure 3. Oligosaccharide and carbohydrate-binding protein arrays. (a) Oligosaccharide microarrays are used to detect and characterize carbohydrate-binding proteins. They are constructed by (1) spotting known oligosaccharides (either synthetic or naturally isolated) onto a solid surface such as a treated glass slide in a predetermined array. Whole cells can be bound to the array (2), but it is more common to first fractionate cells or tissues to isolate (3) putative carbohydrate-binding proteins. (b) Arrays of known carbohydrate-binding proteins (either lectins or monoclonal antibodies) are used to detect and characterize oligosaccharides. They are produced by printing spots of the proteins onto a suitable surface (1). Again, whole cells (2) can be bound to the array, but more usually (3) their cell-surface oligosaccharides will be isolated and used. Both types of array can be used for a variety of purposes.