From the Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada and Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
Serum proteomic profiling, by using surface-enhanced laser desorption/ionization-time-of-flight mass spectrometry, is one of the most promising new approaches for cancer diagnostics. Exceptional sensitivities and specificities have been reported for some cancer types such as prostate, ovarian, breast, and bladder cancers. These sensitivities/specificities are far superior to those obtained by using classical cancer biomarkers. In this review, I concentrate on questions that cast doubt on the reported results and propose experiments to investigate these questions in detail before the technique is used in the clinic. It is clear that the method needs to be externally and thoroughly validated before clinical implementation is warranted.
Molecular & Cellular Proteomics 3:367-378, 2004.
Our current efforts to combat cancer are not very successful. Despite the recent spectacular advances in molecular medicine, genomics, proteomics, and translational research, mortality rates for the most prevalent cancers have not been significantly reduced. Some of the best available options to combat cancer include primary prevention, earlier diagnosis, and improved therapeutic interventions. We are now witnessing the development of new drugs against cancer that are based on rational instead of empirical designs. There is hope that some of these drugs will prove to be more effective at the clinic than older generations of medicines. In terms of primary prevention, we do not as yet have at hand any robust strategies, because the mechanisms of cancer initiation and progression are still largely unknown.
One of the best strategies to combat cancer now is by early diagnosis and administration of effective treatment (1). Another approach includes close monitoring of the cancer patient after initial treatment (usually surgery) to detect early relapse and then prescribe additional therapy. A third valuable approach would be the stratification of patients into subgroups that respond better to different types of treatment (individualized therapy). Medical imaging and serum or tissue biomarkers are valuable tools for monitoring these patients in order to optimize clinical outcomes.
In this review, I will concentrate on mass spectrometry as a diagnostic and cancer biomarker discovery tool. Much has been published on this technology, and excellent reviews have already been prepared (2–12). My presentation will be biased toward underlining potential limitations that have not been adequately addressed in the already existing extensive literature.
Mass spectrometry has been used as a diagnostic tool in clinical laboratories for many decades. This technology has been coupled with gas chromatography (GC/MS) and has been used with success for the identification and quantification of relatively small (low-molecular-mass) molecules. Such analyses could be highly informative in newborn screening programs (13), in toxicological and forensic applications (14), for delineating various types of inborn errors of metabolism (15), for detecting doping of athletes (16), etc. Over the last 15 years, we have seen a resurgence of this technology for studying larger molecules such as nucleic acids and proteins. These new applications became possible mainly due to the development of novel methodologies to effectively volatilize and ionize proteins and nucleic acids, by using various chemicals (matrices) and lasers (e.g. matrix-assisted laser desorption/ionization, MALDI) or electrospray ionization (ESI). The ability to measure mass-to-charge ratios with high accuracy, providing spectra of very high resolution, and the development of tandem mass spectrometry (MS/MS) to obtain de novo protein sequence information have further enhanced the applications of this technology in proteomics. Coupling of mass spectrometers to liquid chromatography (LC/MS) further expanded the discriminatory power of the method. Mass spectrometry is now one of the most powerful proteomic tools (17). Even more spectacular advances in mass spectrometry should be expected, with further improvements in resolution and detectability. With this in mind, it is not surprising that many scientists have decided to use mass spectrometry either as a diagnostic tool or as a cancer or other disease biomarker discovery platform (2–12).
I must emphasize at this point that the critical discussion to follow is directed against neither mass spectrometry nor the field of proteomics in general. In fact, these methods and fields of investigation, used appropriately, may indeed succeed in discovering new diagnostic modalities for cancer and other diseases, as well as contribute to a better understanding of the pathogenesis of such diseases. The Human Proteome Organization (HUPO, www.hupo.org) is focusing on the identification of large numbers of proteins in complex mixtures, including serum and other biological fluids (17). It is expected that these efforts will finally lead to the identification of new potential biomarkers for cancer and other diseases. HUPO also intends to standardize the methodology so that the results obtained with these techniques are robust and reproducible among laboratories.
Most of the discussion below will focus on one proteomic platform used extensively in diagnostics, known as surface-enhanced laser desorption/ionization-time-of-flight (SELDI-TOF) mass spectrometry. This technique is based on the pretreatment of a biological fluid or tissue extract with various proteomic chips, performing protein extractions based on hydrophobic, ion-exchange, metal binding, or other interactions. The bound proteins are then subjected to mass spectrometric analysis. The derived information can be used for either diagnosis or for identifying potential biomarkers that could then be further validated with alternative technologies. These issues will be discussed in detail below.
A handful of cancer biomarkers are currently used routinely for population screening, disease diagnosis, prognosis, monitoring of therapy, and prediction of therapeutic response. Some established biomarkers are listed in Table I. Although it is highly desirable to have biomarkers suitable for population screening and early diagnosis, none of the biomarkers listed in Table I has adequate sensitivity, specificity, and predictive value for population screening. Even prostate-specific antigen (PSA), which has been approved for population screening by the Food and Drug Administration, is not universally accepted for this application. The reasons for biomarker failure in population screening settings are multiple and fall outside the scope of this review. It will suffice to mention that poor specificity leads to many false-positive results. In population screening, disease prevalence is another important parameter; diseases of low prevalence (like ovarian cancer) will require outstanding diagnostic test specificity (>99%) for the test to be considered viable (18). It can be concluded that none of the individual biomarkers currently at hand can fulfill the requirements of population screening for cancer. Biomarkers are clinically recommended mainly for monitoring the effectiveness of therapeutic interventions. Some biomarkers are also invaluable tools for early diagnosis of cancer relapse, which may trigger additional treatments before the appearance of clinical symptoms.
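The dependence of screening performance on disease prevalence can be made concrete with Bayes' rule. The following minimal sketch computes the positive predictive value of a test. The prevalence (1 case per 2,500 people screened) and the perfect sensitivity are illustrative assumptions chosen for this example, not values taken from this review.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed, illustrative inputs: a perfectly sensitive test applied to a
# low-prevalence disease (1 case per 2,500 people screened).
ASSUMED_PREVALENCE = 1 / 2500

for spec in (0.95, 0.99, 0.996):
    ppv = positive_predictive_value(1.0, spec, ASSUMED_PREVALENCE)
    print(f"specificity {spec:.1%}: PPV = {ppv:.1%}")
```

Under these assumptions, a specificity of 99% still leaves fewer than 1 in 20 positive results as true cancers, which is why low-prevalence diseases demand specificities above 99%.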
Most of the currently used cancer biomarkers were discovered following development of novel analytical techniques, such as immunological assays and the monoclonal antibody technology. It was then found that these molecules were elevated in biological fluids from cancer patients in comparison to normal subjects. Many cancer biomarkers were discovered by immunizing animals with extracts from tumors or cancer cell lines, and then screening for monoclonal antibodies that recognize "cancer-associated" antigens. More recently, and with the completion of the Human Genome Project, many researchers hypothesized that the best cancer biomarkers will likely be secreted proteins (21); about 20–25% of all cell proteins are secreted. However, this is not an absolute requirement because a number of classical cancer biomarkers (e.g. CEA, Her2-neu) are cell membrane-bound, but their extracellular domains are shed into the circulation. Other groups, including our own, are using bioinformatics, such as digital differential display and in silico Northern blotting, to compare gene expression between normal and cancerous tissues to identify overexpressed genes (22). Although one of the prevailing hypotheses in new biomarker discovery is that the most promising biomarkers should be overexpressed proteins, this is not generally true for some of the best known cancer biomarkers such as PSA (23). Overexpressed genes are now identified experimentally by using microarrays. Some of these genes have been proposed as candidate cancer biomarkers (24–26). Despite this reasonable hypothesis, very few cancer biomarkers have been discovered by using this approach (26, 27). We followed another approach, in which we postulated that if a molecule is already a known cancer biomarker, members of the same family of genes/proteins may also constitute novel biomarkers.
We have since shown that kallikreins, a group of serine proteases with high homology at both the DNA and protein levels (this family includes PSA), are candidate biomarkers for ovarian, prostate, and breast cancers (28, 29).
Over many years of developing cancer biomarkers, we came to understand that a molecule may become a practical serological biomarker if it has certain characteristics, i.e. it is a secreted or shed protein and has the ability to diffuse into the circulation during tumor development and progression, through either angiogenesis or invasion of surrounding tissues and vasculature by cancer cells. Preferably, such proteins should be stable (not degraded) and not bound to inhibitors that could interfere with their measurement. The experience with the classical biomarkers has taught us many lessons on the dynamic relationships between the patient and biological phenomena related to biomarkers such as appearance in the circulation, cleavage, binding to serum proteins, degradation, modification, elimination half-life, etc. In this review, I will use PSA as an example to compare what we know from such molecules with mass spectrometric approaches for diagnostics.
Petricoin et al. have pioneered the use of mass spectrometry as a diagnostic tool (30). They suggested that this approach represents a paradigm shift in cancer diagnostics, based on complex mass spectrometric differences, identified by bioinformatics, between the serum proteomic patterns of patients with and without cancer. Their premise is that no matter what the nature of these molecules is, their potential to discriminate between these two conditions should be further exploited. The central hypothesis of this approach is as follows: proteins or protein fragments produced by cancer cells or their microenvironment may eventually enter the general circulation. Then, the concentration (abundance) of these proteins/fragments could be analyzed by mass spectrometry and used for diagnostic purposes, in combination with a mathematical algorithm (30).
The vast majority of the currently available data have been produced by using the SELDI-TOF technology, marketed by Ciphergen Biosystems (Fremont, CA). Ciphergen claims that over 200 papers have already been published with this technology. The types of cancers that have been examined include ovarian, prostate, breast, bladder, renal, and others, and the biological fluids analyzed include serum, urine, cerebrospinal fluid, nipple aspirate fluid, etc. The apparent successes with this technology have been recently reviewed by many investigators (2–12). In general, it has been suggested that this technology can achieve much higher diagnostic sensitivity and specificity (approaching 100%) in comparison to the classical cancer biomarkers (31). The technology’s potential has been expanded to other diseases such as Alzheimer’s disease, Creutzfeldt-Jakob disease, renal allograft rejection, etc. (32–34).
The analytical procedure with this technology involves a few common steps. The biological fluid of interest is first interacted with a protein chip that incorporates some kind of an affinity separation between "noninformative" and "informative" proteins. After washing, the immobilized (and fortunately mostly informative) proteins can be studied by using SELDI-TOF mass spectrometry. Two types of data have been reported in the literature: 1) discriminating peaks of unknown identity that are different in amplitude (increased or decreased) between normal individuals and patients with cancer; and 2) data in which at least some of these peaks have been positively identified (see below). Computer algorithms have been used to analyze these multidimensional data to demonstrate that a pattern consisting of several peaks (from tens to thousands) is sufficiently different between the two groups of subjects. In this review, I will not comment much on peaks that have not been positively identified, because nothing is known about them, except that their heights go up or down in the disease state. I will use the few positively identified molecules to draw comparisons between them and the classical cancer biomarkers.
The extraordinary data presented in the literature with this new approach were welcomed by scientists, the press, the public, and even by politicians (31, 35). This technology is now seen as the most promising way of diagnosing early cancer (35). Clinical trials are now underway and will reveal, in a blinded fashion, if these data can be reproduced and if they are robust enough for clinical use. In the following paragraphs, I will concentrate on issues that have not been adequately addressed and raise concerns that at least some of these data may not be accurate or expected on theoretical grounds.
The use of SELDI-TOF technology as a cancer biomarker discovery tool (as opposed to a cancer diagnostic tool) is straightforward. The discriminatory peaks, if positively identified, may represent molecules that could be measured with simpler and cheaper techniques for the purpose of diagnosing cancer. For example, some investigators postulate that such molecules may be routinely quantified by using enzyme-linked immunosorbent assay (ELISA) technologies. In practice, very few, if any, of the SELDI-TOF identified novel candidate biomarkers have been validated by using alternative technologies.
Liotta et al. hypothesized that the relative cellular abundance of tens of thousands of different proteins, along with their cleaved or modified forms, is a reflection of ongoing physiological and pathological events. They further postulate that as tissues are perfused by blood and lymph, proteins and protein fragments, passively or actively, enter the circulation. Thus, the complex chemistry of the tumor-host microenvironment should generate unique signatures in the blood microenvironment. I agree with this statement. The major question here is if these putative proteomic changes in the blood can be captured by the SELDI-TOF technology, as applied in the published papers. In my opinion, it is highly unlikely that a small and localized tumor will be able to modify the serum proteomic pattern to a degree that can be recognized by the SELDI-TOF technique. As I will further elaborate later, SELDI-TOF, and other proteomic technologies based on mass spectrometry, may not be sensitive enough to detect the low-abundance "signature" molecules that are released by a few tumor cells or their microenvironment into the circulation. I do believe that informative molecules originating from tumor cells or their microenvironment may indeed be present in biological fluids and that their identification may lead to the discovery of potential new biomarkers.
The identification of these molecules will likely require ultrasensitive techniques capable of measuring concentrations in the range of 10⁻¹² mol/liter or lower (far lower than those achieved by current SELDI-TOF protocols, see below).
An alternative hypothesis for the observed differences in proteomic patterns in serum between normal individuals and cancer patients may be the detection of high-abundance molecules that are not produced by the tumor cells but rather represent epiphenomena of tumor presence. For example, it has been postulated by this author that at least some of the detected molecules represent acute-phase reactants that are released into the circulation by the liver and other organs (36, 37). It has been shown as early as 30–40 years ago that such molecules are not specific for the presence of any cancer, and for this reason they have not been used in clinical practice for cancer diagnosis or monitoring, although their concentrations may be elevated in serum of some cancer patients (38).
As it currently stands, SELDI-TOF technology requires pretreatment of a small amount of serum with SELDI protein chips. These protein chips have either 8 or 16 spots containing a specific chromatographic surface. Currently available surfaces are based on hydrophobic, ion-exchange, metal affinity, or normal phase chromatography. It is also possible, but not widely utilized at the moment, to immobilize more specific reagents such as antibodies, receptors, DNA, etc. For diagnostics, one would expect that the discriminatory peaks for one cancer type can be identified by using preferentially one of these surfaces. After surveying the literature for prostate and ovarian cancer diagnostics, I identified five papers that used SELDI-TOF technology for prostate cancer (39–43) and two papers for ovarian cancer (30, 44). The different groups found that different proteomic chips may be optimal for disease diagnosis. Metal affinity (IMAC-Cu), hydrophobic (C16 or H4), or weak cation exchange (WCX2) chips were used for prostate cancer. In two of the studies (40, 41), the same mass spectrometric data were used by the same group but different bioinformatic tools were employed to analyze them. A summary of the prostate and ovarian cancer studies is presented in Tables II and III. The following points are relevant. The distinguishing peaks between cancer and non-cancer patients are very different between the various groups. In fact, none of the distinguishing peaks reported by the four different research groups for prostate cancer agree with one another. The only agreements were two peaks for distinguishing non-cancer versus cancer from the same group of investigators and the same datasets (40, 41). A different bioinformatic analysis revealed other discriminatory peaks between the two studies from the same group (41). Similar discrepancies are seen with ovarian cancer (Table III). How could these discrepancies be explained?
One hypothesis is that serum may indeed contain a huge number of discriminatory molecules between cancer and non-cancer patients and that the chance of two groups finding the same discriminatory peaks is very small. Another explanation may be methodological differences, in that different chips were used to immobilize the candidate discriminatory peaks. This explanation is unlikely because Banez et al., Adam et al., and Qu et al. (40–42) used the same protein chip (IMAC-Cu), yet they came up with different discriminatory peaks (Table II). In my opinion, it is highly unlikely that a small, localized tumor and its microenvironment will generate such diverse populations of informative peptides/proteins in the circulation. Another important difference, displayed in Table II, refers to the diagnostic sensitivity and specificity. In contrast to Qu et al., Petricoin et al., and Adam et al., Banez et al. reported that, at least with the IMAC-Cu proteomic chips, their sensitivity and specificity were only 66 and 38%, respectively, markedly inferior to those of the other three studies.
What needs to be done to investigate these discrepancies further? First, the experiments should be independently repeated by other laboratories. Second, these validation studies should be done with the older (Ciphergen) and with higher resolution instruments, various batches of proteomic chips, and by using different bioinformatic tools. Also, internal controls (such as already validated classical discriminatory cancer biomarkers) should be incorporated to validate the actual analytical sensitivity of the technology (see below).
We currently have at hand validated cancer biomarkers that can reasonably distinguish between cancer and non-cancer patients. For example, PSA can be used as a biomarker to separate a group of patients without cancer (low serum PSA) from a group of patients with histologically confirmed prostate cancer and PSA > 10 µg/liter. Because free PSA and complexed PSA have molecular masses of 30 kDa and 100 kDa, respectively, these masses are well within the current capabilities of mass spectrometers (43). Validation of this technology will be greatly strengthened if it is shown that one of the discriminatory peaks identified in prostate cancer is PSA and its subfractions. The same comment applies for other validated cancer biomarkers. Surprisingly, in none of the published studies with breast, prostate, or ovarian carcinoma have the classical cancer biomarkers been identified. I believe that the inability to identify these classical cancer biomarkers is due to the low sensitivity of the SELDI-TOF approach. Until validated serum internal controls are used with this technology, the results obtained, and the sensitivity of the method, should remain in question.
BIAS OF SELDI-TOF TECHNOLOGY TOWARD HIGH-ABUNDANCE MOLECULES
The current method of performing SELDI-TOF experiments with unfractionated serum includes exposure of serum to the protein chip, washing, and then identification of the immobilized molecules by using MALDI-TOF instrumentation. The solid phases currently in use (mentioned earlier) are not specific for any type of protein. Because serum contains a tremendous array of extremely high-abundance (e.g. albumin) and very-low-abundance molecules (concentrations vary by a factor of 10⁶ to 10⁹) (45), it is highly unlikely that the most informative, low-abundance molecules will be immobilized on such chips. Simply, they will likely be competed out by high-abundance, noninformative molecules. For example, in serum, the PSA concentration in healthy males is 1 µg/liter, whereas the total protein concentration is in the order of 80,000,000 µg/liter. When proteins are exposed to the chip, each PSA molecule (or other molecules of similar abundance) will encounter competition for binding to the same matrix by millions of irrelevant molecules. It would thus seem very unlikely that molecules with very low abundance will ever be detected by this method. The experiments to prove or disprove these proposals have been previously outlined by this author in a separate editorial, but to my knowledge they have not as yet been reported (46).
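The scale of this competition follows directly from the two concentrations quoted above. The short sketch below works out the mass excess of total serum protein over PSA; note that it is a ratio of masses, not of molecule counts, which would additionally depend on the molecular mass of each competing protein.

```python
PSA_UG_PER_L = 1.0                      # PSA in healthy males (from the text)
TOTAL_PROTEIN_UG_PER_L = 80_000_000.0   # total serum protein (from the text)


def mass_excess(total_ug_per_l, analyte_ug_per_l):
    """How many mass units of total protein accompany one mass unit of analyte."""
    return total_ug_per_l / analyte_ug_per_l


ratio = mass_excess(TOTAL_PROTEIN_UG_PER_L, PSA_UG_PER_L)
psa_mass_fraction = PSA_UG_PER_L / TOTAL_PROTEIN_UG_PER_L

print(f"Total protein exceeds PSA by a factor of {ratio:.1e} (by mass)")
print(f"PSA mass fraction of serum protein: {psa_mass_fraction:.2e}")
```

By mass, every microgram of PSA is accompanied by roughly 80 million micrograms of other protein competing for the same surface.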
A previous report by Wright et al. claimed that four classic prostatic biomarkers, including free and complexed PSA, could be detected by mass spectrometry in various biological fluids and tissue extracts, including seminal plasma, prostatic extracts, and serum (47). However, the masses assigned to free or complexed PSA may have originated from other molecules with a similar molecular mass. Furthermore, the presence of other molecules, such as salts, could cause a mass shift, thus complicating the interpretation further. These authors, in their efforts to show a quantitative relationship between peak area and PSA concentrations, constructed linear calibration curves but at PSA concentrations between 1,000 and 50,000 µg/liter. Such concentrations are rarely or never seen in clinical practice, even in sera from patients with highly metastatic prostate cancer. On the same point, other authors reported prostate-specific membrane antigen (PSMA) concentrations in serum (this is another prostatic-specific molecule) by using a SELDI-TOF approach in combination with an immobilized antibody. The reported concentrations of PSMA in serum (100–500 µg/liter; 500 times higher than the secreted protein PSA) are surprisingly high and need to be validated by ELISA-type methodologies, given that this molecule is a membrane-bound protein (48).
I have compiled a list of SELDI-TOF-identified molecules in serum that are thought to be discriminatory between the normal state and cancer (Table IV). Clearly, these candidate serum biomarkers are very-high-abundance molecules known to be produced mainly by the liver.
For example, Zhang et al. (49) identified three discriminatory peaks in ovarian cancer: apolipoprotein A1, transthyretin (pre-albumin) fragment, and inter-α-trypsin inhibitor. Ye et al. discovered haptoglobin-α subunit for ovarian cancer (50), and Hlavaty et al. discovered vitamin D-binding protein for prostate cancer (51). More recently, Cho et al. identified serum amyloid A protein for nasopharyngeal carcinoma (52). Table IV presents the comparative serum concentrations of these putative tumor markers and of classical tumor markers, such as α-fetoprotein and PSA.
A number of the "new" tumor biomarkers discovered by SELDI-TOF technology were, in fact, originally identified more than 30 years ago by classical techniques (e.g. haptoglobin-α subunit for ovarian cancer) (53) but were deemed useless for clinical diagnosis because of their low sensitivity and specificity (54, 55). Just to illustrate this point further, I performed a MEDLINE search using the keywords "haptoglobin" and "cancer" and identified 571 papers published from 1965 to 2003. Haptoglobin has been reported since 1966 to be elevated in the following malignancies: leukemias, Hodgkin’s disease, Burkitt’s lymphoma, multiple myeloma, neuroblastoma, melanoma, glioma, and cancers of the cervix, genitals, stomach, breast, liver, kidney, ovaries, lung, endometrium, colon, prostate, gallbladder, bladder, head and neck, brain, and larynx. The same comment applies to serum amyloid A protein (52). It is clear that haptoglobin-α subunit or other acute-phase reactants are not specific cancer biomarkers.
IDENTITY AND ORIGIN OF DISCRIMINATORY PEAKS
Immediately after publication of the first report of SELDI-TOF-based diagnostics for ovarian cancer (30), I urged the authors to positively identify the discriminatory peaks so that their serum elevation or decrease in cancer is better understood (36). Efforts to identify these discriminatory peaks have been minimal. Liotta et al. suggested that knowledge of peak identity should not be essential and that this technology represents a new diagnostic paradigm (56). To date, the identity of the five "discriminatory" peaks for ovarian cancer remains elusive (30). Fortunately, HUPO has currently identified as one of their goals to characterize the serum proteome. Also, newer instrumentation is now capable of identifying the discriminatory peaks by using tandem mass spectrometry. As more peaks are positively identified, we will be able to further understand what this technology really detects and if indeed the identified molecules can be confirmed independently to be potential cancer diagnostic markers by using other methodologies (e.g. ELISA). As mentioned, currently identified molecules by this technology are of very high abundance, are mostly produced by the liver, and many are acute-phase reactants (Table IV).
TECHNICAL CAVEATS AND METHODOLOGICAL DETAILS
It is important to understand how this method is used in order to identify possible deficiencies. Routinely, 1–3 µl of serum, either diluted or undiluted, is added to the activated surface of the protein chip and incubated. The chip is then washed, air-dried, and treated with an ultraviolet-absorbing agent (e.g. sinapinic acid, also known as "matrix") and then dried. The chip is then inserted into the mass spectrometer for SELDI-TOF mass spectrometric analysis. A critical question here is if the protein chip has the capacity to bind quantitatively all proteins present in the sample. Clearly, the answer is no. Then, what binds to the chip will depend on the total protein of the sample, the abundance of various competing proteins for the solid phase, and the properties of the chip (such as hydrophobicity, ion-exchange capacity, metal binding, etc.). Without knowing the abundance of competing proteins, and given the limited capacity of the matrix, what is finally retained on the surface may be quite variable between different clinical samples. It is thus likely that an informative molecule of the same abundance in two clinical samples may be detected at different abundances simply due to the presence of different amounts of noninformative competing molecules. Thus, the relative amplitudes of peaks in mass spectrometric spectra should be considered as "semi-quantitative" at best. Also, as mentioned earlier, the competitive nature of the binding will likely exclude low-abundance molecules due to preferential binding of high-abundance molecules with similar physicochemical properties.
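The effect of a limited binding capacity can be illustrated with a deliberately simple toy model. The sketch below assumes that all proteins bind with equal affinity and that, once the surface is saturated, each species is retained in proportion to its share of the applied sample. Both the model and the numbers (capacity and amounts, in arbitrary units) are hypothetical and are not measured properties of any protein chip.

```python
def bound_amounts(applied, capacity):
    """Toy model of a saturable surface.

    applied:  dict mapping species name -> amount applied (arbitrary units)
    capacity: total amount the surface can retain (same units)
    Assumes equal affinity for all species (a strong simplification).
    """
    total = sum(applied.values())
    if total <= capacity:
        return dict(applied)          # everything binds
    scale = capacity / total          # proportional share once saturated
    return {name: amount * scale for name, amount in applied.items()}


CAPACITY = 100.0  # hypothetical chip capacity

# Two samples with the SAME amount of informative molecule but
# different amounts of noninformative competitors.
sample_a = {"informative": 1.0, "competitors": 1_000.0}
sample_b = {"informative": 1.0, "competitors": 10_000.0}

bound_a = bound_amounts(sample_a, CAPACITY)
bound_b = bound_amounts(sample_b, CAPACITY)
print("Sample A, informative molecule retained:", round(bound_a["informative"], 4))
print("Sample B, informative molecule retained:", round(bound_b["informative"], 4))
```

In this toy model, a tenfold increase in competing protein lowers the retained amount of the informative molecule roughly tenfold, even though its concentration in the sample is unchanged.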
These issues have not been adequately addressed in any of the published papers. Useful experiments could include spiking of known molecules in serum that is devoid of them. For example, spiking female serum with PSA or other molecules that have been tagged with stable isotopes may help to answer these questions. Similar experiments could delineate the detection limit of the methodology as it applies to SELDI-TOF procedures. Spiking with synthetic peptides would be another option.
Ideally, this method could work quantitatively if the surfaces used for molecule immobilization are either specific for certain proteins (e.g. antibodies or other binders) or they have enough capacity to quantitatively bind all the proteins applied to the sample.
It is also important to address the issue of ionization efficiency. Would the same concentration of an informative molecule on the protein chip produce a peak of the same amplitude if it is surrounded by variable amounts of irrelevant proteins that are also ionized during laser desorption? One would expect that the ionization will likely be affected by the presence of other molecules in the mixture, further contributing to the qualitative nature of the measurement.
As further stressed by Aebersold and Mann (17) in both MALDI and ESI-MS ionization, the relationship between the amount of analyte present and measured signal intensity is complex and incompletely understood. Mass spectrometers are therefore inherently poor quantitative devices. Furthermore, the ion current of a peptide is dependent on a multitude of variables that are difficult to control, and this measure is not a good indication of peptide abundance. To conclude, mass spectrometric analysis is not quantitative at present.
Regarding sensitivity of mass spectrometry, this is difficult to define because sensitivity is heavily dependent on the machine used (and these are rapidly changing over time) as well as the actual procedures performed before sample introduction into the instrument. For example, some procedures include extensive purification and preconcentration of samples, while in others (including SELDI-TOF analysis) the samples are minimally treated. Nevertheless, in a recent study of global analysis of protein expression in yeast, Ghaemmaghami et al. (57) compared mass spectrometry and protein tagging methodologies combined with either Western blotting or green fluorescent protein microscopy. They found that tagging technologies and fluorescence microscopy were able to detect a total of 4,517 proteins with more than 90% overlap. In contrast, a recent study using mass spectrometry and isotope labeling succeeded in quantitatively monitoring changes in the abundance of only 688 yeast proteins (58). The authors concluded that mass spectrometry is capable of detecting abundances of proteins with >50,000 molecules per cell but it was not sensitive in detecting proteins of lower abundance. This insensitivity of mass spectrometry, in comparison to Western blotting and fluorescence microscopy, combined with the fact that low-abundance proteins may not bind to the biochip, will make the detection of very-low-abundance molecules in serum by this technology highly unlikely. One should keep in mind that ELISA methodologies, usually used to quantify tumor markers in the circulation, are even more sensitive than Western blotting techniques, allowing direct measurement of analytes at levels as low as 10⁻¹² to 10⁻¹³ mol/liter.
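To relate these molar detection limits to the mass concentrations quoted elsewhere in this review, the sketch below converts the PSA concentration of healthy males (1 µg/liter) into mol/liter, using the molecular mass of free PSA (30 kDa) given earlier. Treating all serum PSA as the 30-kDa free form is a simplifying assumption made only for this example.

```python
def ug_per_l_to_mol_per_l(conc_ug_per_l, molecular_mass_da):
    """Convert a mass concentration (µg/liter) to a molar one (mol/liter)."""
    grams_per_l = conc_ug_per_l * 1e-6
    return grams_per_l / molecular_mass_da  # Da is numerically g/mol


FREE_PSA_MASS_DA = 30_000   # free PSA, from the text
ELISA_LEVEL_MOL_PER_L = 1e-12  # upper end of the ELISA range from the text

psa_molar = ug_per_l_to_mol_per_l(1.0, FREE_PSA_MASS_DA)
print(f"PSA at 1 µg/liter = {psa_molar:.1e} mol/liter")
print(f"That is about {psa_molar / ELISA_LEVEL_MOL_PER_L:.0f} times the 1e-12 mol/liter level")
```

Under this assumption, 1 µg/liter of PSA corresponds to roughly 3 × 10⁻¹¹ mol/liter, within reach of ELISA methodologies.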
Another methodological artifact that should be kept in mind in SELDI-TOF experiments is the identification of discriminatory peaks (peptides, proteins, or protein fragments) that have originated ex vivo. Marshall et al. have recently shown that when plasma was left sitting at room temperature for 4 or 8 h, the MALDI-TOF spectra, as recorded by a SELDI-TOF instrument, changed significantly, suggesting that many peptides were generated by proteolytic digestion ex vivo (59). The authors attributed this peptide generation to the action of specific (serine) proteases, because they could block the effect with serine protease inhibitors. These authors further speculated that the concentrations of proteins released into the blood directly from damaged cells, or the changes in important regulatory factors associated with disease, are likely to be far too small to be directly detected by MALDI-TOF.
In virtually every SELDI-TOF experiment published so far, a fraction of the clinical samples are used as a "training set" to derive the interpretation algorithm and the remaining samples used as a "test set." As correctly pointed out by Qu et al. (41), one of the concerns in the construction and use of learning algorithms is the possibility of overfitting the data. It is not known how robust these algorithms will be when used at different times, or on different sets of clinical samples. One example of demonstrating this possible problem was published by Rogers et al. (60). The sensitivity/specificity of a test for discriminating renal cell carcinoma from controls by SELDI-TOF was initially 98–100%. However, when the same procedure was used 10 months later in a new set of patients, the sensitivity dropped to 41%. The authors speculated that this dramatic loss of performance was likely due to sample stability, laser performance, or chip variability. It is thus important to show that algorithms, initially derived with training samples, can still work on different sets of samples and at different times.
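The kind of check argued for here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single-peak cutoff rule, the cutoff value, and all intensity values are hypothetical and are not taken from Rogers et al. (60) or any other study. The point is that the rule is frozen after training and then scored, unchanged, on a cohort collected later.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); 1 = cancer, 0 = control.

    Assumes both classes are present in y_true.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def classify(intensities, cutoff):
    """Hypothetical frozen rule: call 'cancer' when the peak intensity >= cutoff."""
    return [1 if x >= cutoff else 0 for x in intensities]


# Hypothetical cutoff, derived once on the training set and never re-tuned.
CUTOFF = 5.0

# Made-up intensities for illustration only (4 cancers, then 4 controls).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
original_cohort = [7.1, 6.4, 8.0, 5.9, 3.2, 4.1, 2.8, 4.6]
later_cohort = [5.2, 4.4, 3.9, 4.8, 3.0, 4.9, 5.3, 3.7]

original_result = sensitivity_specificity(labels, classify(original_cohort, CUTOFF))
later_result = sensitivity_specificity(labels, classify(later_cohort, CUTOFF))

print("original cohort:", original_result)  # (1.0, 1.0)
print("later cohort:   ", later_result)     # (0.25, 0.75)
```

The design choice that matters is that nothing is refitted on the later cohort; any re-tuning on new samples would turn them into a second training set and hide the loss of performance.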
It is currently suggested by many authors that peaks with m/z ratios <2,000 obtained by SELDI-TOF mass spectrometry should be discarded as noise due to matrix effects (42). However, two of the five discriminatory peaks obtained by Petricoin et al. in their ovarian cancer diagnostic protocol with SELDI-TOF had m/z ratios of 534 and 989 (30). In a recent reanalysis of the original raw proteomic data on ovarian cancer, Sorace and Zhan identified peaks that contributed decisively to the discrimination between normal and cancer patients but did not make biological sense (e.g. a peak at an m/z ratio of 2.79) (61). These authors raised the possibility of a significant nonbiologic experimental bias between cancer and control groups, which casts doubt on the validity of the discriminatory peaks with m/z ratios >2,000. Essentially the same conclusions were reached by Baggerly et al., who have also shown that "noise" peaks can achieve perfect classification of normals and cancer patients (62). It is thus mandatory that algorithms used to interpret mass spectrometric data in SELDI-TOF experiments be carefully reviewed to avoid false conclusions. Indeed, it would be desirable for those working in the field to validate and compare their algorithms and examine whether they can come up with the same discriminatory peaks on the same set of data.
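Why pure noise can appear discriminatory is easy to demonstrate with a small simulation. The sketch below is not the analysis of Baggerly et al. (62) or of Sorace and Zhan (61); it is a minimal illustration that assumes a hypothetical setting of 20 training samples, 2,000 simulated "peaks" drawn as random numbers unrelated to disease status, and a one-peak cutoff rule. The best-looking peak is selected on the training samples and then applied, unchanged, to 200 independent samples.

```python
import random


def stump_accuracy(values, labels, threshold, sign):
    """Accuracy of the rule: predict cancer (1) when sign * (x - threshold) >= 0."""
    correct = 0
    for x, y in zip(values, labels):
        pred = 1 if sign * (x - threshold) >= 0 else 0
        correct += (pred == y)
    return correct / len(labels)


def best_stump(values, labels):
    """Try every observed value as a cutoff, in both directions; keep the best."""
    best = (0.0, None, None)
    for t in values:
        for sign in (1, -1):
            acc = stump_accuracy(values, labels, t, sign)
            if acc > best[0]:
                best = (acc, t, sign)
    return best


def simulate(n_train=20, n_test=200, n_peaks=2000, seed=0):
    """Select the best of many pure-noise peaks; return (train_acc, test_acc)."""
    rng = random.Random(seed)
    train_labels = [i % 2 for i in range(n_train)]  # balanced cancer/control
    test_labels = [i % 2 for i in range(n_test)]
    best_train, best_test = 0.0, 0.0
    for _ in range(n_peaks):
        # Intensities are random noise, independent of the labels.
        train_x = [rng.gauss(0, 1) for _ in range(n_train)]
        test_x = [rng.gauss(0, 1) for _ in range(n_test)]
        acc, t, sign = best_stump(train_x, train_labels)
        if acc > best_train:
            best_train = acc
            best_test = stump_accuracy(test_x, test_labels, t, sign)
    return best_train, best_test


train_acc, test_acc = simulate()
print(f"best noise peak: training accuracy {train_acc:.2f}, "
      f"independent-set accuracy {test_acc:.2f}")
```

Under these assumptions, the selected noise peak typically classifies the training samples far better than chance, while its accuracy on the independent samples stays close to 50%. With many candidate peaks and few samples, apparent discrimination is therefore expected even when no biological signal exists.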
A rather surprising observation relates to two papers on prostate cancer published by the same group (40, 41) (Table II). The same patients were analyzed, and one set of data was generated; this was then examined by two different bioinformatic methods. Surprisingly, the two bioinformatic tools identified different discriminatory peaks. Only two peaks were the same among the nine identified in the first paper and among the twelve identified in the second. Also, one peak that was originally reported in the first study to discriminate between cancer and non-cancer patients was reported to discriminate between healthy controls and patients with benign prostatic hyperplasia in the second study.
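The degree of agreement between the two peak lists can be expressed as a Jaccard index (shared peaks divided by all distinct peaks). The sketch below uses only the counts given above (nine peaks, twelve peaks, two shared); the integer identifiers are placeholders that I introduce for illustration and are not real m/z values.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder peak identifiers (not m/z values): 9 peaks and 12 peaks, 2 shared.
first_paper = set(range(9))        # ids 0..8
second_paper = set(range(7, 19))   # ids 7..18, sharing ids 7 and 8

overlap = jaccard(first_paper, second_paper)
print(f"shared peaks: {len(first_paper & second_paper)}, Jaccard index: {overlap:.3f}")
# shared peaks: 2, Jaccard index: 0.105
```

An index of 2/19 (about 0.11) for two analyses of the same raw data gives a compact way to state how little the selected peaks agree.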
In conclusion, it seems that the bioinformatic tools for analyzing SELDI-TOF data need to be carefully validated to avoid artifactual findings and overfitting. Moreover, the reasons for discriminating patients and controls by using peaks within the noise should be further investigated.
Recently, Liotta and colleagues pointed to the possibility that blood contains a vast number of as yet unutilized and/or unidentified peptides that may have potential as diagnostic biomarkers (63). As indicated earlier, they believe that the tumor-host microenvironment should generate unique signatures in the blood microenvironment. A proposal was then made that the low-molecular-mass region of the blood proteome, which is a mixture of small intact proteins plus fragments of large proteins, represents all classes of proteins and is a treasure trove of diagnostic information largely ignored until now. Because small peptides can be effectively cleared by the kidney, the authors speculate that many of these low-molecular-mass proteins are bound to abundant serum proteins like albumin. This hypothesis needs to be tested experimentally, and the experiments should be straightforward. For example, many of these peptides can be characterized by using mass spectrometry. Then, these peptides, tagged with a stable isotope, can be used in recovery experiments to examine whether they indeed bind to carrier proteins and how long they persist in the circulation. Furthermore, such experiments will reveal their abundance in the circulation and their origin (e.g. which are the parent proteins). Only when these experiments are done will the proposal of using these peptides for diagnostics gain more credence.
Liotta and colleagues are now substituting the original Ciphergen instrumentation with more sophisticated mass spectrometers of higher resolution. While these instruments do provide improved mass accuracy determination and more complicated spectra, they will not solve the problems outlined in this review regarding diagnostics, because the sample preparation procedures are still based on Ciphergen protein chips. Nevertheless, the results of the published and of the newer methods will require careful external validation by different laboratories. Clinical trials are now underway to examine if these methods are robust enough and suitable for clinical use. Until the external validation data become available, the method should not be used for clinical care.
NORMAL VERSUS ABNORMAL SERUM PROTEOMIC PATTERNS
One explanation for the published data is that the differences in serum proteomic patterns between controls and patients are due to the presence of cancer. Another explanation would be that these differences are not due to the presence of cancer but to a variety of unknown confounding factors. Possible confounders include: sample collection, processing and storage, patient selection and individual habits (e.g. gender, age, ethnicity, exercise, menopausal status, nutritional preferences, drugs, non-cancer diseases, etc.), inappropriate statistical design and/or analysis methods, machine instability, and variable chip performance. The effects of most of these parameters on serum proteomic patterns have not been studied.
Usually, classical tumor markers are evaluated by clinicians by using numerical and easy-to-understand cutoff points (64). All studies using SELDI-TOF technology for diagnostics compare "disease patterns" to "normal patterns." In practice, a normal pattern needs to be generated and used as reference to which the patient pattern will be compared. But how could such a "normal pattern" be generated when the reference group and the testing group are likely to be heterogeneous for the factors described above? It is likely that this "normal pattern" will be influenced by numerous parameters, including diseases different from the one that is being diagnosed. Because SELDI-TOF is a qualitative technique, data interpretation by comparing patterns may prove to be a daunting task.
Where should we go from here? In Table V, I summarize some open questions related to this technology. These questions have been posed before (46). Further progress will depend on providing answers after careful experimentation. I would also like to make some suggestions related to future publications and experiments that need to be done with this technology.
It is true that all classical cancer biomarkers have major shortcomings that preclude their application for population screening and early diagnosis. The highly promising data generated by SELDI-TOF prompted many to suggest that this technique could be used clinically before the end of this year. However, as indicated above, numerous questions need to be answered before the technology is accepted. There should also be no shortcuts in the validation process of this technology by independent laboratories and agencies. Otherwise, we are running the risk of harming patients who would be misdiagnosed and subjected to unnecessary, invasive, and probably dangerous confirmatory procedures.
As with other medical advances, the ultimate judge of this technology will be time. I sincerely wish that this method will not follow the route of a similar effort originated in the 1980s that suggested cancer diagnosis based on nuclear magnetic resonance profiling of serum samples (65–67).
Received, February 27, 2004
1 The abbreviations used are: GC/MS, gas chromatography/mass spectrometry; MALDI, matrix-assisted laser desorption/ionization; ESI, electrospray ionization; MS/MS, tandem mass spectrometry; LC/MS, liquid chromatography/mass spectrometry; SELDI-TOF, surface-enhanced laser desorption/ionization time-of-flight; PSA, prostate-specific antigen; ELISA, enzyme-linked immunosorbent assay; PSMA, prostate-specific membrane antigen.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
To whom correspondence should be addressed: Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada. Tel.: 416-586-8443; Fax: 416-586-8628; E-mail: [email protected]
TABLE I Some established cancer biomarkers
a All of these markers are used as aids in diagnosis, prognosis, and monitoring of therapy; steroid hormone receptors are used for predicting therapeutic response to antiestrogens.
b All markers measured in serum except steroid hormone receptors, which are measured in cancer tissues.
TABLE II Comparison of five reports for prostate cancer diagnosis based on SELDI-TOF technology^a
a This table is modified and expanded from Refs. 46 and 68.
b m/z ratios were rounded to whole numbers for simplicity. m/z ratios in bold represent those identified by Adam et al. (40) and Qu et al. (41) for differentiating cancer from non-cancer patients. The underlined m/z ratio represents a peak identified by Adam et al. (40) for differentiating cancer from non-cancer patients and by Qu et al. (41) for differentiating healthy individuals from patients with benign prostatic hyperplasia.
c BPH, benign prostatic hyperplasia.
a m/z ratios were rounded to whole numbers for simplicity.
b Somewhat different values were obtained for different sets of samples; see original papers for more details.
c Strong anion-exchange chip.
d Differentiates healthy from neoplastic (benign and malignant) disease. In this paper, the molecular masses of the markers are given instead of m/z ratios. All values were reported with one decimal point accuracy.
e Differentiates healthy and benign disease from malignant disease.
f In Ref. 30, there is a limit to the molecular mass or m/z ratio monitored (up to 20,000).
TABLE IV Concentration of some abundant proteins, putative new cancer biomarkers identified by SELDI-TOF, and classical cancer biomarkers in serum^a
a This table is modified and expanded from Ref. 68.
b Apolipoprotein A1 is produced in liver and intestine; all other proteins are mainly produced in the liver (38).
TABLE V Some open questions related to diagnostic SELDI-TOF technology^a
a This table is reproduced from Ref. 46 with permission from the copyright owners.