TECHNICAL CAVEATS AND METHODOLOGICAL DETAILS
To identify possible deficiencies, it is important to understand how this method is used. Routinely, 1–3 µl of serum, either diluted or undiluted, is added to the activated surface of the protein chip and incubated. The chip is then washed, air-dried, treated with an ultraviolet-absorbing agent (e.g. sinapinic acid, also known as "matrix"), and dried again. The chip is then inserted into the mass spectrometer for SELDI-TOF mass spectrometric analysis. A critical question here is whether the protein chip has the capacity to bind quantitatively all proteins present in the sample. Clearly, the answer is no. What binds to the chip will therefore depend on the total protein of the sample, the abundances of the various proteins competing for the solid phase, and the properties of the chip (such as hydrophobicity, ion-exchange capacity, metal binding, etc.). Without knowing the abundances of the competing proteins, and given the limited binding capacity of the surface, what is finally retained on the chip may be quite variable between different clinical samples. It is thus likely that an informative molecule present at the same abundance in two clinical samples may be detected at different apparent abundances simply because of the presence of different amounts of noninformative competing molecules. The relative amplitudes of peaks in mass spectrometric spectra should therefore be considered "semi-quantitative" at best. Moreover, as mentioned earlier, the competitive nature of the binding will likely exclude low-abundance molecules because of preferential binding of high-abundance molecules with similar physicochemical properties.
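The competition effect can be illustrated with a toy model (a minimal sketch; the capacity, affinity, and abundance numbers are arbitrary assumptions for illustration, not ProteinChip specifications): two analytes of similar surface affinity compete for a fixed number of binding sites, so the amount of the informative molecule retained depends on how much irrelevant competitor the sample happens to contain.

```python
def retained(informative, competitor, capacity=100.0):
    """Toy model of a limited-capacity chip surface (illustrative
    numbers only).  Both analytes are assumed to have equal affinity,
    so when total protein exceeds the surface capacity, binding sites
    are allocated in proportion to solution abundance."""
    total = informative + competitor
    if total <= capacity:
        return informative                 # everything binds
    return capacity * informative / total  # competitors crowd it out

# The same informative-analyte abundance in two "clinical samples"
# that differ only in an irrelevant competing protein:
print(retained(10.0, competitor=200.0))   # ~4.76 units retained
print(retained(10.0, competitor=2000.0))  # ~0.50 units retained
```

In this sketch a 10-fold change in a noninformative competitor alone produces a nearly 10-fold change in the retained (and hence detected) amount of the informative molecule, even though its true serum abundance is identical in both samples.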
These issues have not been adequately addressed in any of the published papers. Useful experiments could include spiking known molecules into serum that is devoid of them. For example, spiking female serum with PSA or other molecules that have been tagged with stable isotopes may help to answer these questions. Similar experiments could delineate the detection limit of the methodology as it applies to SELDI-TOF procedures. Spiking with synthetic peptides would be another option.
Ideally, this method could work quantitatively if the surfaces used for molecule immobilization were either specific for certain proteins (e.g. antibodies or other binders) or had enough capacity to bind quantitatively all the proteins in the applied sample.
It is also important to address the issue of ionization efficiency. Would the same concentration of an informative molecule on the protein chip produce a peak of the same amplitude when surrounded by variable amounts of irrelevant proteins that are also ionized during laser desorption? Ionization is likely to be affected by the presence of other molecules in the mixture, further contributing to the semi-quantitative nature of the measurement.
As further stressed by Aebersold and Mann (17), in both MALDI and ESI ionization the relationship between the amount of analyte present and the measured signal intensity is complex and incompletely understood. Mass spectrometers are therefore inherently poor quantitative devices. Furthermore, the ion current of a peptide depends on a multitude of variables that are difficult to control, and this measure is not a good indication of peptide abundance. In short, mass spectrometric analysis is not quantitative at present.
Regarding the sensitivity of mass spectrometry, this is difficult to define because sensitivity depends heavily on the instrument used (and these are changing rapidly over time) as well as on the actual procedures performed before sample introduction into the instrument. For example, some procedures include extensive purification and preconcentration of samples, whereas in others (including SELDI-TOF analysis) the samples are minimally treated. Nevertheless, in a recent study of global analysis of protein expression in yeast, Ghaemmaghami et al. (57) compared mass spectrometry with protein tagging methodologies combined with either Western blotting or green fluorescent protein microscopy. They found that the tagging technologies and fluorescence microscopy were able to detect a total of 4,517 proteins with more than 90% overlap. In contrast, a recent study using mass spectrometry and isotope labeling succeeded in quantitatively monitoring changes in the abundance of only 688 yeast proteins (58). The authors concluded that mass spectrometry is capable of detecting proteins present at >50,000 molecules per cell but is not sensitive enough to detect proteins of lower abundance. This relative insensitivity of mass spectrometry, in comparison with Western blotting and fluorescence microscopy, combined with the fact that low-abundance proteins may not bind to the biochip, makes the detection of very-low-abundance molecules in serum by this technology highly unlikely. One should keep in mind that the ELISA methodologies usually used to quantify tumor markers in the circulation are even more sensitive than Western blotting, allowing direct measurement of analytes at levels as low as 10⁻¹²–10⁻¹³ mol/liter.
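A back-of-the-envelope conversion puts these detection limits in perspective (this arithmetic is my own illustration, not a figure from the cited studies): at ELISA-level concentrations, a 1-µl serum application carries only on the order of 10⁴–10⁵ analyte molecules.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_per_microliter(conc_mol_per_liter):
    """Analyte molecules contained in 1 microliter of serum
    at the given molar concentration (1 ul = 1e-6 liter)."""
    return conc_mol_per_liter * AVOGADRO * 1e-6

# The ELISA detection limits quoted in the text:
print(molecules_per_microliter(1e-12))  # ~6.0e5 molecules per ul
print(molecules_per_microliter(1e-13))  # ~6.0e4 molecules per ul
```

Thus a 1–3-µl application at these concentrations delivers only tens to hundreds of thousands of molecules of the analyte to the chip, of which only a fraction will bind the surface and ionize.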
Another methodological artifact that should be kept in mind in SELDI-TOF experiments is the identification of discriminatory peaks (peptides, proteins, or protein fragments) that originated ex vivo. Marshall et al. recently showed that when plasma was left sitting at room temperature for 4 or 8 h, the MALDI-TOF spectra, as recorded by a SELDI-TOF instrument, changed significantly, suggesting that many peptides were generated by proteolytic digestion ex vivo (59). The authors attributed this peptide generation to the action of specific (serine) proteases, because they could block the effect with serine protease inhibitors. They further speculated that the concentrations of proteins released into the blood directly from damaged cells, or the changes in important regulatory factors associated with disease, are likely to be far too small to be detected directly by MALDI-TOF.
In virtually every SELDI-TOF experiment published so far, a fraction of the clinical samples is used as a "training set" to derive the interpretation algorithm, and the remaining samples are used as a "test set." As correctly pointed out by Qu et al. (41), one of the concerns in the construction and use of learning algorithms is the possibility of overfitting the data. It is not known how robust these algorithms will be when used at different times or on different sets of clinical samples. One example demonstrating this problem was published by Rogers et al. (60). The sensitivity/specificity of a test discriminating renal cell carcinoma from controls by SELDI-TOF was initially 98–100%. However, when the same procedure was applied 10 months later to a new set of patients, the sensitivity dropped to 41%. The authors speculated that this dramatic loss of performance was likely due to sample stability, laser performance, or chip variability. It is thus important to show that algorithms initially derived with training samples still work on different sets of samples and at different times.
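The overfitting concern can be made concrete with a small simulation (a sketch with arbitrary, assumed numbers of peaks and samples, not a model of any published study): when a peak and decision threshold are chosen because they best separate a small training set of pure-noise spectra, the training accuracy looks impressive while performance on an independent test set falls toward chance.

```python
import random

random.seed(0)

N_PEAKS = 500  # pure-noise "m/z peaks" per spectrum (arbitrary)

def make_samples(n):
    """n synthetic noise spectra; labels alternate 0 (control) / 1 (cancer)."""
    spectra = [[random.gauss(0, 1) for _ in range(N_PEAKS)]
               for _ in range(n)]
    labels = [i % 2 for i in range(n)]
    return spectra, labels

def accuracy(peak, threshold, spectra, labels):
    preds = [1 if s[peak] > threshold else 0 for s in spectra]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

train_x, train_y = make_samples(20)
# "Training": exhaustively pick the (peak, threshold) pair that best
# separates the two groups in the 20 training spectra.
best = max(((p, s[p]) for p in range(N_PEAKS) for s in train_x),
           key=lambda pt: accuracy(pt[0], pt[1], train_x, train_y))

test_x, test_y = make_samples(200)
print(accuracy(*best, train_x, train_y))  # inflated by selection
print(accuracy(*best, test_x, test_y))    # close to chance (0.5)
```

Because every feature here is noise, the true accuracy of any such rule is exactly 50%; the gap between the two printed numbers is entirely an artifact of selecting the rule on the same small set it is evaluated on.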
It is currently suggested by many authors that peaks at low m/z ratios in SELDI-TOF mass spectra should be discarded as noise due to matrix effects (42). However, two of the five discriminatory peaks obtained by Petricoin et al. in their ovarian cancer diagnostic protocol with SELDI-TOF had m/z ratios of 534 and 989 (30). In a recent reanalysis of the original raw proteomic data on ovarian cancer, Sorace and Zhan identified peaks that contributed decisively to the discrimination between normal and cancer patients but did not make biological sense (e.g. a peak at an m/z ratio of 2.79) (61). These authors raised the possibility of a significant nonbiologic experimental bias between the cancer and control groups, casting doubt on the validity of the discriminatory peaks with m/z ratios >2,000. Essentially the same conclusions were reached by Baggerly et al., who also showed that "noise" peaks can achieve perfect classification of normal and cancer samples (62). It is thus mandatory that the algorithms used to interpret mass spectrometric data in SELDI-TOF experiments be carefully reviewed to avoid false conclusions. Indeed, it would be desirable for those working in the field to validate and compare their algorithms and to examine whether they arrive at the same discriminatory peaks on the same set of data.
A rather surprising observation relates to two papers on prostate cancer published by the same group (40, 41) (Table II). The same patients were analyzed, and one set of data was generated; this set was then examined with two different bioinformatic methods. Surprisingly, the two bioinformatic tools identified different discriminatory peaks: only two peaks were shared between the nine identified in the first paper and the twelve identified in the second. Moreover, one peak originally reported in the first study to discriminate between cancer and non-cancer patients was reported in the second study to discriminate between healthy controls and patients with benign prostatic hyperplasia.
In conclusion, the bioinformatic tools used to analyze SELDI-TOF data need to be carefully validated to avoid artifactual findings and overfitting. Moreover, the reasons why peaks lying within the noise can discriminate between patients and controls should be investigated further.