Affymetrix GeneChip® arrays are currently among the most widely used high-throughput technologies for the genome-wide measurement of expression profiles. To minimize mis- and cross-hybridization problems, this technology includes both perfect match (PM) and mismatch (MM) probe pairs as well as multiple probes per gene (Lipshutz et al., 1999). As a result, significant preprocessing is required before an absolute expression level for a specific gene can be accurately assessed. Such data preprocessing steps, which combine multiple probe signals into a single absolute call, are known as normalization procedures. They usually involve three steps: (a) background adjustment, (b) normalization and (c) summarization (Gautier et al., 2004). Various methods have been devised for each of the three steps, and thus a great number of possible combinations exist, confronting the microarray user community with a complex and often daunting set of choices. We summarize some of the commonly used procedures in Table 1.
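As an illustration of step (b), one widely used choice (employed, for example, within the RMA procedure) is quantile normalization, which forces every array to share the same empirical intensity distribution. The sketch below is a minimal NumPy implementation for illustration only; it ignores proper tie handling, and the function name and data are ours, not part of any Affymetrix software:

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize a probe-by-array intensity matrix so that every
    array (column) shares the same empirical distribution.
    Minimal sketch: ties are broken arbitrarily rather than averaged."""
    # Rank of each intensity within its own array.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Mean of each quantile across arrays defines the reference distribution.
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    # Map every value to the reference value of its rank.
    return mean_sorted[ranks]
```

After this step, the sorted intensities of every array are identical, so downstream summarization sees arrays on a common scale.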
As more and more preprocessing methods become available, it is increasingly important to benchmark their performance rigorously and systematically. Cope and colleagues (Cope et al., 2004) developed a graphical tool for evaluating normalization procedures that helps users identify the best method for their study. The benchmarking system took advantage of dilution and spike-in experimental procedures, yielding materials in which the actual concentrations of some mRNAs were known a priori. The performance of a normalization method was then ranked based on the overall error estimate in the prediction of the concentrations of these mRNAs (Bolstad et al., 2003; Liu et al., 2005). A different evaluation framework was recently proposed, based on the analysis of the correlation between the expression levels of genes in replicate samples as well as the correlation among same-operon genes in bacteria (Harr and Schlotterer, 2006). Correlation-based analysis was also investigated by varying the normalization methods of the RMA procedure, in order to provide a quantitative assessment of their effects on the gene–gene correlation structure (Qiu et al., 2005). While the former type of comparative approach identifies the methods that best differentiate the concentration levels of RNA transcripts, the latter favors methods that can optimally identify an expected correlation between gene pairs. However, none of these comparative frameworks studies whether the normalization procedure may introduce correlation artifacts for gene pairs that are not expected to be co-expressed. As a result, they also fail to address the suitability of these methods for the reconstruction of cellular networks from expression profile data, including the inference of network topological properties and gene functional relationships based on co-expression measurements (Basso et al., 2005; Butte and Kohane, 2000; Hughes et al., 2000).
In these methods, artifacts in the correlation measure can dramatically increase the number of inferred false-positive interactions.
In this article, we summarize the effects of various normalization procedures on the estimation of gene expression profiles, both in terms of accuracy and of artifact minimization. Furthermore, we study their efficacy for protein–protein interaction (PPI) inference in a reverse-engineering context. The flowchart shown in Figure 1 illustrates the comparative methodology adopted in this article. In particular, we compare the Spearman rank correlation between gene expression profile pairs from replicate samples as well as from samples with randomly permuted probe values. This allows us to assess both true and artifact correlations. One unique feature of the analysis is that we permuted the raw intensity values stored in the Affymetrix CEL files to estimate deviations from the null hypothesis, in which the expression profile data are completely uncorrelated before normalization. We also utilize a data set consisting of 254 expression profiles from normal and tumor-related human B-cells to investigate the correlation structure among the gene expression profiles, as well as the global gene network connectivity. This data set has been used extensively in the literature (Margolin et al., 2006; Wang et al., 2006) and, as a result, it provides a unique opportunity to evaluate correct and incorrect inferences in a reverse-engineering setting.
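The replicate-versus-permutation comparison can be sketched as follows. The intensities below are simulated stand-ins for real CEL-file data (a hypothetical example, not the study's actual pipeline): two replicate arrays should be strongly rank-correlated, while permuting one array's raw values yields the uncorrelated null, so any correlation re-introduced downstream by a normalization step would be an artifact.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_probes = 2000

# Simulated raw probe intensities for two replicate arrays: the second is
# the first perturbed by multiplicative measurement noise.
rep1 = rng.lognormal(mean=6.0, sigma=1.0, size=n_probes)
rep2 = rep1 * rng.lognormal(mean=0.0, sigma=0.3, size=n_probes)

# True correlation: replicates should be strongly rank-correlated.
rho_rep, _ = spearmanr(rep1, rep2)

# Null model: permuting the raw values of one array destroys any true
# correlation before normalization is applied.
rho_null, _ = spearmanr(rep1, rng.permutation(rep2))
```

Running a normalization procedure on such permuted inputs and then re-measuring the correlation reveals how much spurious dependence the procedure itself introduces.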
Gene co-expression has been successfully used to infer functional relationships (Roberts et al., 2000; Stuart et al., 2003). We thus tested, for each normalization procedure, the hypothesis that highly co-expressed gene pairs are more likely to participate in the same biological pathways than uncorrelated pairs, using biological process annotations from the Gene Ontology (GO) (Ashburner et al., 2000). To further address the issue of whether higher correlation reflects a higher probability of physical interaction, we adopt the approach of Jansen et al. (2003) to compute a likelihood ratio for PPIs for gene pairs showing various degrees of correlation. The method relies on the well-justified hypothesis that proteins involved in a complex tend to be encoded by co-regulated genes, because it is energetically advantageous for the cell to synthesize them in stoichiometric balance (Ge et al., 2001). Thus, an increasing PPI likelihood ratio should reflect an increasing probability of a bona fide physical interaction, and correlation artifacts should dilute that relationship. The proposed evaluation strategies finally assess how well these normalization procedures fit in the context of algorithms that rely on statistical dependencies among gene expression profiles, such as those used to reverse engineer gene networks.
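In a simplified form of the Jansen et al. (2003) framework, gene pairs are binned by their correlation into intervals $f$, and the evidence for interaction in each bin is summarized by a likelihood ratio estimated from gold-standard sets of interacting and non-interacting protein pairs (the notation below is ours):

```latex
LR(f) = \frac{P(f \mid \mathrm{PPI})}{P(f \mid \neg\,\mathrm{PPI})},
\qquad
O_{\mathrm{post}}(f) = O_{\mathrm{prior}} \cdot LR(f)
```

Here $O_{\mathrm{prior}}$ is the prior odds that a random pair interacts; $LR(f) > 1$ for a correlation bin means pairs in that bin are more likely to interact than a random pair, so correlation artifacts that scatter non-interacting pairs into high-correlation bins flatten $LR(f)$ toward 1.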