Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks


Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks

Wei Keat Lim 1,2, Kai Wang 1,2, Celine Lefebvre 2 and Andrea Califano 1,2,*

1Department of Biomedical Informatics, Columbia University, 622 West 168th Street, Vanderbilt Clinic 5th Floor and 2Center for Computational Biology and Bioinformatics, Columbia University, 1130 Saint Nicholas Avenue, New York, NY 10032, USA

An open access article: Bioinformatics 2007 23(13):i282-i288.



Motivation: An increasingly common application of gene expressionprofile data is the reverse engineering of cellular networks.However, common procedures to normalize expression profilesgenerated using the Affymetrix GeneChips technology were originallydeveloped for a rather different purpose, namely the accuratemeasure of differential gene expression between two or morephenotypes. As a result, current evaluation strategies lackcomprehensive metrics to assess the suitability of availablenormalization procedures for reverse engineering and, in general,for measuring correlation between the expression profiles ofa gene pair.

Results: We benchmark four commonly used normalization procedures(MAS5, RMA, GCRMA and Li-Wong) in the context of establishedalgorithms for the reverse engineering of protein–proteinand protein–DNA interactions. Replicate sample, randomizedand human B-cell data sets are used as an input. Surprisingly,our study suggests that MAS5 provides the most faithful cellularnetwork reconstruction. Furthermore, we identify a crucial stepin GCRMA responsible for introducing severe artifacts in thedata leading to a systematic overestimate of pairwise correlation.This has key implications not only for reverse engineering butalso for other methods, such as hierarchical clustering, relyingon accurate measurements of pairwise expression profile correlation.We propose an alternative implementation to eliminate such sideeffect.



Affymetrix Genechip® arrays are currently among the mostwidely used high-throughput technologies for the genome-widemeasurement of expression profiles. To minimize mis- and cross-hybridizationproblems, this technology includes both perfect match (PM) andmismatch (MM) probe pairs as well as multiple probes per gene(Lipshutz et al., 1999). As a result, significant preprocessingis required before an absolute expression level for a specificgene may be accurately assessed. Such data preprocessing steps—whichcombine multiple probe signals into a single absolute call—areknown as normalization procedures. They usually involve threesteps: (a) background adjustment, (b) normalization and (c)summarization (Gautier et al., 2004). Various methods have beendevised for each of the three steps and thus a great numberof possible combinations exist, facing the microarray user communitywith a complex and often daunting set of choices. We summarizesome of the commonly used procedures in Table 1.

As more and more preprocessing methods become available, itis increasingly important to rigorously and systematically benchmarktheir performance. Cope and colleagues (Cope et al., 2004) developeda graphical tool to evaluate normalization procedures that benefitsusers in identifying the best method in their study. The benchmarkingsystem took advantage of dilution and spike-in experimentalprocedures, yielding materials where the actual concentrationsof some mRNA were known a priori. The performance of a normalizationmethod would then be ranked based on the overall error estimatein the prediction of the concentration of these mRNAs (Bolstadet al., 2003; Liu et al., 2005). A different evaluation frameworkwas recently proposed, which is based on the analysis of thecorrelation between the expression levels of genes in replicatesamples as well as the correlation among same-operon genes inbacteria (Harr and Schlotterer, 2006). Correlation-based analysiswas also investigated by varying the normalization methods ofRMA procedure, in order to provide a quantitative assessmentof their effects on gene–gene correlation structure (Qiuet al., 2005). While the former type of comparative approachidentifies method that best differentiates concentration levelsof RNA transcript, the latter favors methods that can optimallyidentify an expected correlation between gene pairs. However,none of these comparative frameworks studies whether the normalizationprocedure may introduce correlation artifacts for gene pairsthat are not expected to be co-expressed. As a result, theyalso fail to address the suitability of these methods to thereconstruction of cellular networks from expression profiledata, including the inference of networks topological propertiesand gene functional relationships based on co-expression measurements(Basso et al., 2005; Butte and Kohane, 2000; Hughes et al.,2000). In these methods, artifacts in the correlation measurecan dramatically increase the number of inferred false-positiveinteractions. In this article, we summarize the effects of various normalizationprocedures on the accurate estimate of gene expression profiles,both in terms of accuracy and of artifact minimization. Furthermore,we study their efficacy of protein–protein interaction(PPI) inference in a reverse engineering context. The flowchartshown in Figure 1 illustrates the comparative methodology adoptedin this article. In particular, we compare the Spearman rankcorrelation between gene expression profile pairs from replicatesamples as well as from samples with randomly permuted probevalues. This allows to assess both true and artifact correlations.One unique feature of the analysis is that we permuted the rawintensity values stored in the Affymetrix CEL files to estimatedeviations from the null hypothesis, where the expression profiledata is completely uncorrelated before normalization. We alsoutilize a data set that consists of 254 expression profilesfrom normal and tumor related human B-cells to investigate thecorrelation structure among the gene expression profiles, aswell as the global gene network connectivity. This data sethas been used extensively in the literature (Margolin et al.,2006; Wang et al., 2006) and, as a result, it provides a uniqueopportunity to evaluate correct and incorrect inferences ina reverse-engineering settings.

Gene co-expression has been successfully used to infer functionalrelationship (Roberts et al., 2000; Stuart et al., 2003). Wethus tested, for each normalization procedure, the hypothesisthat highly co-expressed gene pairs are more likely to participatein the same biological pathways than those uncorrelated, byusing biological process annotations from Gene Ontology (GO)(Ashburner et al., 2000). To further address the issue of whetherhigher correlation reflects a higher probability of physicalinteraction, we exploit the approach as in (Jansen et al., 2003)to compute a likelihood ratio for PPIs for gene pairs showingvarious degrees of correlation. The method relies on the well-justifiedhypothesis that proteins involved in a complex tend to be encodedby co-regulated genes, because it is energetically advantageousfor the cell to synthesize them in stoichiometric balance (Geet al., 2001). Thus, an increasing PPI likelihood ratio shouldreflect an increasing probability of a bona fide physical interactionand correlation artifacts should dilute that relationship. Theproposed evaluation strategies finally assess how well thesenormalization procedures fit in the context of algorithms thatrely on statistical dependencies among gene expression profiles,such as the ones used to reverse engineer gene networks.  


2.1 Microarray data
Generations of microarray replicates are described in detailin Tu et al. (2002). In brief, mRNA from the Ramos human Burkitt'slymphoma cell line is used for the experiments. The purifiedsample is separated equally into several subgroups and eachsubgroup independently goes through the preparation steps. Thefinal target sample is then divided into several samples andindependently hybridized to 10 different Affymetrix HGU95A arrays.The data set used for investigating gene co-expression consistsof gene expression profiles from 254 naturally occurring phenotypicvariations of human B-cell. It represents a wide variety ofhomogenous B-cell phenotypes derived from normal and tumor relatedpopulations. The microarray experiments are described in Bassoet al. (2005) and the CEL files are available on the Gene ExpressionOmnibus website (series accession number: GSE2350 [NCBI GEO] ).

2.2 Permutation of CEL file
Raw signal intensities for each probe pairs were randomly permutedto create uninformative CEL files. We retained the relativeposition between PM and MM for every probe pairs, in order toensure fair comparison between normalization procedures thatutilize MM information to correct for non-specific binding andthose that rely entirely on PM intensities. However, shufflingthe probe pairs has been sufficient to destroy real signal ofthe probe sets as they now consist of random probes values.This data is crucial in our comparative study as the null setshould not contain any information.

2.3 Normalization procedures
We compared the four normalization procedures MAS5, RMA, GCRMAand Li–Wong, and all the normalization were implementedusing software packages available from Bioconductor ( used the default parameters from the software packages unlessotherwise specified. The term ‘Li–Wong’ refersto the procedure that normalizes arrays using invariant setof genes and then fits a parametric model to the probe set data,as described in Li and Wong (2001).

2.4 Evaluation of biological function relationship
GO annotations of the genes were extracted from Affymetrix HGU95annotation file. There are 10 369 terms for biological processin total and 61 general terms were removed. We are interestedonly in specific terms that are shared by <5% of the genesin the microarray. A gene pair sharing a common GO term is thendeemed functionally related.

2.5 Likelihood ratio of protein–protein interaction
We assembled a set of gold-standard positive interactions bytaking the union of interaction data from the Human ProteinReference Database (HPRD), the Biomolecular Interaction NetworkDatabase (BIND), the Database of Interacting Proteins (DIP)and IntAct (Bader et al., 2003; Hermjakob et al., 2004; Periet al., 2003; Xenarios et al., 2002). The resulting gold-standardpositive set consists of 21 509 unique PPIs (heterodimers only)that could possibly pair up among genes in the Human GenomeU95 array. A negative gold-standard is harder to define, butwe took the common approach by taking the lists of protein pairsthat are unlikely to interact given their cellular localization.The assembled negative set contains 6 101 360 pairs of proteinsencoded by genes represented on the U95 array. The likelihoodratio is computed as the fraction of conditional probabilitiesfor a set of protein pairs, here the top predicted gene pairsranked by statistical dependency between expression profiles,given the gold-standard positive (pos) and negative (neg) sets:


LR = P (coexpressed pair|pos) / P (coexpressed pair|neg)


A common approach used to evaluate a normalization procedureis to compare correlation coefficient between replicate samples.We compared four normalization procedures, MAS5, RMA, GCRMAand Li–Wong, on gene expression measurements of 10 replicatesamples as well as on their permuted data files. The randomizeddata set plays the role of a negative control (null-hypothesis)such that any significant correlation measured on the permuteddataset could be deemed an artifact of the specific normalizationprocedure. Figure 2a shows the comparison of between-sampleSpearman rank correlation among the four normalization procedures.While all four procedures achieve correlation >0.9, GCRMAseems to produce higher overall correlation measures than theother methods, while MAS5 appears to produce the lowest overallcorrelation measures. This may be incorrectly interpreted toimply that GCRMA normalization outperforms the other methods.However, Figure 2b provides a completely different interpretationfor these observations. It shows that both RMA and especiallyGCRMA produce highly significant correlation measurements evenwhen applied to the randomized set. The conclusion is that thehigher overall correlation after normalization with these twomethods is an artifact and will likely skew the results of reverse-engineeringmethods.

In several methods for the reverse engineering of cellular networks,physical interactions—including protein–proteinand protein–DNA interactions—are inferred from thestatistical dependencies between gene expression profiles (Bassoet al., 2005; Ge et al., 2001). A correct estimate of the correlationstructure in the gene expression profile data is thus one ofthe most crucial ingredients of a successful reverse-engineeringalgorithm. To estimate the impact of normalization procedureson the correlation structure, we preprocessed a set of 254 Affymetrixarrays using the four normalization procedures and then computedcorrelation between all pairs of probe sets. Figure 3 providesa global view of the correlation structure in these data sets.In particular, out of 77 millions possible probe set pairs,there are 5.2 millions (6.7%) expression pairs with correlationcoefficient, |{rho}|, >0.75 in GCRMA-normalized data set. In contrast,MAS5-normalized data set contains only 0.04% or ~33 000 pairsabove this cutoff value. This is an extraordinary differencewhich warrants further investigation, especially in light ofthe previous results from the analysis of a randomized set.Since most biological networks are known to be scale-free, orpossess a degree distribution that can be approximated by apower law (Barabasi and Oltvai, 2004), we varied threshold ofa relevance network (Butte and Kohane, 2000) and fit the globalnetwork connectivity to a power-law distribution. Pairwise mutualinformation (MI) of the network was estimated using a Gaussiankernel method (Margolin et al., 2006). Figure 4 compares R2-valueof the fitting in each of the four normalized data sets. Withthe exception of GCRMA, all other networks show a good fit ofscale-free distribution and the R2 reach a plateau above thresholdP-value of 1 x 10–10. While this is just a sanity checkrather than a fully quantitative result, it should be clearthat a normalization method resulting in a large deviation fromthe accepted model of topological connectivity in the cell shouldbe approached with caution.

We finally proceeded to assess whether higher correlation, aftera specific normalization procedure, would reflect a higher chanceof either functional or physical interaction between two genes.Two non-parametric gene-pair correlation measures were tested,including Spearman rank correlation and MI. Here we only reportresults for MI analysis, as both measurements produce consistentresults in all the tests we performed. MI was chosen as it arguablyprovides the best estimate of pairwise statistical dependencyin a non-linear setting. We first evaluated functional relationshipbetween genes by examining whether they share a common GO biologicalprocess annotation. Figure 5 compares the fraction of gene pairswith the same GO annotation in a cumulative equal-frequencyhistogram. MAS5 demonstrated the best result with ~48% of thetop 10 000 pairs having a common GO biological process term,followed by RMA and Li–Wong, while GCRMA produces a fractionthat is comparable to the background level. We then computedlikelihood ratios of PPIs for the top correlated gene pairsusing the gold-standard PPI interaction sets described in theMethods section. Although this is a rather naïve methodthat directly correlates physical interaction with gene co-expression,the results should still be able to provide a fair comparisonamong the four normalization procedures given that the sameassumption is made for all of them. Figure 6 shows the plotsof the PPI likelihood ratio as a function of various rangesof gene–gene correlations. Note that the discrepancy ofperformance between the curves implies that the MI ranks ofthe gene pairs are not consistent among the four normalizationprocedures. Gene pairs found to be highly correlated in onedata set may not be significant in another data set. Our resultsshow that MAS5-normalized data provide by far the best platformfor inferring PPIs. To our surprise, yet consistently with allother tests in this article, data normalized by GCRMA dramaticallyscrambled the ranks of correlation among gene pairs, i.e. highlycorrelated gene pairs are equally likely to be a positive PPIor a negative PPI. This is likely the consequence of correlationartifacts resulting in the introduction of a large number ofgene pairs that are not truly correlated among the top mostcorrelated ones.

The results of this analysis are both surprising and concerning.GCRMA has been a popular procedure used to convert raw microarraydata into gene expression profiles and it was shown to outperformother normalization procedures in detecting differentially expressedgenes (Wu et al., 2004). However, we observed the opposite resultwhen correlation artifacts are considered. This does not justaffect reverse-engineering methods, but any other method thatrelies on an accurate measure of gene-pair expression profilecorrelation, such as hierarchical clustering among many others.

It is rather obvious that the dramatic under-performance ofGCRMA is due to its background adjustment step, since (a) itis the only step where GCRMA and RMA differ and (b) RMA hasbeen performing much better than GCRMA in our study, albeitnot as well as MAS5. The background adjustment in GCRMA consistsof three sequential steps: (1) optical background correction,(2) probe intensity adjustment through non-specific binding(NSB) utilizing affinity information and optical noise-adjustedMM intensities and (3) probe intensity adjustment through gene-specificbinding (GSB), where NSB-adjusted PM intensities are furthercorrected for the effect of PM probe affinities. In step (2),the default GCRMA procedure implemented in the R statisticalpackage truncates PM intensities to a minimum value, m, if theNSB-adjusted intensity values are less than m, and in step (3),these truncated PM probes are further adjusted for GSB.

We suggest that the GSB adjustment of truncated values is asignificant flaw in the design of the GCRMA normalization procedureand that it is directly responsible for the difference in performancebetween RMA and GCRMA. From a theoretical perspective, oncea probe intensity is truncated at m, it should be deemed uninformativeand further adjustments should be avoided. More specifically,if any two probes with similar affinities are truncated in thesame subset of samples, GSB adjustment could introduce correlationartifacts between the two probes. Figure 7 demonstrates a simplifiedscenario, where two probes with similar affinity, and intensitiesboth truncated to m, could gain a high correlation after GSBadjustment. The GCRMA procedure applies quantile normalizationafter the background adjustment steps, where all probe intensitiesare essentially transformed to ranks within each sample. WithoutGSB-adjustment, these two probes will both rank the lowest inall samples as their intensities are truncated. However, ifGSB adjustment is applied, they may switch ranks with otheruntruncated probes in some samples after the adjustment, andsuch rank switches can be highly correlated between the twoprobes owing to their similar probe affinity. One possible solutionhere is to decrease the value of m, in order to reduce the truncatedregions as well as the number of the affected probes. However,in practice, we found that the most effective way to reducethis problem is to avoid further GSB adjustment altogether onthe probes with truncated intensities. Note that any algorithmused to compute non-parametric statistical dependencies betweenprobe-pairs should randomize the rank of equal-value entriessuch that any probe pair with a constant expression level acrosssamples should not contribute to the correlation.

To test our speculations, we reimplemented the GCRMA procedurewithout adjusting GSB for uninformative probes—i.e. probesthat are truncated to m after NSB adjustment. To ensure thelowest intensity rank of these probes, any other probes withGSB-adjusted value less than m were also truncated at m. Finally,an infinitesimal amount of uniformly distributed noise was addedto truncated probes to avoid rank-order correlation issues.The new implementation successfully removed the artificial correlationinduced in the default version and, as shown in Figure 8, performedmuch better than the original GCRMA and almost at par with MAS5.





The use of GCRMA and RMA normalization procedures for AffymetrixGeneChip® technology has received a remarkably broad adoptionin the community due to previous benchmarks demonstrating theirsuperiority with respect to other methods. However, while thesemethods perform well in the assessment of differential expressionanalysis, we found that they also introduce correlation artifactsin the data. This seriously undermines their utilization, atleast in their standard form, upstream of reverse engineeringalgorithms or any other method relying on the estimate of expressionprofile correlation. Thus, our results raise issues on the validityof many studies obtained on the basis of correlation measuresafter these normalization procedures were applied. Specificallywe suggest that the implementation of a specific step in GCRMA—theGSB adjustment of truncated values—introduces artificialcorrelation among the probesets. Unfortunately, according toour analysis, these artifacts are not dataset specific and cansurvive even after the use of additional probe sets postprocessingfilters such as those based on mean, SD and coefficient of variation.

Results were completely consistent across four classes of tests,including (a) a direct assessment of correlation artifacts fromreplicate and randomized samples, (b) an evaluation of the globaltopological properties of reverse engineered networks, (c) astudy of the functional clustering of correlated genes and (d)a study of the relationship between gene-pair expression profilecorrelation and membership in stable protein complexes. Theunequivocal result is that normalization with GCRMA substantiallyreduces the ability to distinguish between actual and incorrectfunctional and physical interactions. In particular, GCRMA islikely to introduce an extraordinary number of false positives,while MAS5 appears to perform optimally with respect to thesetests.

We conclude that the choice of normalization procedure stronglyaffects the correlation structure in the data. Thus, choosingthe right normalization procedure is a key step towards theinference of accurate cellular networks. Our comparative analysisfavors MAS5 in this context even though (or probably because)it infers fewer interactions but with the highest functionaland physical interaction enrichment.

Finally, we suggest that a specific correction to the defaultimplementation of GCRMA in the R package appears to substantiallyimprove its performance, making it competitive with that ofMAS5. With this correction, we believe that GCRMA can be properlyutilized in the context of reverse engineering gene networks.


We thank Drs R.Dalla-Favera, K.Basso and U.Klein for sharingthe B-cell gene expression profile dataset and Dr M.S.Carrofor an insightful discussion. This work was supported by theNational Cancer Institute (R01CA109755), the National Instituteof Allergy and Infectious Diseases (R01AI066116), and the NationalCenters for Biomedical Computing NIH Roadmap Initiative (U54CA121852).Conflict of Interest: none declared.


Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet, ( (2000) ) 25, : 25–29.[CrossRef][ISI][Medline].

Bader GD, et al. BIND: the biomolecular interaction network database. Nucleic Acids Res, ( (2003) ) 31, : 248–250.[Abstract/Free Full Text].

Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat. Rev. Genet, ( (2004) ) 5, : 101–113.[CrossRef][ISI][Medline].

Basso K, et al. Reverse engineering of regulatory networks in human B cells. Nat. Genet, ( (2005) ) 37, : 382–390.[CrossRef][ISI][Medline].

Bolstad BM, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, ( (2003) ) 19, : 185–193.[Abstract/Free Full Text].

Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput, ( (2000) ) 5, : 418–429..

Cope LM, et al. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics, ( (2004) ) 20, : 323–331.[Abstract/Free Full Text].

Gautier L, et al. Affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, ( (2004) ) 20, : 307–315.[Abstract/Free Full Text].

Ge H, et al. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet, ( (2001) ) 29, : 482–486.[CrossRef][ISI][Medline].

Harr B, Schlotterer C. Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons. Nucleic Acids Res, ( (2006) ) 34, : e8.[Abstract/Free Full Text].

Hermjakob H, et al. IntAct: an open source molecular interaction database. Nucleic Acids Res, ( (2004) ) 32, : D452–455.[Abstract/Free Full Text].

Hubbell E, et al. Robust estimators for expression analysis. Bioinformatics, ( (2002) ) 18, : 1585–1592.[Abstract/Free Full Text].

Hughes TR, et al. Functional discovery via a compendium of expression profiles. Cell, ( (2000) ) 102, : 109–126.[CrossRef][ISI][Medline].

Irizarry RA, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, ( (2003) ) 4, : 249–264.[Abstract].

Jansen R, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, ( (2003) ) 302, : 449–453.[Abstract/Free Full Text].

Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA, ( (2001) ) 98, : 31–36.[Abstract/Free Full Text].

Lipshutz RJ, et al. High density synthetic oligonucleotide arrays. Nat. Genet, ( (1999) ) 21, : 20–24.[CrossRef][ISI][Medline].

Liu X, et al. A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics, ( (2005) ) 21, : 3637–3644.[Abstract/Free Full Text].

Margolin AA, et al. Reverse engineering cellular networks. Nat. Protocols, ( (2006) ) 1, : 662–671.[CrossRef].

Peri S, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res, ( (2003) ) 13, : 2363–2371.[Abstract/Free Full Text].

Qiu X, et al. The effects of normalization on the correlation structure of microarray data. BMC Bioinformat, ( (2005) ) 6, : 120.[CrossRef].

Roberts CJ, et al. Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science, ( (2000) ) 287, : 873–880.[Abstract/Free Full Text].

Stuart JM, et al. A gene-coexpression network for global discovery of conserved genetic modules. Science, ( (2003) ) 302, : 249–255.[Abstract/Free Full Text].

Tu Y, et al. Quantitative noise analysis for gene expression microarray experiments. Proc. Natl Acad. Sci. USA, ( (2002) ) 99, : 14031–14036.[Abstract/Free Full Text].

Wang K, et al. Genome-wide discovery of modulators of transcriptional interactions in human B lymphocytes. Lect. Notes Comput. Sci. (RECOMB), ( (2006) ) 3909, : 348–362..

Wu Z, et al. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc, ( (2004) ) 99, : 909–917.[CrossRef][ISI].

Xenarios I, et al. DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res, ( (2002) ) 30, : 303–305.[Abstract/Free Full Text].

Table 1

Summary of four commonly used normalization procedures
Procedure Background correction Normalization Summarization Reference

MAS5 Ideal (full or partial) MM subtraction Constant Tukey biweight Hubbell et al., 2002
RMA Signal (exponential) and noise (normal) close-form transformation Quantile Median polish Irizarry et al., 2003
GCRMA Optical noise, probe affinity and MM adjustment Quantile Median polish Wu et al., 2004
Li–Wong None Invariant set Multiplicative model fitting Li and Wong, 2001


mcith_btm201f1.gif Figure 1  Flowchart for the comparative analysis of normalization procedures. Arrows in the chart show the flow of the data sets (blue: data set with replicate samples, green: randomized data set, red: B-cell data set).

(Click image to enlarge)

mcith_btm201f2.gif Figure 2  Comparison of Spearman rank correlation between arrays. Each box plot represents a distribution of 45 points of correlation coefficients in (a) replicate data set, and (b) randomized data set. RMA and GCRMA are both significantly deviate from zero in (b), with P-values 3 x 10–17 and 0 (below MATLAB computational precision), respectively.

(Click image to enlarge)

mcith_btm201f3.gif Figure 3 Histogram of the correlation coefficients between gene expression profiles in the data sets produced by four different normalization procedures. X-axis corresponds to the Spearman correlation coefficient of 20 equal-size bins and y-axis corresponds to the count of each bin as a fraction of the total number of all possible pairs.

(Click image to enlarge)

mcith_btm201f4.gif Figure 4 Fitting of the networks connectivity to a power-law distribution.

(Click image to enlarge)

mcith_btm201f5.gif Figure 5 Fraction of the highly correlated gene pairs sharing the same GO biological process. Gene pairs are ranked by mutual information.

(Click image to enlarge)

mcith_btm201f6.gif Figure 6 Likelihood ratio of PPI for various ranges of the gene-pair correlation.

(Click image to enlarge)

mcith_btm201f7.gif Figure 7 A hypothetical case explaining the cause of spurious correlation in GCRMA-normalized data set. (A) Intensity profiles, and (B) intensity ranks, for three probes before (left) and after (right) GSB adjustment. Before GSB adjustment, probe 1 and 2 have the lowest intensities, m = 1, and the lowest ranks in the data set. If probe 1 and 2 were adjusted for the same value due to their similarity in probe affinity, and probe 3 was adjusted for a different value such that the intensity profile crosses over the other two profiles, the expression ranks of p1 and p2 change over the samples. Pairwise rank correlation between p1 and p2 is then tremendously increased. The effect of probe 3 is overly simplified in this hypothetical case and the actual data should contain a combinatorial effect of many other possible probes in the array.

(Click image to enlarge)

mcith_btm201f8.gif Figure 8 Comparison of the GCRMA default (def) normalization procedure, GCRMA alternative (alt) implementation and MAS5 in terms of (A) fitness of network connectivity to a power-law distribution, (B) fraction of gene pairs sharing a common GO biological process annotation and (C) likelihood ratio of PPI.

(Click image to enlarge)