
Figure 1
Flowchart for the comparative analysis of normalization procedures.
Arrows in the chart show the flow of the data sets (blue: data set with
replicate samples, green: randomized data set, red: B-cell data set).
Figure 2
Comparison of Spearman rank correlation between arrays. Each box plot
represents the distribution of 45 correlation coefficients in (a) the replicate data set and (b) the randomized data set. RMA and GCRMA both deviate significantly from zero in (b), with P-values of 3 x 10^{-17} and 0 (below MATLAB computational precision), respectively.
Figure 3
Histogram of the correlation coefficients between gene expression
profiles in the data sets produced by four different normalization
procedures. The x-axis corresponds to the Spearman correlation coefficient, divided into 20 equal-size bins, and the y-axis corresponds to the count in each bin as a fraction of the total number of possible pairs.
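The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code (the caption for Figure 2 indicates the original analysis used MATLAB): the function names, the toy profiles, and the choice of [-1, 1] as the binned range are assumptions made for the example.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; constant inputs are not handled (zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_histogram(profiles, n_bins=20):
    """Fraction of all profile pairs falling in each of n_bins equal-size
    bins over [-1, 1] (assumed range)."""
    counts = [0] * n_bins
    width = 2.0 / n_bins
    n_pairs = 0
    for a, b in combinations(profiles, 2):
        rho = spearman(a, b)
        # Clamp so that rho == 1 lands in the last bin.
        index = min(int((rho + 1.0) / width), n_bins - 1)
        counts[index] += 1
        n_pairs += 1
    return [c / n_pairs for c in counts]


# Hypothetical expression profiles (rows: genes, columns: samples).
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
fractions = correlation_histogram(profiles)
```

Because each count is divided by the total number of pairs, the bin heights sum to one, which makes histograms from data sets with different numbers of genes directly comparable.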
Figure 4
Fit of network connectivity to a power-law distribution.
Figure 5
Fraction of the highly correlated gene pairs sharing the same GO
biological process. Gene pairs are ranked by mutual information.
Figure 6
Likelihood ratio of PPI for various ranges of the gene-pair correlation.
Figure 7
A hypothetical case explaining the cause of spurious correlation in the GCRMA-normalized data set. (A) Intensity profiles, and (B)
intensity ranks, for three probes before (left) and after (right) GSB
adjustment. Before GSB adjustment, probes 1 and 2 have the lowest
intensities, m = 1, and the lowest ranks in the data set. If
probes 1 and 2 are adjusted by the same value because of their similarity
in probe affinity, and probe 3 is adjusted by a different value such
that its intensity profile crosses over the other two profiles, the
expression ranks of p1 and p2 change across the samples. The pairwise rank
correlation between p1 and p2 is then greatly increased. The
effect of probe 3 is oversimplified in this hypothetical case;
the actual data would reflect the combined effect of many other
probes on the array.
Figure 8
Comparison of the GCRMA default (def) normalization procedure, the GCRMA alternative (alt) implementation, and MAS5 in terms of (A) fit of network connectivity to a power-law distribution, (B) fraction of gene pairs sharing a common GO biological process annotation, and (C) likelihood ratio of PPI.
