|
Figure 1
Flowchart for the comparative analysis of normalization procedures.
Arrows in the chart show the flow of the data sets (blue: data set with
replicate samples, green: randomized data set, red: B-cell data set).
|
|
Figure 2
Comparison of Spearman rank correlation between arrays. Each box plot
shows the distribution of 45 correlation coefficients in (a) the replicate data set and (b) the randomized data set. For RMA and GCRMA, the correlations in (b) deviate significantly from zero, with P-values of 3 × 10⁻¹⁷ and 0 (below MATLAB computational precision), respectively.
|
|
Figure 3
Histogram of the correlation coefficients between gene expression
profiles in the data sets produced by four different normalization
procedures. The x-axis shows the Spearman correlation coefficient, divided into 20 equal-size bins, and the y-axis shows the count in each bin as a fraction of the total number of possible pairs.
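As a rough illustration of how such a histogram can be computed, the sketch below bins a list of correlation coefficients into 20 equal-size bins and reports each bin as a fraction of all pairs. The function name `correlation_histogram`, the assumption that the bins span the full range [-1, 1], and the toy input values are illustrative choices, not taken from the paper.

```python
def correlation_histogram(coeffs, n_bins=20):
    """Return the fraction of coefficients falling in each of n_bins
    equal-size bins. Assumption: the bins span [-1, 1]."""
    counts = [0] * n_bins
    for r in coeffs:
        # Map r from [-1, 1] to a bin index; r == 1 goes into the last bin.
        idx = min(int((r + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(coeffs)
    return [c / total for c in counts]


# Toy coefficients (hypothetical, for illustration only).
toy_coeffs = [-1.0, -0.95, 0.0, 0.05, 1.0]
fractions = correlation_histogram(toy_coeffs)
for i, f in enumerate(fractions):
    if f > 0:
        lo = -1.0 + i * 0.1
        print(f"bin starting at {lo:+.1f}: fraction {f:.2f}")
```

Because each bin is normalized by the total number of pairs, histograms from data sets with different numbers of genes remain directly comparable.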
|
|
Figure 4
Fit of the network connectivity to a power-law distribution.
|
|
Figure 5
Fraction of the highly correlated gene pairs sharing the same GO
biological process. Gene pairs are ranked by mutual information.
|
|
Figure 6
Likelihood ratio of PPI for various ranges of the gene-pair correlation.
|
|
Figure 7
A hypothetical case explaining the cause of spurious correlation in the GCRMA-normalized data set. (A) Intensity profiles and (B)
intensity ranks for three probes before (left) and after (right) GSB
adjustment. Before GSB adjustment, probes 1 and 2 have the lowest
intensities, m = 1, and the lowest ranks in the data set. If
probes 1 and 2 are adjusted by the same value because of their similar
probe affinity, and probe 3 is adjusted by a different value such
that its intensity profile crosses over the other two profiles, the
expression ranks of p1 and p2 change across the samples. The pairwise rank
correlation between p1 and p2 then increases greatly. The
effect of probe 3 is oversimplified in this hypothetical case;
the actual data should contain a combinatorial effect of many other
probes on the array.
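The mechanism described in this caption can be reproduced with a toy calculation. The sketch below is only an illustration under stated assumptions: the intensity values, the per-probe offsets, and the helper names (`average_ranks`, `rank_profiles`, `pearson`) are all hypothetical, and the adjustment is modeled as a constant offset per probe rather than the actual GCRMA procedure.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_profiles(probes):
    """For each probe, collect its within-sample rank across all samples."""
    profiles = [[] for _ in probes]
    for s in range(len(probes[0])):
        for idx, r in enumerate(average_ranks([p[s] for p in probes])):
            profiles[idx].append(r)
    return profiles


def pearson(x, y):
    """Pearson correlation; returns None if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(sxx * syy)


# Hypothetical intensities over six samples: p1 and p2 sit at the floor m = 1.
p1 = [1.0] * 6
p2 = [1.0] * 6
p3 = [1.5, 3.0, 2.0, 4.0, 1.2, 3.5]

before = rank_profiles([p1, p2, p3])
r_before = pearson(before[0], before[1])  # ranks of p1, p2 never change

# Assumed adjustment: p1 and p2 shift by the same offset, p3 by a different one,
# so the p3 profile now crosses the other two.
p1_adj = [v + 1.5 for v in p1]
p2_adj = [v + 1.5 for v in p2]
p3_adj = [v + 0.0 for v in p3]

after = rank_profiles([p1_adj, p2_adj, p3_adj])
r_after = pearson(after[0], after[1])  # ranks of p1, p2 now move together

print("rank correlation before:", r_before)
print("rank correlation after:", r_after)
```

Before the adjustment the ranks of p1 and p2 are constant across samples, so no correlation can be computed. After it, their ranks rise and fall together wherever probe 3 crosses them, which yields a strong rank correlation even though neither probe carries any signal.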
|
|
Figure 8
Comparison of the GCRMA default (def) normalization procedure, the GCRMA alternative (alt) implementation, and MAS5 in terms of (A) fit of network connectivity to a power-law distribution, (B) fraction of gene pairs sharing a common GO biological process annotation, and (C) likelihood ratio of PPI.
|