2.1 Microarray data
The generation of the microarray replicates is described in detail in Tu et al. In brief, mRNA from the Ramos human Burkitt's lymphoma cell line was used for the experiments. The purified sample was divided equally into several subgroups, and each subgroup went through the preparation steps independently. The final target sample was then divided into several samples, which were hybridized independently to 10 different Affymetrix HGU95A arrays.
The data set used for investigating gene co-expression consists of gene expression profiles from 254 naturally occurring phenotypic variations of human B cells. It represents a wide variety of homogeneous B-cell phenotypes derived from normal and tumor-related populations. The microarray experiments are described in Basso et al., and the CEL files are available on the Gene Expression Omnibus website (series accession number: GSE2350).
2.2 Permutation of CEL files
Raw signal intensities of the probe pairs were randomly permuted to create uninformative CEL files. We kept the PM and MM intensities of every probe pair together, preserving their relative position, to ensure a fair comparison between normalization procedures that use MM information to correct for non-specific binding and those that rely entirely on PM intensities. Shuffling the probe pairs is nevertheless sufficient to destroy the real signal of the probe sets, as each set now consists of random probe values. These data are crucial to our comparative study because the null set should not contain any information.
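A minimal sketch of this shuffling in Python with NumPy, assuming the PM and MM intensities of one array have already been extracted from the CEL file into two aligned vectors (one entry per probe pair). The function name `permute_probe_pairs` is hypothetical, and the text does not state whether one permutation was shared across arrays, so the sketch simply permutes a single array. Applying the same index permutation to both vectors is what keeps each PM value with its MM partner.

```python
import numpy as np


def permute_probe_pairs(pm, mm, seed=None):
    """Randomly reassign probe pairs to probe positions on one array.

    The same permutation is applied to PM and MM, so each PM intensity
    keeps its MM partner; only the mapping of pairs to positions (and
    therefore to probe sets) is randomized.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("PM and MM vectors must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(pm.shape[0])  # one shared permutation of indices
    return pm[order], mm[order]
```

Because only the order of the pairs changes, the overall intensity distribution of the array is untouched, so each normalization method still sees realistic marginal intensities while probe-set membership becomes random.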
2.3 Normalization procedures
We compared four normalization procedures, MAS5, RMA, GCRMA and Li–Wong, all implemented using software packages available from Bioconductor (http://www.bioconductor.org). We used the default parameters of the software packages unless otherwise specified. The term ‘Li–Wong’ refers to the procedure that normalizes arrays using an invariant set of genes and then fits a parametric model to the probe-set data, as described in Li and Wong (2001).
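For reference, the parametric model of Li and Wong (2001) is multiplicative: for one probe set, the PM − MM difference of probe j on array i is modelled as y_ij = θ_i φ_j + ε_ij, with the constraint Σ_j φ_j² = J (J probes). The sketch below fits this model by alternating least squares. It is an illustration only: it omits the invariant-set normalization step and the outlier handling of the published method, and `fit_li_wong` is a hypothetical name, not the Bioconductor API.

```python
import numpy as np


def fit_li_wong(y, n_iter=100, tol=1e-8):
    """Fit y[i, j] ~ theta[i] * phi[j] by alternating least squares.

    y     : (arrays x probes) matrix of PM - MM differences for one probe set
    theta : per-array expression index
    phi   : per-probe sensitivity, scaled so that sum(phi**2) == n_probes
    Assumes y is not identically zero.
    """
    y = np.asarray(y, dtype=float)
    n_arrays, n_probes = y.shape
    phi = np.ones(n_probes)
    theta = np.zeros(n_arrays)
    for _ in range(n_iter):
        # Least-squares update of theta with phi held fixed.
        theta_new = y @ phi / (phi @ phi)
        # Least-squares update of phi with theta held fixed.
        phi = theta_new @ y / (theta_new @ theta_new)
        # Impose sum(phi**2) = n_probes; rescale theta so theta*phi is unchanged.
        scale = np.sqrt(n_probes / (phi @ phi))
        phi = phi * scale
        theta_new = theta_new / scale
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, phi
```

The constraint on φ only fixes the scale split between θ and φ; the fitted product θ_i φ_j is the same either way, and θ_i is the quantity reported as the expression index.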
2.4 Evaluation of biological function relationship
GO annotations of the genes were extracted from the Affymetrix HGU95 annotation file. There are 10 369 biological process terms in total, from which 61 general terms were removed. We are interested only in specific terms, i.e. those shared by <5% of the genes on the microarray. A gene pair sharing a common specific GO term is then deemed functionally related.
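A minimal Python sketch of the frequency filter and the pair definition, assuming the annotation file has been parsed into a dictionary `gene_to_terms` that maps every gene on the array to its set of biological-process GO terms (unannotated genes included with an empty set, so the 5% threshold is taken over all genes). The removal of the 61 general terms is not modelled, and the function names are hypothetical. Pairs are collected term by term, which stays cheap because every retained term covers only a small fraction of the genes.

```python
from collections import Counter, defaultdict
from itertools import combinations


def specific_terms(gene_to_terms, max_fraction=0.05):
    """GO terms annotated to fewer than max_fraction of all genes on the array."""
    n_genes = len(gene_to_terms)  # includes genes with an empty term set
    counts = Counter(t for terms in gene_to_terms.values() for t in set(terms))
    return {t for t, c in counts.items() if c < max_fraction * n_genes}


def functionally_related_pairs(gene_to_terms, max_fraction=0.05):
    """Unordered gene pairs (sorted tuples) sharing at least one specific GO term."""
    keep = specific_terms(gene_to_terms, max_fraction)
    # Invert the annotation: specific term -> genes carrying it.
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for t in set(terms) & keep:
            term_to_genes[t].add(gene)
    # Every pair of genes under the same specific term is deemed related.
    pairs = set()
    for members in term_to_genes.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```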
2.5 Likelihood ratio of protein–protein interaction
We assembled a set of gold-standard positive interactions by taking the union of interaction data from the Human Protein Reference Database (HPRD), the Biomolecular Interaction Network Database (BIND), the Database of Interacting Proteins (DIP) and IntAct (Bader et al., 2003; Hermjakob et al., 2004; Peri et al., 2003; Xenarios et al., 2002). The resulting gold-standard positive set consists of 21 509 unique PPIs (heterodimers only) between proteins encoded by genes represented on the Human Genome U95 array. A negative gold standard is harder to define, but we took the common approach of using protein pairs that are unlikely to interact given their cellular localization. The assembled negative set contains 6 101 360 pairs of proteins encoded by genes represented on the U95 array. The likelihood ratio is computed as the ratio of the conditional probabilities of a set of protein pairs, here the top predicted gene pairs ranked by statistical dependency between expression profiles, given the gold-standard positive (pos) and negative (neg) sets:
$$\mathrm{LR} = \frac{P(\text{coexpressed pair} \mid \text{pos})}{P(\text{coexpressed pair} \mid \text{neg})}$$
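A minimal sketch of this ratio, assuming each conditional probability is estimated as the fraction of a gold-standard set that is recovered among the top-ranked co-expressed pairs. The counting is not spelled out in the text, so this estimator is an assumption; pairs are treated as unordered, and the function name is hypothetical.

```python
def likelihood_ratio(predicted_pairs, positive_pairs, negative_pairs):
    """LR = P(pair predicted | pos) / P(pair predicted | neg)."""
    # Treat (a, b) and (b, a) as the same unordered pair.
    def canon(pairs):
        return {frozenset(p) for p in pairs}

    predicted = canon(predicted_pairs)
    pos = canon(positive_pairs)
    neg = canon(negative_pairs)
    # Fraction of each gold-standard set recovered among the predicted pairs.
    p_pos = len(predicted & pos) / len(pos)
    p_neg = len(predicted & neg) / len(neg)
    if p_neg == 0:
        # Ratio undefined; report +inf only when some positives were recovered.
        return float("inf") if p_pos > 0 else float("nan")
    return p_pos / p_neg
```

For example, if the top-ranked pairs recover half of a toy positive set and a quarter of a toy negative set, the function returns LR = 2; values above 1 indicate that the co-expressed pairs are enriched for gold-standard interactions.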