2.2 Permutation of CEL file
Raw signal intensities for each probe pair were randomly permuted to create uninformative CEL files. We retained the relative position between PM and MM for every probe pair, to ensure a fair comparison between normalization procedures that use MM information to correct for non-specific binding and those that rely entirely on PM intensities. Even so, shuffling the probe pairs is sufficient to destroy the real signal of the probe sets, as they now consist of random probe values. These data are crucial in our comparative study, because the null set should not contain any information.
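The permutation described above can be sketched as follows. This is only an illustration, not the code used in the study: it assumes the PM and MM intensities of one array have already been read into two equal-length lists ordered by probe position, and the function name `permute_probe_pairs` and the example values are hypothetical. Reading and writing CEL files is not shown.

```python
import random


def permute_probe_pairs(pm, mm, seed=None):
    """Shuffle probe pairs across positions, keeping each PM with its own MM.

    pm, mm: equal-length sequences of intensities (assumed layout: index i
    holds the PM and MM values of probe pair i).
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    rng = random.Random(seed)
    pairs = list(zip(pm, mm))   # bind each PM to its MM before shuffling
    rng.shuffle(pairs)          # one shared permutation for both values
    pm_perm = [p for p, _ in pairs]
    mm_perm = [m for _, m in pairs]
    return pm_perm, mm_perm


# Hypothetical intensities for four probe pairs.
pm = [1200.0, 340.0, 875.0, 96.0]
mm = [300.0, 120.0, 410.0, 80.0]
pm_perm, mm_perm = permute_probe_pairs(pm, mm, seed=0)
```

Because the pairs are shuffled as units, every PM value still sits next to its original MM value, while the assignment of pairs to probe set positions becomes random.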
2.3 Normalization procedures
We compared four normalization procedures: MAS5, RMA, GCRMA and Li–Wong. All normalizations were implemented using software packages available from Bioconductor (http://www.bioconductor.org). We used the default parameters of the software packages unless otherwise specified. The term ‘Li–Wong’ refers to the procedure that normalizes arrays using an invariant set of genes and then fits a parametric model to the probe set data, as described in Li and Wong (2001).
2.4 Evaluation of biological function relationship
GO annotations of the genes were extracted from the Affymetrix HGU95 annotation file. There are 10 369 biological process terms in total, of which 61 general terms were removed. We are interested only in specific terms that are shared by <5% of the genes on the microarray. A gene pair that shares one of these specific GO terms is then deemed functionally related.
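A minimal sketch of this filtering step is given below. It assumes the annotations are available as a mapping from each gene on the array to its set of biological process terms; the function names, the toy gene and term labels, and the `max_fraction` parameter are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def specific_terms(gene_to_terms, general_terms=frozenset(), max_fraction=0.05):
    """Return terms that are not 'general' and annotate < max_fraction of genes."""
    n_genes = len(gene_to_terms)
    counts = {}
    for terms in gene_to_terms.values():
        for term in set(terms):
            counts[term] = counts.get(term, 0) + 1
    return {
        term
        for term, count in counts.items()
        if term not in general_terms and count / n_genes < max_fraction
    }


def related_pairs(gene_to_terms, keep):
    """Return gene pairs that share at least one term in `keep`."""
    pairs = set()
    for a, b in combinations(sorted(gene_to_terms), 2):
        if set(gene_to_terms[a]) & set(gene_to_terms[b]) & keep:
            pairs.add((a, b))
    return pairs


# Toy data: with only four genes the 5% cutoff would exclude every term,
# so a looser cutoff is used here purely for demonstration.
annotations = {
    "g1": {"A", "X"},
    "g2": {"A", "X"},
    "g3": {"B", "X"},
    "g4": {"X"},
}
keep = specific_terms(annotations, max_fraction=0.75)
pairs = related_pairs(annotations, keep)
```

In this toy example the term `X` annotates every gene and is dropped, so only `g1` and `g2`, which share the term `A`, are reported as functionally related. The all-pairs loop is quadratic in the number of genes; for a full array one would likely group genes by term instead.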
2.5 Likelihood ratio of protein–protein interaction
We assembled a set of gold-standard positive interactions by taking the union of interaction data from the Human Protein Reference Database (HPRD), the Biomolecular Interaction Network Database (BIND), the Database of Interacting Proteins (DIP) and IntAct (Bader et al., 2003; Hermjakob et al., 2004; Peri et al., 2003; Xenarios et al., 2002). The resulting gold-standard positive set consists of 21 509 unique PPIs (heterodimers only) that could possibly pair up among genes in the Human Genome U95 array. A gold-standard negative set is harder to define, but we took the common approach of using lists of protein pairs that are unlikely to interact given their cellular localization. The assembled negative set contains 6 101 360 pairs of proteins encoded by genes represented on the U95 array. The likelihood ratio is computed as the ratio of the conditional probabilities for a set of protein pairs, here the top predicted gene pairs ranked by statistical dependency between expression profiles, given the gold-standard positive (pos) and negative (neg) sets:
LR = P (coexpressed pair|pos) / P (coexpressed pair|neg)
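The likelihood ratio can be illustrated with a short sketch. It assumes that each conditional probability is estimated as the fraction of the corresponding gold-standard set that appears among the top predicted pairs; this estimator, the function name `likelihood_ratio` and the toy pair labels are assumptions made for the example.

```python
import math


def likelihood_ratio(predicted, positives, negatives):
    """LR = P(coexpressed pair | pos) / P(coexpressed pair | neg).

    Each argument is an iterable of (gene_a, gene_b) pairs; the order of
    genes within a pair is ignored.
    """
    pred = {frozenset(p) for p in predicted}
    pos = {frozenset(p) for p in positives}
    neg = {frozenset(p) for p in negatives}
    if not pos or not neg:
        raise ValueError("gold-standard sets must not be empty")
    p_pos = len(pred & pos) / len(pos)  # fraction of positives predicted
    p_neg = len(pred & neg) / len(neg)  # fraction of negatives predicted
    if p_neg == 0:
        # No negatives among the predictions: the ratio is unbounded,
        # or undefined when no positives were predicted either.
        return math.inf if p_pos > 0 else math.nan
    return p_pos / p_neg


# Toy example: 2 of 4 positives and 1 of 8 negatives are predicted.
predicted = [("A", "B"), ("C", "D"), ("E", "F")]
positives = [("B", "A"), ("C", "D"), ("G", "H"), ("I", "J")]
negatives = [("E", "F"), ("K", "L"), ("M", "N"), ("O", "P"),
             ("Q", "R"), ("S", "T"), ("U", "V"), ("W", "X")]
lr = likelihood_ratio(predicted, positives, negatives)
```

Here P(coexpressed pair|pos) = 2/4 and P(coexpressed pair|neg) = 1/8, so the predicted pairs are four times more likely to be drawn from the positive set than from the negative set.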