


- Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks

2.1 Microarray data
The generation of microarray replicates is described in detail in Tu et al. (2002). In brief, mRNA from the Ramos human Burkitt's lymphoma cell line is used for the experiments. The purified sample is separated equally into several subgroups, and each subgroup independently goes through the preparation steps. The final target sample is then divided into several samples and independently hybridized to 10 different Affymetrix HGU95A arrays. The data set used for investigating gene co-expression consists of gene expression profiles from 254 naturally occurring phenotypic variations of human B-cells. It represents a wide variety of homogeneous B-cell phenotypes derived from normal and tumor-related populations. The microarray experiments are described in Basso et al. (2005), and the CEL files are available on the Gene Expression Omnibus website (series accession number: GSE2350 [NCBI GEO]).

2.2 Permutation of CEL file
Raw signal intensities for each probe pair were randomly permuted to create uninformative CEL files. We retained the relative position between PM (perfect match) and MM (mismatch) for every probe pair, in order to ensure a fair comparison between normalization procedures that utilize MM information to correct for non-specific binding and those that rely entirely on PM intensities. Shuffling the probe pairs is nevertheless sufficient to destroy the real signal of the probe sets, as they now consist of random probe values. These data are crucial in our comparative study, as the null set should not contain any information.
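The permutation step can be illustrated with a minimal Python sketch. It assumes the PM and MM intensities of one array have already been read into two equal-length lists in which position i holds the PM and MM values of the same probe pair; the function name `permute_probe_pairs` and the toy intensities are hypothetical, and reading or writing actual CEL files is not shown.

```python
import random


def permute_probe_pairs(pm, mm, seed=None):
    """Shuffle probe pairs across positions while keeping each PM with its MM.

    pm, mm: equal-length sequences; index i holds one probe pair (assumption).
    Returns two new lists in the shuffled order.
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    rng = random.Random(seed)
    order = list(range(len(pm)))
    rng.shuffle(order)  # one shared permutation for both lists
    shuffled_pm = [pm[i] for i in order]
    shuffled_mm = [mm[i] for i in order]
    return shuffled_pm, shuffled_mm


# Toy intensities (made up for illustration only)
pm = [120.0, 340.5, 88.0, 910.2]
mm = [60.0, 150.0, 95.5, 300.1]
new_pm, new_mm = permute_probe_pairs(pm, mm, seed=0)
```

Because a single permutation is applied to both lists, every PM value stays attached to its original MM value, while the assignment of pairs to probe sets becomes random.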

2.3 Normalization procedures
We compared four normalization procedures, MAS5, RMA, GCRMA and Li–Wong; all normalizations were implemented using software packages available from Bioconductor (http://www.bioconductor.org). We used the default parameters of the software packages unless otherwise specified. The term ‘Li–Wong’ refers to the procedure that normalizes arrays using an invariant set of genes and then fits a parametric model to the probe set data, as described in Li and Wong (2001).

2.4 Evaluation of biological function relationship
GO annotations of the genes were extracted from the Affymetrix HGU95 annotation file. There are 10 369 biological process terms in total, of which 61 general terms were removed. We are interested only in specific terms that are shared by <5% of the genes on the microarray. A gene pair sharing a common GO term is then deemed functionally related.
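The filtering rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: annotations are assumed to be available as a dictionary mapping each gene to its GO terms, the fraction is taken over the genes present in that dictionary, and the names `specific_terms` and `functionally_related_pairs` as well as the toy gene and term labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def specific_terms(gene_to_terms, max_fraction=0.05):
    """Return the terms annotated to fewer than max_fraction of the genes."""
    n_genes = len(gene_to_terms)
    counts = Counter(
        term for terms in gene_to_terms.values() for term in set(terms)
    )
    return {t for t, c in counts.items() if c / n_genes < max_fraction}


def functionally_related_pairs(gene_to_terms, max_fraction=0.05):
    """Return gene pairs that share at least one specific term."""
    keep = specific_terms(gene_to_terms, max_fraction)
    related = set()
    for a, b in combinations(sorted(gene_to_terms), 2):
        if set(gene_to_terms[a]) & set(gene_to_terms[b]) & keep:
            related.add((a, b))
    return related


# Toy annotations with placeholder labels; the threshold is raised to 0.6
# only because four genes are too few for a 5% cutoff to be meaningful.
toy = {
    "g1": ["T1", "T2"],
    "g2": ["T1", "T2"],
    "g3": ["T2"],
    "g4": ["T2"],
}
pairs = functionally_related_pairs(toy, max_fraction=0.6)
```

In the toy data, term T2 is annotated to every gene and is therefore discarded as too general, so only the pair that shares T1 is reported as functionally related.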

2.5 Likelihood ratio of protein–protein interaction
We assembled a set of gold-standard positive interactions by taking the union of interaction data from the Human Protein Reference Database (HPRD), the Biomolecular Interaction Network Database (BIND), the Database of Interacting Proteins (DIP) and IntAct (Bader et al., 2003; Hermjakob et al., 2004; Peri et al., 2003; Xenarios et al., 2002). The resulting gold-standard positive set consists of 21 509 unique PPIs (heterodimers only) that could possibly pair up among genes on the Human Genome U95 array. A negative gold standard is harder to define, but we took the common approach of using lists of protein pairs that are unlikely to interact given their cellular localization. The assembled negative set contains 6 101 360 pairs of proteins encoded by genes represented on the U95 array. The likelihood ratio is computed as the ratio of conditional probabilities for a set of protein pairs, here the top predicted gene pairs ranked by statistical dependency between expression profiles, given the gold-standard positive (pos) and negative (neg) sets:


LR = P (coexpressed pair|pos) / P (coexpressed pair|neg)
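As a rough illustration of the ratio above, the sketch below estimates each conditional probability as the fraction of gold-standard pairs that also appear among the top predicted pairs. The function name `likelihood_ratio`, the treatment of pairs as unordered, and the handling of an empty overlap with the negative set are assumptions made for this example; the toy pairs are made up.

```python
def likelihood_ratio(top_pairs, positives, negatives):
    """Estimate LR = P(coexpressed pair | pos) / P(coexpressed pair | neg).

    Each argument is an iterable of 2-tuples of gene identifiers.
    Pairs are treated as unordered, so (A, B) equals (B, A).
    """
    top = {frozenset(p) for p in top_pairs}
    pos = {frozenset(p) for p in positives}
    neg = {frozenset(p) for p in negatives}
    if not pos or not neg:
        raise ValueError("gold-standard sets must not be empty")
    p_given_pos = len(top & pos) / len(pos)
    p_given_neg = len(top & neg) / len(neg)
    if p_given_neg == 0:
        # No negative pair among the top predictions: the ratio is undefined
        # or unbounded; returning inf/nan here is a choice of this sketch.
        return float("inf") if p_given_pos > 0 else float("nan")
    return p_given_pos / p_given_neg


# Toy example with made-up gene labels
top = [("A", "B"), ("C", "D"), ("E", "F")]
pos = [("B", "A"), ("C", "D"), ("G", "H"), ("I", "J")]
neg = [("E", "F"), ("K", "L"), ("M", "N"), ("O", "P")]
lr = likelihood_ratio(top, pos, neg)  # (2/4) / (1/4)
```

A value above 1 indicates that the top-ranked co-expressed pairs are enriched for known interactions relative to pairs that are unlikely to interact.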
