Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks
Wei Keat Lim 1,2, Kai Wang 1,2, Celine Lefebvre 2 and Andrea Califano 1,2,*
1Department of Biomedical Informatics, Columbia University, 622 West 168th Street, Vanderbilt Clinic 5th Floor and 2Center for Computational Biology and Bioinformatics, Columbia University, 1130 Saint Nicholas Avenue, New York, NY 10032, USA
An open access article: Bioinformatics 2007 23(13):i282-i288.
Abstract
Motivation: An increasingly common application of gene expression profile data is the reverse engineering of cellular networks. However, common procedures to normalize expression profiles generated using the Affymetrix GeneChip technology were originally developed for a rather different purpose, namely the accurate measurement of differential gene expression between two or more phenotypes. As a result, current evaluation strategies lack comprehensive metrics to assess the suitability of available normalization procedures for reverse engineering and, in general, for measuring correlation between the expression profiles of a gene pair.
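To make "correlation between the expression profiles of a gene pair" concrete, here is a minimal Python sketch of the Pearson correlation coefficient between two genes measured across the same samples. The function name `pearson` and the example values are hypothetical illustrations, not code or data from this study. Pearson correlation is also only one of several dependence measures a reverse-engineering algorithm might rely on.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles measured on the same samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Center each profile on its own mean.
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 expression values for two genes across five samples.
gene_a = [7.1, 8.3, 6.9, 9.0, 7.7]
gene_b = [5.2, 6.1, 5.0, 6.8, 5.5]
print(round(pearson(gene_a, gene_b), 3))
```

Because every downstream step (network edges, cluster distances) consumes numbers like this one, any normalization side effect that shifts them systematically propagates directly into the inferred structure.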
Results: We benchmark four commonly used normalization procedures (MAS5, RMA, GCRMA and Li-Wong) in the context of established algorithms for the reverse engineering of protein–protein and protein–DNA interactions. Replicate-sample, randomized and human B-cell data sets are used as input. Surprisingly, our study suggests that MAS5 provides the most faithful cellular network reconstruction. Furthermore, we identify a crucial step in GCRMA responsible for introducing severe artifacts in the data, leading to a systematic overestimate of pairwise correlation. This has key implications not only for reverse engineering but also for other methods, such as hierarchical clustering, that rely on accurate measurements of pairwise expression profile correlation. We propose an alternative implementation to eliminate this side effect.
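The idea behind using a randomized data set as a control can be sketched in a few lines of Python. This is not the GCRMA step identified in the study. Instead, it uses a hypothetical artifact, one additive offset per sample shared by every gene, purely to show how a normalization side effect can inflate pairwise correlation. The randomized data here are simulated as independent Gaussian profiles, which is an assumption about how such a control might be built. Since there is no true co-expression, the mean pairwise correlation should sit near zero, and any large positive value points to an artifact. All function names are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two profiles of equal length."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def mean_pairwise_correlation(matrix):
    """Average Pearson r over all gene pairs (rows = genes, columns = samples)."""
    total, pairs = 0.0, 0
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            total += pearson(matrix[i], matrix[j])
            pairs += 1
    return total / pairs


def randomized_matrix(n_genes, n_samples, rng):
    """Independent profiles: the true gene-gene correlation is zero by construction."""
    return [[rng.gauss(8.0, 1.0) for _ in range(n_samples)] for _ in range(n_genes)]


def add_sample_offset(matrix, offsets):
    """Hypothetical artifact: one additive shift per sample, shared by all genes."""
    return [[v + off for v, off in zip(row, offsets)] for row in matrix]


rng = random.Random(0)
n_genes, n_samples = 40, 30
data = randomized_matrix(n_genes, n_samples, rng)
offsets = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

clean = mean_pairwise_correlation(data)
biased = mean_pairwise_correlation(add_sample_offset(data, offsets))
print(f"randomized: {clean:.3f}  with shared offset: {biased:.3f}")
```

The shared offset adds the same per-sample pattern to every gene, so every pair appears co-expressed even though none is. This is the kind of systematic overestimate that a randomized control can expose, whatever the actual mechanism inside a given normalization procedure turns out to be.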