


- Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks

The GCRMA and RMA normalization procedures for Affymetrix GeneChip® technology have been remarkably broadly adopted in the community, owing to previous benchmarks demonstrating their superiority over other methods. However, while these methods perform well in the assessment of differential expression, we found that they also introduce correlation artifacts in the data. This seriously undermines their use, at least in their standard form, upstream of reverse engineering algorithms or any other method relying on estimates of expression profile correlation. Thus, our results raise questions about the validity of many studies based on correlation measures computed after these normalization procedures were applied. Specifically, we suggest that the implementation of a specific step in GCRMA, the GSB adjustment of truncated values, introduces artificial correlation among the probe sets. Unfortunately, according to our analysis, these artifacts are not dataset specific and can survive even after the use of additional probe set postprocessing filters such as those based on mean, SD and coefficient of variation.
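To make the kind of postprocessing filter mentioned above concrete, the following is a minimal sketch of a probe set filter on mean, SD and coefficient of variation. The function name `filter_probesets`, the threshold parameters and the toy values are all hypothetical and chosen for illustration; they are not the filters or cutoffs used in the study.

```python
from statistics import mean, stdev


def filter_probesets(expr, min_mean=0.0, min_sd=0.0, min_cv=0.0):
    """Keep probe sets whose mean, SD and CV all reach the given thresholds.

    expr maps a probe set ID to its expression values across samples.
    The thresholds are hypothetical illustration values, not the paper's.
    """
    kept = {}
    for probe_id, values in expr.items():
        m = mean(values)
        sd = stdev(values)  # sample standard deviation
        cv = sd / m if m != 0 else float("inf")  # coefficient of variation
        if m >= min_mean and sd >= min_sd and cv >= min_cv:
            kept[probe_id] = values
    return kept


# Toy data (assumed values, for illustration only).
example = {
    "probe_a": [8.0, 8.1, 7.9, 8.0],  # high mean, almost flat profile
    "probe_b": [2.0, 6.0, 3.0, 9.0],  # variable profile
    "probe_c": [0.5, 0.6, 0.4, 0.5],  # low mean
}
filtered = filter_probesets(example, min_mean=1.0, min_sd=0.5, min_cv=0.1)
print(sorted(filtered))  # ['probe_b']
```

A filter of this kind removes flat or weakly expressed probe sets, but it acts on each probe set separately, so it cannot by itself remove correlation that normalization has introduced between the probe sets that remain.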

Results were completely consistent across four classes of tests, including (a) a direct assessment of correlation artifacts from replicate and randomized samples, (b) an evaluation of the global topological properties of reverse engineered networks, (c) a study of the functional clustering of correlated genes and (d) a study of the relationship between gene-pair expression profile correlation and membership in stable protein complexes. The unequivocal result is that normalization with GCRMA substantially reduces the ability to distinguish between actual and incorrect functional and physical interactions. In particular, GCRMA is likely to introduce an extraordinary number of false positives, while MAS5 appears to perform optimally with respect to these tests.
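The idea behind test (a) can be illustrated with a generic permutation check: count the probe set pairs whose correlation exceeds a threshold, then repeat the count after shuffling each profile independently, which destroys any genuine co-expression. The sketch below is an assumption-laden illustration, not the authors' procedure. It operates on already-normalized values, and the helper names, the threshold and the toy data are hypothetical. The study's concern is correlation that survives randomization applied before normalization, which would require re-running the normalization step itself.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def count_high_corr(expr, threshold):
    """Count probe set pairs whose absolute correlation reaches the threshold."""
    return sum(
        1
        for a, b in combinations(expr.values(), 2)
        if abs(pearson(a, b)) >= threshold
    )


def shuffle_within_probesets(expr, rng):
    """Return a copy in which each profile's sample order is shuffled independently."""
    shuffled = {}
    for probe_id, values in expr.items():
        permuted = list(values)
        rng.shuffle(permuted)
        shuffled[probe_id] = permuted
    return shuffled


# Toy data (assumed values): g1 and g2 are perfectly correlated, g3 is not.
data = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [6, 1, 4, 2, 5, 3],
}
observed = count_high_corr(data, threshold=0.9)
randomized = count_high_corr(
    shuffle_within_probesets(data, random.Random(0)), threshold=0.9
)
print(observed, randomized)
```

On real data, a count of high-correlation pairs in randomized input that stays close to the count in the original input would point to correlation created by processing rather than by biology.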

We conclude that the choice of normalization procedure strongly affects the correlation structure in the data. Thus, choosing the right normalization procedure is a key step towards the inference of accurate cellular networks. Our comparative analysis favors MAS5 in this context even though (or probably because) it infers fewer interactions, but with the highest functional and physical interaction enrichment.

Finally, we suggest that a specific correction to the default implementation of GCRMA in the R package appears to substantially improve its performance, making it competitive with that of MAS5. With this correction, we believe that GCRMA can be properly used in the context of reverse engineering gene networks.

updated on: 10 Oct 2008
