Reverse Polarization in Amino acid and Nucleotide Substitution Patterns Between Human–Mouse Orthologs of Two Compositional Extrema
Sumit K. Bag1, Sandip Paul1, Subhagata Ghosh2 and Chitra Dutta1,2,*
1 Bioinformatics Centre, Indian Institute of Chemical Biology, Kolkata 700 032, India
2 Structural Biology and Bioinformatics Division, Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700 032, India
Genome-wide analysis of sequence divergence patterns in 12 024 human–mouse orthologous pairs reveals, for the first time, that the trends in nucleotide and amino acid substitutions in orthologs of high and low GC composition are highly asymmetric and polarized in opposite directions. The dataset was divided into three groups (high, medium, and low) on the basis of the GC content at third codon sites of the human genes. High-GC orthologs exhibit a significant bias in favor of the replacements Thr → Ala, Ser → Ala, Val → Ala, Lys → Arg, Asn → Ser, Ile → Val, etc., from mouse to human, whereas in low-GC orthologs the reverse trends prevail. In general, in the high-GC group, residues encoded by A/U-rich codons in mouse proteins tend to be replaced by residues encoded by relatively G/C-rich codons in their human orthologs, whereas the opposite trend is observed among the low-GC orthologous pairs. The medium-GC group shares some trends with the high-GC group and some with the low-GC group. The only significant trend common to all groups of orthologs, irrespective of their GC bias, is the (Asp)Mouse → (Glu)Human replacement. At the nucleotide level, high-GC orthologs have undergone a large excess of (A/T)Mouse → (G/C)Human substitutions over (G/C)Mouse → (A/T)Human substitutions at each codon position, whereas for low-GC orthologs the reverse is true.
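The two quantities the abstract relies on, GC content at third codon sites and the directional count of A/T versus G/C differences per codon position, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names and the cut-offs `low_cut` and `high_cut` are hypothetical, since the abstract gives no thresholds, and the sequences are assumed to be in-frame coding sequences from a codon-aware pairwise alignment in which gaps occur in whole triplets. "Mouse to human" here denotes only the orientation of the pairwise comparison.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    thirds = cds.upper()[2::3]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_group(cds, low_cut=0.45, high_cut=0.65):
    """Assign a gene to 'high', 'medium' or 'low'.

    The cut-offs are illustrative placeholders, not values from the paper.
    """
    value = gc3(cds)
    if value >= high_cut:
        return "high"
    if value <= low_cut:
        return "low"
    return "medium"


def directional_counts(mouse, human):
    """Count A/T -> G/C and G/C -> A/T differences, mouse to human,
    at each codon position (1, 2, 3) of two aligned coding sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    counts = {pos: {"AT_to_GC": 0, "GC_to_AT": 0} for pos in (1, 2, 3)}
    for i, (m, h) in enumerate(zip(mouse.upper(), human.upper())):
        if m == h or m not in "ACGT" or h not in "ACGT":
            continue
        pos = i % 3 + 1  # codon position of this alignment column
        if m in "AT" and h in "GC":
            counts[pos]["AT_to_GC"] += 1
        elif m in "GC" and h in "AT":
            counts[pos]["GC_to_AT"] += 1
    return counts
```

Under these assumptions, an excess of `AT_to_GC` over `GC_to_AT` summed across the gene pairs of one group would correspond to the pattern reported for high-GC orthologs, and the opposite excess to the pattern reported for low-GC orthologs.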
Key words: high-GC orthologs; low-GC orthologs; amino acid replacement matrix; nucleotide replacement matrix; sequence divergence
DNA Research 2007 14(4):141-154. Published in the Journal and Oxford University Press. An Open Access Article.