Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae
R Scott Cornman* 1, Toru Togawa* 1, W Augustine Dunn1, Ningjia He2, Aaron C Emmons1 and Judith H Willis1
1Department of Cellular Biology, University of Georgia, Athens, GA 30602 USA
2The Key Laboratory of Sericulture, Southwest University, Chongqing 400715, PR China
An Open Access article from BMC Genomics 2008
The most abundant family of insect cuticular proteins, the CPR
family, is recognized by the R&R Consensus, a domain of about 64
amino acids that binds to chitin and is present throughout arthropods.
Several species have now been shown to have more than 100 CPR genes,
inviting speculation as to the functional importance of this large
number and diversity.
We have identified 156 genes in Anopheles gambiae that code
for putative cuticular proteins in this CPR family, over 1% of the
total number of predicted genes in this species. Annotation was
verified using several criteria including identification of TATA boxes,
INRs, and DPEs plus support from proteomic and gene expression
analyses. Two previously recognized CPR classes, RR-1 and RR-2, form
separate, well-supported clades with the exception of a small set of
genes with long branches whose relationships are poorly resolved.
Several of these outliers have clear orthologs in other species.
Although both clades are under purifying selection, the RR-1 variant of
the R&R Consensus is evolving at twice the rate of the RR-2 variant
and is structurally more labile. In contrast, the regions flanking the
R&R Consensus have diversified in amino-acid composition to a much
greater extent in RR-2 genes than in RR-1 genes. Many genes are
found in compact tandem arrays that may include similar or dissimilar
genes but always contain members of only one of the two classes. Tandem arrays of
RR-2 genes frequently contain subsets of genes coding for highly
similar proteins (sequence clusters). Properties of the proteins
indicated that each cluster may serve a distinct function in the cuticle.
The complete annotation of this large gene family provides insight
into the mechanisms of gene family evolution and clues about the need for
so many CPR genes. These data also should assist annotation of other Anopheles genes.