


Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae

Arthropod cuticle consists predominantly of chitin fibers embedded in a protein matrix [1]. While chitin is a simple polymer of N-acetylglucosamine, there are a large number of cuticular proteins (see [2,3] for review). The vast majority of cuticular protein sequences presently available belong to a family with the R&R Consensus, first identified by Rebers and Riddiford [4]. An extended version of the original Consensus has been shown to bind to chitin [5,6], and the conformation it may adopt has been modeled [7,8]. Throughout this paper, we use the term R&R Consensus to refer to the extended Consensus, and CPR to refer to the family of genes/proteins with this Consensus. The Consensus, about 64 amino acids long, almost always begins near a triad of aromatic residues (Y/F-x-Y/F/W-x-Y/F) and terminates shortly after a uniformly conserved G-F/Y (Figure 1).
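The two sequence landmarks described above can be expressed as simple patterns. The following is a minimal sketch, not the method used for the annotation: it scans a protein sequence for the aromatic triad and then for a G-F/Y pair some distance downstream. The function name `find_candidate_regions` and the `min_gap`/`max_gap` window are illustrative assumptions; only the two residue patterns come from the text.

```python
import re

# Aromatic triad from the text: Y/F-x-Y/F/W-x-Y/F, where x is any residue.
# The lookahead makes matches zero-width so overlapping triads are all reported.
TRIAD = re.compile(r"(?=[YF].[YFW].[YF])")
# Conserved G-F/Y pair near the C-terminal end of the Consensus.
GFY = re.compile(r"G[FY]")


def find_candidate_regions(seq, min_gap=30, max_gap=80):
    """Return (start, end) slices that begin at a triad and end at a G-F/Y.

    min_gap and max_gap bound the distance from the triad to the G-F/Y.
    They are illustrative assumptions, not values taken from the paper.
    """
    seq = seq.upper()
    regions = []
    for triad in TRIAD.finditer(seq):
        start = triad.start()
        # Search for G-F/Y only downstream of the five-residue triad.
        for gfy in GFY.finditer(seq, start + 5):
            gap = gfy.start() - start
            if min_gap <= gap <= max_gap:
                regions.append((start, gfy.end()))
    return regions


# Synthetic toy sequence (not a real protein): triad, 40 alanines, then G-F.
toy = "MK" + "YAFAY" + "A" * 40 + "GF" + "AA"
print(find_candidate_regions(toy))
```

A pattern scan of this kind would only flag candidates. Short motifs such as G-F/Y occur by chance in many proteins, so any hit would still need checking against a profile or alignment of the full Consensus.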

Figure 1. Features of the R&R Consensus in An. gambiae and their relationship to the pfam00379 motif. The longest region that generally could be aligned across An. gambiae CPRs is shown beginning with a proline or glycine two or three sites prior to the aromatic triad and ending with a proline or glycine eight positions C-terminal to the final invariant glycine. Our alignment of this region was 65 positions long for aligned RR-2 genes, but the complete alignment was 83 positions long due to length variation in RR-1 genes. The pfam00379 alignment includes seven additional positions N-terminal to our alignment. The region used in the phylogeny extended further in the N-terminal and C-terminal directions as described in the text. The shaded region on the top line was double-weighted because it encompasses the principal features of the R&R Consensus that are present in virtually all An. gambiae CPRs.

While the R&R Consensus is conserved across arthropods, its location within a cuticular protein and the nature of the regions that flank it are highly variable. Understanding the role of these proteins in forming the insect exoskeleton and other cuticular structures will be facilitated by defining all of the cuticular proteins of a single species. Accounts of the cuticular proteins with the R&R Consensus have now been published for 28 proteins from Apis mellifera [9] and for 101 from Drosophila melanogaster [10]. In addition, 102 CPR proteins have been identified in the genome of Tribolium castaneum (Beeman and Willis, unpublished observations). These annotations depended in large part on computerized genome annotation and were not systematically verified at the mRNA or protein level.

In the present study, we have carried out an exhaustive manual annotation of the CPR family of An. gambiae based on the whole genome sequence of the PEST strain [11]. These annotations are being facilitated and verified by a proteomics analysis of cuticles [12; He, unpublished observations] and accompanied by an analysis of gene expression with real-time RT-PCR [13]. In addition, ambiguous gene models have been confirmed or revised by sequencing RT-PCR or RACE products. This work has identified 156 genes coding for CPR proteins. Hence, over 1% of the genes of An. gambiae are devoted to just this one family of cuticular proteins.

An investigation of cuticular proteins in An. gambiae carried out prior to whole genome sequencing was particularly informative for the present annotation study. Dotson et al. [14] sequenced a 17.4 kb insert in a genomic library constructed from the Sua strain. This region contained three CPR genes that were at least 98% identical in their coding regions, yet differed in their 5' and 3' UTRs as well as in their introns. The lesson learned was that virtually identical genes can reside in compact tandem arrays, yet can be recognized as distinct genes rather than assembly artefacts because of the differences in their associated non-coding regions.
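The reasoning above, that near-identical coding regions with divergent non-coding regions point to distinct genes, can be illustrated with a small sketch. It assumes the sequences have already been aligned pairwise to equal length, with `-` marking gaps. The 98% coding threshold comes from the text; the function names and the `flank_max` cutoff for non-coding divergence are hypothetical choices for illustration, since the source states only that the non-coding regions differed.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def looks_like_distinct_copies(cds_a, cds_b, flank_a, flank_b,
                               cds_min=98.0, flank_max=90.0):
    """True if coding regions are near-identical but non-coding regions diverge.

    cds_min follows the 98% figure in the text; flank_max is an assumed
    cutoff chosen only for this illustration.
    """
    return (percent_identity(cds_a, cds_b) >= cds_min
            and percent_identity(flank_a, flank_b) < flank_max)


# Synthetic toy sequences (not real gene sequences).
cds_a = "ATG" + "C" * 97
cds_b = cds_a[:-1] + "T"   # one mismatch in 100 positions
utr_a = "AAAAAAAAAA"
utr_b = "AAAAATTTTT"       # half of the positions differ
print(percent_identity(cds_a, cds_b), looks_like_distinct_copies(cds_a, cds_b, utr_a, utr_b))
```

In practice the UTRs and introns would be compared separately, and identity alone would not rule out an assembly artefact without supporting sequence reads.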

CPR proteins can be divided into groups according to which variant of the extended R&R Consensus they possess. Two major groups have been named RR-1 and RR-2, while a third group (RR-3) has been identified, but from only a small number of sequences [15,16]. It is unclear whether RR-3s are an evolutionarily distinct group; for the present analysis we include RR-3 genes within the RR-1 class. A Hidden Markov Model available at the cuticle DB web server [17] can be used to assign proteins to RR-1 or RR-2 [10]. Our analysis confirms that the bulk of RR-1 and RR-2 proteins form non-overlapping clades in An. gambiae, separated by a small set of long-branch RR-1, RR-2, and RR-3 proteins that are probably an artificial group. In addition to assembling information that supports annotation, we have analyzed the structure of these clades, examined patterns of molecular evolution, compared the amino acid composition of the different proteins, and identified characteristics of each group. We now have a further appreciation of the complexity of the insect cuticle and clues about the need for so many CPR genes.

Updated on: 30 Jun 2008
