RESULTS AND DISCUSSION
Ninety-Two Putative Proteases Are Predicted by Comparative Genomic Analysis
To gain further insight into the proteolytic machinery of the malaria parasite, the protein sequences in the annotated P. falciparum genome were subjected to an exhaustive search against the Merops protease database, which provides a catalog and a structure-based classification of proteases. We adopted a relatively stringent BLASTP E-value cut-off of 1e-04 to ensure high coverage with few false positives. Redundant hits and partial sequences were excluded, resulting in a total of 92 protease homologs (Table 1). As highlighted in the Protease nomenclature column in Table 1, all twelve previously characterized proteases with proteolytic activity are included. In addition, as highlighted in the Gene ID column, 23 of the 25 proteases predicted by the first-pass annotation published in PlasmoDB are included, among which subtilases 1 and 2 have been demonstrated to possess proteolytic activity. PFI0660c is not included because the E-value (0.39) of its closest homolog (Bacillus anthracis CAAX amino terminal protease, accession number NP_655263) falls far short of the 1e-04 cut-off; PF11_0314 is not included because, based on sequence homology, it is more likely to possess ATP hydrolytic and regulatory function than proteolytic function.
The domain/motif organization of the predicted proteases was revealed by the InterPro Search. For each putative malaria protease, the known protease sequence or protease domain of the highest similarity was used as a reference for annotation; the catalytic type and protease family were predicted in accordance with the classification in the protease database Merops (http://www.merops.co.uk/merops/merops.htm), and the enzyme was named in accordance with the SWISS-PROT enzyme nomenclature (http://www.expasy.ch/cgi-bin/lists?peptidas.txt) and the literature.
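The E-value filtering step described above can be sketched as follows. This is only an illustration, not the pipeline used in the study: it assumes the hits are available in the common 12-column tab-separated BLAST layout, treats the cut-off as strict (E below 1e-04), and uses hypothetical subject names. The two E-values are borrowed from the text purely as example inputs.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-4  # cut-off quoted in the text; strict "<" is an assumption


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the best hit below the cut-off.

    Assumes 12 tab-separated columns per hit, with the query in column 1,
    the subject in column 2 and the E-value in column 11.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue  # hit is too weak to keep
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def _row(query, subject, evalue):
    # Only columns 1, 2 and 11 matter here; the others are placeholders.
    return "\t".join([query, subject] + ["0"] * 8 + [evalue, "0"])


# Hypothetical hit table: subject names are invented for illustration.
example = "\n".join([
    _row("MAL13P1.310", "merops_entry_A", "2e-35"),
    _row("MAL13P1.310", "merops_entry_C", "1e-10"),
    _row("PFI0660c", "merops_entry_B", "0.39"),
])
hits = best_hits(example)
```

In this toy table the PFI0660c hit is discarded because 0.39 does not pass the cut-off, and only the stronger of the two MAL13P1.310 hits is retained.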
New Catalytic Types and Families
Proteases are classified into five major clans (Aspartic, Cysteine, Metallo, Serine, and Threonine) based on their catalytic mechanisms. They can be further grouped into distinct families and subfamilies by intrinsic evolutionary relationships (Rawlings and Barrett 1993). Using the comparative database search, we detected a total of 59 new protease homologs, in addition to 12 characterized proteases with proteolytic activity and 21 predicted by official annotation (Table 1). Moreover, a spectrum of conserved core characteristic domains/motifs for specific catalytic classes has been detected in most of the predicted proteases, indicating their potential activity.
The 92 putative proteases belong to 26 families of five clans, compared to the previously reported 12 proteases that belong to six families of four clans (Rosenthal 2002). The distribution (11% aspartic, 36% cysteine, 22% metallo, 17% serine, and 14% threonine) resembles those in other model organisms, supporting the fundamental premise that a prototype protease system is conserved throughout evolution (Rawlings and Barrett 1993; Southan 2001). Our speculation that a large number of potential proteases remain unexplored in the P. falciparum genome appears justified. Undoubtedly, some of the uncharacterized proteases will perform crucial functions in the parasite life cycle as discussed below.
Examples of Potentially Important Proteases
Calpains are a group of intracellular cysteine proteases that mediate a wide variety of physiological and pathophysiological processes, including signal transduction, cell motility, apoptosis, and cell cycle regulation (Sorimachi et al. 1997; Glading et al. 2002). In P. falciparum, an as-yet-unidentified calpain was believed to be essential for merozoite invasion, based on the observation that calpain inhibitors I and II strongly blocked invasion (Olaya and Wasserman 1991).
We have identified a putative calpain (MAL13P1.310) in the P. falciparum genome, which exhibits high sequence similarity to C. elegans calpain-7 (E = 2e-35). Moreover, its ortholog (accession no. EAA19663) has been identified in the newly released genome of the model rodent malaria parasite Plasmodium yoelii yoelii. The P. falciparum protein possesses a catalytic domain (985–1453) detected by the Hidden Markov Model in the Pfam search, with E = 8.0e-13 (Fig. 1). The most intriguing aspect of this domain is the presence of three active-site residues (Cys1035, His1371, and Asn1391) that constitute a cleft crucial for catalytic activity (Arthur et al. 1995). A multiple alignment of the catalytic regions was produced for the putative plasmodial calpain and representative human calpains. In addition to the invariant Cys-His-Asn triad, a high degree of identity is also observed in its vicinity, reflecting stringent functional and mechanistic conservation (Fig. 1). Indeed, the experimental demonstration that a single catalytic subunit in rat and chicken calpains possesses full bona fide proteolytic activity (Yoshizawa et al. 1995) reinforces the potential processing capacity of the putative plasmodial calpain.
Our further phylogenetic analysis of the putative P. falciparum calpain revealed its striking origin, which might have contributed to an alternative Ca2+-independent regulatory mechanism. Figure 2 shows the evolutionary tree inferred by the neighbor-joining (NJ) method using Poisson-corrected distances (Saitou and Nei 1987). Evolutionary trees based on Parsimony (PAUP4.0) and Maximum Likelihood (PHYLIP) also yielded topologies and clade structures congruent with NJ (data not shown). The two putative plasmodial calpains belong to a novel monophyletic group of animal calpain-7 proteases, with 61% bootstrap support. They share the domain architecture common to the calpain-7 clade: they lack any significant similarity to the C-terminal EF-hand Ca2+-binding domain present in most of the essential Ca2+-dependent mammalian calpain subtypes (calpains -1, -2, -3, -9, -11, and Mu/M-type) (Franz et al. 1999). Given that the fungal cysteine protease PalB, the nearest neighbor of calpain-7, contains a PBH domain resembling the Ca2+-binding domain (Denison et al. 1995), one could speculate that the loss of Ca2+ dependency in the calpain-7 subtype derived from evolutionary events such as domain shuffling, which might be associated with the divergence of mRNA splicing sites (Craik et al. 1983). Such events appear to have occurred close to or prior to the origin of the animal kingdom (Fig. 2).
The identification of a plasmodial calpain also implies the existence of calpain-mediated pathways. Its potential cognate targets include host cytoskeletal proteins such as spectrin, integrin, and ezrin. Moreover, the recent discovery in P. falciparum of a typical endogenous substrate of calpain, Protein Kinase C (PFL1110c; PFI1685w), supports the existence of a parasite-controlled signaling cascade (Doerig et al. 2002).
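For readers unfamiliar with the distance measure named above, the Poisson correction converts the observed proportion p of differing amino acid sites between two aligned sequences into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below is a minimal illustration only: the two toy sequences are invented, and ignoring any column that contains a gap is an assumption about gap handling, not a statement of how the trees in Figure 2 were built.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences are saturated; distance is undefined")
    return -math.log(1.0 - p)


# Invented toy alignment: 1 difference over 10 comparable sites, so p = 0.1.
toy_a = "MKTAYIAKQR"
toy_b = "MKTAHIAKQR"
toy_distance = poisson_distance(toy_a, toy_b)  # -ln(0.9), about 0.105
```

A matrix of such pairwise distances is the input that a neighbor-joining implementation would then cluster into a tree.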
It is conceivable that the putative protease-active and Ca2+-independent plasmodial calpain may serve as a good antimalarial target for two reasons. First, it may be the central component of crucial signal transduction pathways that affect parasite biology and host-parasite interactions. Second, because it is evolutionarily divergent from the essential subtypes of host calpains, its specific inhibitor may have minimal effects on the host.
Metacaspase (PF13_0289) is another interesting hypothetical protease. In vertebrates, a cascade of caspases (cysteine aspartyl proteases) is the major modulator of apoptosis (programmed cell death) (Thornberry and Lazebnik 1998; Aravind et al. 1999). Two families of ancient caspase-like proteins (paracaspases and metacaspases) have been found in metazoans, fungi, and protozoa. As shown in the phylogenetic tree (Fig. 3), the putative plasmodial metacaspase falls within a distinct clade comprising paracaspases and metacaspases, which are likely to be the primordial form of the 14 subfamilies of vertebrate caspases (bootstrap value = 100%). Interestingly, human paracaspase is capable of interacting with the oncogene Bcl10 and triggering NF-κB activation, indicative of the prone-to-apoptosis property of the ancestral caspase (Uren et al. 2000). Moreover, yeast metacaspase has been demonstrated to be an effective executor of apoptosis, suggesting that the root of apoptosis dates back to unicellular organisms (Madeo et al. 2002). The multiple alignment clearly reveals that the putative plasmodial metacaspase retains the typical caspase fold, which is centered on the His (404)-Cys (460) catalytic dyad conserved in all representative proteolytically active caspases (Fig. 4). In contrast, considerable sequence diversity is observed in the vicinity of this active site cleft. In particular, yeast metacaspase and the plasmodial homolog exhibit a sequence profile distinct from that of the vertebrate caspases and human paracaspase. Previously, Uren et al. (2000) postulated that the ancient subtypes (paracaspases and metacaspases) and the vertebrate subtypes differ in substrate specificity. We have demonstrated that the experimentally confirmed differential substrate specificity of the major vertebrate subtypes is largely determined by the chemical properties and configuration of residues situated in the caspase fold (Wang and Gu 2001). Thus, the distinct configuration of residues observed in the proximity of the active site could account for a parasite-specific substrate preference.
In Plasmodium, the physiological process of apoptosis has never been reported, nor have its critical components been identified. Nevertheless, the detection of the metacaspase homolog will allow us to investigate the role, if any, of apoptosis and/or an analogous signal transduction pathway in the parasite. In addition, since metacaspases have been found only in protozoans, yeasts, and possibly plants, and are phylogenetically distinct from other caspase subtypes (Fig. 3), the putative plasmodial metacaspase may serve as a potential chemotherapeutic target.
Signal Peptidase 1 (SP1)
Signal peptidases (SP) play indispensable roles in protein trafficking and sorting by removing signal peptides from the precursors of secretory proteins. This serine protease family consists of two subtypes, SP1 and signalase, distinguished by their structural, functional, and evolutionary features. To date, SPs have been found in bacteria, archaea, fungi, plants, and animals; however, no SP has previously been reported in protists, despite the fact that the dynamic parasite life cycle reflects a need for specific peptidase(s) to process proteins that are translocated across host and parasite membranes. Using the comparative genomic search, we identified, for the first time, two homologs of signal peptidase in P. falciparum: PF13_0118 (SP1) and MAL13P1.167 (signalase).
Of the two subtypes, SP1 has generated extensive research interest because its distinct prokaryotic origin and essential functions make it a novel antibiotic target (Paetzel et al. 2000). We have also identified an ortholog of P. falciparum SP1 in the genome of the rodent parasite P. yoelii yoelii. Our evolutionary analysis revealed that the two putative plasmodial SP1 proteins have three clusters of homologs: (1) bacterial SP1; (2) an Arabidopsis chloroplast thylakoidal processing peptidase; and (3) mitochondrial inner membrane peptidases (Imp) found in eukaryotes, which appear to be the nearest neighbors of plasmodial SP1 (Fig. 5). Given the proposed prokaryotic origin of the chloroplast and mitochondrion, malarial SP1 is likely to have evolved via the prokaryotic-specific lineage. Moreover, its potential catalytic activity can be inferred from the comparative sequence analysis. The putative SP1 contains the catalytic dyad (Ser175, Lys274) that is invariant across representative SP1 proteins with confirmed signal peptidase activity (Fig. 6). Most notably, this Ser/Lys catalytic dyad mechanism is unique to SP1, in contrast to the typical Ser/His/Asp triad system of other serine proteases. It seems plausible that the putative plasmodial SP1 has a fundamental role yet to be determined, and that it represents a promising target given its distant relatedness to the host.
Important Protease-Mediated Pathways Implicated in P. falciparum
Our findings suggest at least five new protease-mediated activities. (1) An ATP-dependent ubiquitin-proteasome-mediated cell-cycle control and stress-response system (Verma et al. 2002). Although the mechanism by which proteasomes function in P. falciparum is poorly understood, their importance was suggested by the irreversible inhibition of the growth and development of the hepatic and erythrocytic stages of three different Plasmodium species by lactacystin, a specific threonine protease inhibitor (Gantt et al. 1998). The identification of the clade of threonine proteases α and β, and of a series of ubiquitinyl hydrolases (UCH1 and UCH2), brings new insight into this universally conserved proteasome machinery (Table 1). (2) Lysosomal proteolysis. This selective pathway for degrading cytosolic proteins may involve a number of cathepsins with versatile functions, which are assisted by cytosolic and lysosomal molecular chaperones and by receptor proteins in the lysosomal membrane. (3) A calpain-activated signal transduction cascade, which may work in conjunction with upstream modulators and downstream effectors of host or parasite origin. (4) A caspase-mediated apoptosis or apoptosis-like signal transduction pathway. Although yeast metacaspase has been confirmed to induce apoptosis, the classical apoptosis regulators appear to be missing from the yeast genome. Thus, it is desirable yet challenging to identify the key components of this pathway, which may be conserved across organisms or be parasite-specific. (5) A signal peptidase-initiated precursor protein processing pathway.
Studying the origin and the evolutionary mechanisms behind plasmodial proteases will contribute to the selection of target proteases to be studied in detail, for which specific inhibitors with no or minimal effect on the host can be designed. A complex evolutionary scenario including gene duplication, domain shuffling, and lateral gene transfer has been implicated in the preliminary analysis of the predicted proteolytic machinery in P. falciparum. Gene duplication is believed to play important roles in the evolution of multigene families by providing raw material for novel functionality under differential evolutionary constraints (Ohno 1970; Li 1983; Friedman and Hughes 2001; Gu et al. 2002; McLysaght et al. 2002). In P. falciparum, the well-characterized falcipains (-1, -2, -3), plasmepsins (I-IV), and subtilases (-1, -2) exemplify multigene families that arose from gene duplications (Coombs 2001; Rosenthal 2002). We have identified a series of putative proteases that may comprise multigene families (Table 1). Some reflect tandem gene duplications at adjacent chromosome loci. For example, eight SERA homologs form a cluster in chromosome 2 contig 11953 (Miller et al. 2002). In contrast, some potential paralogs are located in remote chromosome regions. For instance, the members of the UCH2 family, which share the consensus domain, are sparsely distributed over seven chromosomes. This suggests that ancient gene duplications and subsequent functional divergence may have produced the extensive repertoire of present-day multigene families. In addition to gene duplication, domain shuffling coupled with splice-site variation, intron loss, and horizontal gene transfer have been proposed as important modes in the evolution of aspartic proteases in the parasite phylum Apicomplexa, which includes P. falciparum (Jean et al. 2001).
The proteases encoded by or destined for parasite organelles are of particular interest because organelles represent microenvironments in which proteases may evolve at different rates and thus achieve novel functions (Fast et al. 2001). The first target organelle is the apicoplast, the apicomplexan-specific plastid derived by secondary endosymbiosis of a red alga. Since a plastid-encoded gene is of prokaryotic origin, an inhibitor of its product may have only a minor effect, if any, on the vertebrate host, and the gene product may therefore represent a promising antimalarial target. Our preliminary analysis shows that the putative clpC gene "PF11_0175" matches one apicoplast-encoded gene (Wilson et al. 1996). Moreover, 14 predicted proteases may contain an apicoplast transit peptide, among the 511 genes identified in the entire parasite genome by the pattern-recognition program PATS (Predict Apicoplast-Targeted Sequences) (Zuegge et al. 2001). From the population genetics perspective, we would anticipate detecting a certain level of polymorphism among the putative proteases, owing to the ancient origin of P. falciparum as revealed by chromosome-wide SNP analysis (Verra and Hughes 1999; Mu et al. 2002; Wootton et al. 2002). However, the alternative Malaria's Eve hypothesis of a severe recent population bottleneck may still be valid (Rich et al. 1998; Volkman et al. 2001). More detailed analysis of the genomics and proteomics of plasmodial proteases will help resolve these fundamental questions about P. falciparum evolution.
Eighty-Eight Putative Proteases Are Actively Transcribed in the Intraerythrocytic Stage, and Sixty-Seven Are Actively Translated in the Life Cycle
Genome analysis based solely on sequence similarity predicts many previously unknown putative malaria proteases; however, we bear in mind that these are only predictions. Which of the 59 newly predicted proteases, in addition to the 12 characterized proteases and the 21 proteases annotated previously, are true protein-encoding genes expressed in the parasite life cycle? This important question was first addressed by analyzing an en masse gene expression profile using microarray chips, followed by RT-PCR confirmation.
We focused on the parasite expression profiles of the asexual erythrocyte stage not only because this stage is responsible for the clinical manifestations of malaria, but also because of the accessibility of the research materials. In order to capture all genes transcribed throughout the erythrocyte stage of the parasite, we extracted and pooled mRNAs from P. falciparum 3D7 culture samples collected at four 12-h intervals. Figure 7 shows the temporal development of the parasites, which includes rings, trophozoites, schizonts, and merozoites, indicating that an asynchronous population was successfully obtained. Probes were labeled with fluorescent dyes using mRNAs purified from the asynchronous culture as templates, and then hybridized to a microchip arrayed with 6239 Malaria Genome Array Oligomers (Operon Technologies).
The results, summarized in Table 2, show that 75 predicted proteases have signal intensities higher than those of the negative controls. Being aware that the cut-off value for signal intensity is controversial, and that using the average intensity of the negative controls may be somewhat arbitrary, we selected the gametocyte-specific proteins (CS protein TRAP-related protein, Pfs25, Pfs48/45, Pfg377, and a gametocyte-specific var) and three large gene families in which the majority of members are silent due to clonal expression switching (var, rifin, and stevor) as internal references (Hayward et al. 2000; Ben Mamoun et al. 2001). As anticipated, all gametocyte-specific genes, 39 of 45 var genes, 99 of 118 rifin genes, and 12 of 14 stevor genes displayed signal intensities below the level of the negative controls (data not shown). These data further support our conclusion that 75 predicted proteases are actively transcribed during the erythrocytic stage. Interestingly, putative multigene families such as SERA and UCH2 exhibit variable expression levels across paralogous members, reflecting a certain level of functional divergence after the gene duplication events.
We also analyzed two microarray datasets published in PlasmoDB. The first dataset includes the expression profile of two erythrocyte stages (trophozoites and schizonts) obtained with the Oligo Microarray (Hayward et al. 2000). The finding that 66 predicted proteases were transcribed in at least one stage supports our conclusion that the majority of the predicted proteases are actively transcribed during the erythrocyte stage (Table 2). The second dataset represents the first proof-of-concept experiment using a cDNA microarray to explore the expression profile of five erythrocytic forms and stages (Ben Mamoun et al. 2001). Among the 944 elements or gene fragments (317 genes of identifiable homology) included in the probe design, eight corresponded to predicted proteases. The positive signals of seven of these genes are consistent with our result from the asynchronous culture. The stage-specific profile also confirmed the ubiquitous expression of the putative proteasome 6 (PFI1545c), which does not have a corresponding 70-mer in the Oligo Microarray.
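The expression call described above (signal intensity compared with the average of the negative controls) can be illustrated with a short sketch. All intensity values and the second gene label below are hypothetical, and the function name is ours; the snippet only shows the thresholding logic, not the actual normalization or image analysis applied to the array data.

```python
from statistics import mean


def expressed_genes(intensities, negative_controls):
    """Return the set of genes whose signal exceeds the mean negative control.

    intensities: dict mapping gene ID to signal intensity
    negative_controls: list of intensities measured on negative-control spots
    """
    threshold = mean(negative_controls)
    return {gene for gene, value in intensities.items() if value > threshold}


# Hypothetical values for illustration only.
negative_controls = [90.0, 110.0, 100.0]          # mean = 100.0
intensities = {
    "MAL13P1.310": 850.0,        # well above the controls: called expressed
    "silent_var_example": 60.0,  # below the controls: called not expressed
}
called = expressed_genes(intensities, negative_controls)
```

Genes known to be silent in the sampled stages, such as most var, rifin, and stevor members, serve as a sanity check on this kind of threshold: they should mostly fall outside the returned set.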
Reverse Transcription Polymerase Chain Reaction
Among the 17 remaining predicted proteases that were not detected by microarray hybridization, seven showed signal intensities below those of the negative controls. One possibility is that some of them are expressed in stages other than the asexual erythrocytic ones. This could be further investigated by using RNAs extracted from the intraerythrocytic and extraerythrocytic stages. The remaining ten predicted proteases were not included in the oligomer set printed on the array slides because only 90% of the P. falciparum genome data was available when the oligomers were designed. To examine whether these predicted proteases were also expressed in the erythrocyte stage, we designed specific primers and performed RT-PCR using the RNA extracted from the asynchronous culture (Fig. 7) as the template. The data shown in Figure 8A indicate that all ten predicted genes were actively transcribed.
As mentioned above, P. falciparum calpain, metacaspase, and signal peptidase1 are of particular interest due to the potential biological roles they may play. The microarray analysis suggested that the predicted genes for these proteases were actively transcribed (Table 2). We also performed RT-PCR to further confirm their expression (Fig. 8B).
The microarray and RT-PCR data only indicated the active transcription of 85 predicted proteases. In order to examine expression at the level of translation, we analyzed the proteomics data published in PlasmoDB (Florens et al. 2002). It appeared that 67 out of 92 predicted proteases are translated at some point during the life cycle (Table 2). Some proteases are ubiquitous, whereas others show stage-specific expression. It was notable that the three predicted proteases that did not have detectable transcription from the microarray assay did show positive translation in specific stages including intraerythrocytic stages.
Combining the complementary results from the microarray, RT-PCR, and proteomics analyses, we found that of the 92 putative proteases identified by scanning the genome, 88 were transcribed and 67 were translated at some stage in the life cycle. The remaining four may be expressed at extraerythrocytic stages or may be pseudogenes resulting from a frameshift in the open reading frame (Triglia et al. 2001).
The exhaustive homology search and comparative sequence analysis have resulted in the delineation of 92 putative proteases, including 59 that had not previously been recognized in the P. falciparum genome. This set includes potentially important proteases such as calpain, metacaspase, and signal peptidase, and indicates protease-mediated activities that may be vital for the parasite life cycle. Furthermore, 88 are shown to be actively expressed by the microarray, RT-PCR, and proteomics data. This study is an initial attempt at the systematic identification of novel malaria proteases that have essential functions and at the assessment of their evolutionary relationship to the vertebrate host. Combining in silico genomics-based predictions with experimental confirmation increases the likelihood of identifying new therapeutic targets.
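The counts reported across this section can be cross-checked with simple arithmetic. The snippet below only restates numbers given in the text; the variable names are ours.

```python
TOTAL_PREDICTED = 92

# Composition of the predicted set.
characterized = 12
previously_annotated = 21
newly_predicted = 59

# Evidence for expression.
microarray_positive = 75        # signal above the negative controls
below_controls = 7              # on the array, but signal below the controls
absent_from_array = 10          # no oligomer on the array; all RT-PCR positive
proteomics_only = 3             # no microarray signal, but protein detected

not_detected_by_array = TOTAL_PREDICTED - microarray_positive           # 17
transcript_evidence = microarray_positive + absent_from_array           # 85
expressed_total = transcript_evidence + proteomics_only                 # 88
without_evidence = TOTAL_PREDICTED - expressed_total                    # 4
```

The tally reproduces the figures quoted above: 17 proteases undetected by the array (7 + 10), 85 with microarray or RT-PCR evidence, 88 once the three proteomics-only cases are added, and 4 left without evidence of expression.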