Identification and molecular cloning of a new human polyserine protease
We used the polyserase-2 cDNA as a query in BLAST searches of the human genome for regions that could encode new proteases with several serine protease domains within the same polypeptide chain. This search identified a region in chromosome 16p11.2 containing two closely linked serine protease domains. Conceptual translation of these domains showed that they were different from the serine protease domains of polyserase-2 and prostasin, which are encoded by serine protease genes located in the same chromosomal region [3,22]. A PCR-based approach was then designed to clone the cDNAs for these two uncharacterized serine protease domains. To this end, we used RNA from human liver and, once the cloning process was completed, confirmed that both domains were encoded by a single gene. Computer analysis of the obtained sequence revealed that this cDNA encodes a protein of 553 amino acids, with a predicted molecular mass of 58.4 kDa (Fig. 1A and EMBL accession number AJ627035). Following the nomenclature system proposed for these enzymes, we have tentatively called this new polyserine protease polyserase-3.
A detailed analysis of the polyserase-3 sequence showed that it contains the structural hallmarks characteristic of serine proteases (Fig. 1A), with some relevant particularities. The sequence contains a signal peptide (positions 1 to 23), which predicts that the protein is targeted to the endoplasmic reticulum for secretion. Following this region, the first serine protease domain (Spd1) can be recognized (positions 28 to 270), although this domain does not show the Arg-↓-Ile-Val-Gly-Gly consensus activation motif present in most of these enzymes. The sequence present instead at this region is Pro-Lys-Pro-Gln-Glu. The catalytic triad of Spd1 comprises the residues His77, Asp128, and Ser224. Following a short spacer region (positions 271 to 294), the second serine protease domain (Spd2) can be clearly identified (positions 295 to 553). This domain also lacks a consensus activation motif, and its catalytic triad comprises the residues His341, Asp382, and Ser478. This last Ser residue is located within the sequence Gly-Leu-Ser-Gly-Ala (positions 476 to 480), which does not exactly match the consensus motif Gly-Asp-Ser-Gly-Gly found in this class of enzymes. Both protease domains of polyserase-3 also contain a number of cysteine residues that are conserved in serine proteases, including those located at positions 28 and 144 in Spd1, and 296 and 402 in Spd2. These residues could form two disulfide bonds that would keep both protease domains linked to the polypeptide chain if cleavage took place at the activation site (Fig. 1B). All these structural features can also be found in the amino acid sequences of putative orthologs of polyserase-3 predicted from the genome analysis of Pan troglodytes (99% identity), Bos taurus (80%), Canis familiaris (84%), Mus musculus (81%), and Rattus norvegicus (80%) (Fig. 1C).
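The domain layout described above can be restated as a small data structure and checked for internal consistency. The following is only an illustrative sketch: the coordinates and motifs are those quoted in the text, while the names `DOMAINS`, `CATALYTIC_TRIADS`, `in_region`, `triads_inside_domains` and `motif_mismatches` are hypothetical helpers introduced here and are not part of the original analysis.

```python
# Region boundaries of polyserase-3 as quoted in the text (1-based, inclusive).
DOMAINS = {
    "signal_peptide": (1, 23),
    "Spd1": (28, 270),
    "spacer": (271, 294),
    "Spd2": (295, 553),
}

# Catalytic triad residues reported for each serine protease domain.
CATALYTIC_TRIADS = {
    "Spd1": {"His": 77, "Asp": 128, "Ser": 224},
    "Spd2": {"His": 341, "Asp": 382, "Ser": 478},
}

# Motif around the Spd2 active-site serine versus the class consensus.
SPD2_MOTIF = ["Gly", "Leu", "Ser", "Gly", "Ala"]   # positions 476 to 480
CONSENSUS_MOTIF = ["Gly", "Asp", "Ser", "Gly", "Gly"]


def in_region(position, region):
    """True if a residue position falls inside the named region."""
    start, end = DOMAINS[region]
    return start <= position <= end


def triads_inside_domains():
    """True if every catalytic residue lies inside its own domain."""
    return all(
        in_region(position, domain)
        for domain, triad in CATALYTIC_TRIADS.items()
        for position in triad.values()
    )


def motif_mismatches(observed, consensus):
    """Return the 1-based motif positions at which two motifs differ."""
    if len(observed) != len(consensus):
        raise ValueError("motifs must have the same length")
    return [
        index
        for index, (a, b) in enumerate(zip(observed, consensus), start=1)
        if a != b
    ]


if __name__ == "__main__":
    print(triads_inside_domains())
    print(motif_mismatches(SPD2_MOTIF, CONSENSUS_MOTIF))
```

Under these assumptions, the comparison reports that the Spd2 motif departs from the consensus at its second and fifth positions (Leu for Asp, Ala for Gly), while the catalytic serine itself is retained.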
Comparative analysis of polyserase-3 with other serine proteases
The predicted amino acid sequence corresponding to the catalytic region of each polyserase-3 protease domain revealed a high degree of identity with other serine proteases (Fig. 2). Comparative analysis of the first serine protease domain sequence indicated that the highest degree of identity was found with the first serine protease domain of polyserase-2 (40%). Significant percentages of identity were also found with pancreasin (36%), the second serine protease domain of polyserase-2 (35%), matriptase-2 (35%), prostasin (34%), and the third serine protease domain of polyserase-2 (34%). The second domain of polyserase-3 was also found to be closely related to the first protease domain of polyserase-2 (38%) as well as to other serine proteases such as γ-tryptase (37%), the second serine protease domain of polyserase-2 (34%), prostasin (33%), matriptase-2 (34%), and the first serine protease domain of polyserase-1 (32%). All these enzymes, with the exception of polyserase-2, belong to the type II transmembrane (TTSP) or to the tryptase/pancreasin families of serine proteases [23-26]. Sequence alignments of these proteins with each protease domain of polyserase-3 (Fig. 2A) confirmed the extensive degree of conservation around the residues that form the catalytic triad of all these proteases. We also examined the polyserase-3 sequence for the molecular markers of serine protease evolution described by Krem and Di Cera. This analysis revealed that polyserase-3, as well as polyserases-1 and -2, use exclusively TCN codons for their active site serine residues (corresponding to Ser-195 in the chymotrypsinogen sequence). We also found that the Ser residues of all polyserases that are equivalent to the Ser-214 residue of chymotrypsinogen are always encoded by AGC codons.
Finally, analysis of the third molecular marker associated with catalytic function in serine proteases (Pro or Tyr residues at position 225 in chymotrypsinogen numbering) revealed the presence of a Pro residue in both domains of polyserase-3. Likewise, Pro residues are present at the equivalent positions in the three serine protease domains of polyserase-1 as well as in the catalytically active domain of polyserase-2. Taken together, these results reinforce the classification of these polyproteases in the clan SA of serine proteases and extend the proposal of a close evolutionary relationship between them.
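The codon-usage markers discussed above rest on the fact that serine is encoded by two unrelated codon families in the standard genetic code: TCN (TCT, TCC, TCA, TCG) and AGY (AGT, AGC). A minimal sketch of how a codon could be assigned to one of these families is given below. The function name `serine_codon_class` is a hypothetical helper, and the example codons are illustrative inputs rather than sequences taken from the polyserase-3 cDNA.

```python
def serine_codon_class(codon):
    """Classify a codon as 'TCN', 'AGY', or None if it does not encode Ser.

    Accepts DNA or RNA codons in either case (U is treated as T).
    """
    normalized = codon.upper().replace("U", "T")
    if len(normalized) != 3 or set(normalized) - set("ACGT"):
        raise ValueError(f"not a valid codon: {codon!r}")
    if normalized.startswith("TC"):
        return "TCN"          # TCT, TCC, TCA, TCG
    if normalized in ("AGT", "AGC"):
        return "AGY"          # AGT, AGC
    return None               # any other codon does not encode serine


if __name__ == "__main__":
    for example in ("TCA", "AGC", "ucg", "GGC"):
        print(example, serine_codon_class(example))
```

In these terms, the text reports a TCN codon at the position equivalent to Ser-195 and an AGC codon (AGY family) at the position equivalent to Ser-214 for all polyserases, together with Pro at position 225 in both domains of polyserase-3.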
The phylogenetic tree for these proteins (Fig. 2B) also showed the close relationship of each polyserase-3 serine protease domain with the equivalent regions of polyserase-2. Together, the five protease domains of these two polyserases form a phylogenetic branch distantly related to the TTSP and tryptase/pancreasin families of serine proteases. Furthermore, the exon-intron organization of the catalytic region of the first domain of polyserase-3 is similar to that of TTSP and tryptase/pancreasin serine protease genes. In fact, the length of the intron that separates the exons containing the His and Asp residues of the catalytic triad of the first protease domain of polyserase-3 is similar to that found in the equivalent region of matriptase-2 [27,28]. However, the length of the remaining introns is similar to that found in the equivalent regions of the α/β-tryptases (Fig. 2C). We have previously described that the polyserase-2 gene also shows a pattern of exon-intron organization that shares similarities with both groups of serine proteases. Likewise, the polyserase-3 gene contains three coding exons in the genomic region that comprises the signal sequence and the putative activation site. By contrast, only two coding exons are found in the equivalent region of α/β-tryptase genes [29-32].
Molecular modeling of polyserase-3 serine protease domains
The amino acid sequence similarity between each serine protease domain of polyserase-3 and serine proteases whose three-dimensional structures are available opened the possibility of performing their structural modeling (Fig. 3). This analysis revealed a significant degree of similarity between both domains of polyserase-3 and some members of the tryptase family, such as human β-tryptase II. In most serine proteases, the structure contains a loop that surrounds a calcium ion (shown in yellow in Fig. 3). This loop is shorter in Spd1 and in β-II tryptase, and it does not exist in Spd2, suggesting that polyserase-3 does not require calcium for its activity. Apart from the disulfide bonds deduced from the alignment of the polyserase-3 sequence with other serine proteases, the structural model of polyserase-3 predicts the existence of seven additional disulfide bonds. Four of these bonds are predicted to occur within Spd1, and the eight cysteine residues involved would be Cys62-Cys78, Cys158-Cys230, Cys187-Cys209, and Cys220-Cys249. Equivalent disulfide bonds are predicted in the structure of human β-tryptase II (Fig. 3). The three remaining bonds would occur within Spd2, and the residues involved would be Cys326-Cys342, Cys444-Cys464, and Cys474-Cys502 (Fig. 3).
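The bookkeeping behind the predicted disulfide pattern (seven additional bonds, four in Spd1 involving eight cysteines and three in Spd2) can be verified with a short sketch. The residue numbers and domain boundaries are those given in the text; the names `DOMAIN_RANGES`, `PREDICTED_BONDS` and `count_paired_cysteines` are hypothetical and serve only this illustration.

```python
# Domain boundaries quoted earlier in the text (1-based, inclusive).
DOMAIN_RANGES = {"Spd1": (28, 270), "Spd2": (295, 553)}

# Additional disulfide bonds predicted by the structural model.
PREDICTED_BONDS = {
    "Spd1": [(62, 78), (158, 230), (187, 209), (220, 249)],
    "Spd2": [(326, 342), (444, 464), (474, 502)],
}


def count_paired_cysteines(bonds_by_domain, domain_ranges):
    """Check each bond and return the number of cysteines per domain.

    Raises ValueError if a cysteine lies outside its domain or is
    assigned to more than one bond.
    """
    seen = set()
    counts = {}
    for domain, bonds in bonds_by_domain.items():
        start, end = domain_ranges[domain]
        for bond in bonds:
            for cys in bond:
                if not start <= cys <= end:
                    raise ValueError(f"Cys{cys} lies outside {domain}")
                if cys in seen:
                    raise ValueError(f"Cys{cys} is used in two bonds")
                seen.add(cys)
        counts[domain] = 2 * len(bonds)
    return counts


if __name__ == "__main__":
    counts = count_paired_cysteines(PREDICTED_BONDS, DOMAIN_RANGES)
    total_bonds = sum(len(bonds) for bonds in PREDICTED_BONDS.values())
    print(counts, total_bonds)
```

With the values above, every cysteine falls within its own domain and is used only once, which is consistent with all seven predicted bonds being intradomain.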
Polyserase-3 is a secreted and non-glycosylated protein
The pCEP-pol3 vector was used to transfect 293-EBNA cells. Immunolocalization experiments using an anti-FLAG antibody showed a strong eccentric perinuclear signal (Fig. 4A). Moreover, and consistent with the absence of a membrane localization motif in the polyserase-3 sequence, we did not find any evidence of immunostaining at the cell surface. Similar results were obtained using HeLa cells transfected with the pCEP-pol3 vector (not shown). Likewise, the positive signal was only detected if cells were previously permeabilized using Triton X-100. This situation resembles that observed for polyserase-2 and differs from that of polyserase-1, which is a membrane-bound polyprotease. All these findings strongly suggest that polyserase-3 is a secreted polyserine protease. This possibility was further confirmed by Western blot analysis of the conditioned medium prepared from pCEP-pol3 transfected cells (Fig. 4B). The anti-FLAG antibody detected one immunoreactive band of about 55 kDa, which fits with the expected size for unprocessed polyserase-3. In addition, a doublet of similar size, which likely represents the protein with or without signal peptide, was detected in cell fractions, but neither band was present in cells transfected with the empty vector. To evaluate the possibility that the FLAG epitope could hamper the proper processing of the two serine protease domains of polyserase-3, we generated a construct lacking this epitope but keeping a HisTag tail at the C-terminus. Western blot analysis using an anti-HisTag antibody showed the same result as above (not shown), thereby confirming that both protease domains of polyserase-3 remain as integral parts of the same polypeptide chain. Additionally, and contrary to polyserase-2, the mobility of the band detected with this anti-HisTag antibody was not altered in the presence of tunicamycin, an inhibitor of N-glycosylation, suggesting that polyserase-3 is a non-glycosylated protein.
Consistent with this, analysis of the polyserase-3 sequence using the NetNGlyc 1.0 Server predicted that the only putative N-glycosylation site present in this protein (Asn543) would not be effectively glycosylated.
Production, purification and enzymatic assays of full-length polyserase-3 and its serine-protease domains
To produce the recombinant proteins, we first transformed E. coli strain BL21(DE3) pLysE with plasmids pGEX-pol3Spd1, pGEX-pol3Spd2, and pGEX-pol3. Moreover, we used the ADAM23 disintegrin domain fused to GST to verify the purification processes and as a negative control in the enzymatic assays. After IPTG induction of bacterial cells transformed with these plasmids, fusion proteins of the expected size (55, 57, 83 and 36 kDa, respectively) were detected by SDS-PAGE (Fig. 5A). Once the purification process was carried out as indicated above, the fusion proteins were visualized by SDS-PAGE (Fig. 5A), and their identities confirmed by Western blotting using an anti-GST antibody (Fig. 5B). We next incubated the recombinant proteases with a variety of endogenous proteins, including type I collagen, type I laminin, gelatin, pro-uPA and fibrinogen. Among all these potential extracellular substrates, fibrinogen and pro-uPA were clearly degraded by the entire polyserase-3, but not by its serine protease domains produced as independent proteins (Fig. 5C–E, and data not shown). This activity was abolished by preincubating the enzyme with AEBSF, a serine protease inhibitor, but not when the enzyme was treated with inhibitors of other classes of proteases (Fig. 5D). These data provide additional support for the proposal that this enzyme is a catalytically active serine protease. Moreover, SDS-PAGE analysis of the recombinant polyserase-3 incubated for 16 h at 37°C also indicated that the enzyme is released from the GST moiety (data not shown), which could be due to an autoactivation process of polyserase-3, as reported for fusion proteins containing the catalytic domain of matriptase-1 or matriptase-2.
Polyserase-3 may form active dimers
Some tryptases, which share several features with polyserase-3, can form active tetramers. Moreover, other members of this group of serine proteases, such as mouse mast cell tryptase, are able to degrade the α-chain of fibrinogen when forming tetramers, in a manner similar to that shown herein for polyserase-3. On this basis, we hypothesized that two polyserase-3 molecules could associate to produce a protein structurally equivalent to the tetramers formed by this type of tryptase. To evaluate this possibility, we produced a recombinant protein containing a 6xHisTag tail at the N-terminus. This new recombinant polyserase-3, purified as described in Experimental Procedures, was incubated in the presence or absence of a reducing agent (2-mercaptoethanol) and detected by Western blot using an anti-HisTag antibody (Fig. 6A). The presence of two immunoreactive bands under non-reducing conditions and of a single band of the expected size in the sample containing the reducing agent suggested that polyserase-3 forms dimers that seem to be stabilized by disulfide bridges, as reported for dog mast cell protease-3. Interestingly, this recombinant 6xHis-tagged polyserase-3 degrades fibrinogen similarly to the GST-polyserase-3 protein (Fig. 6B), suggesting that fibrinogen degradation by the fusion protein could occur once polyserase-3 is released from the GST moiety.
Analysis of polyserase-3 expression in human tissues
A cDNA probe specific for human polyserase-3 was used to hybridize Northern blots containing poly(A)+ RNAs from a variety of human fetal and adult tissues, and tumor cell lines (Fig. 7). This analysis showed a band of about 7.5 kb in different adult tissues including liver, heart, testis, ovary, intestine, colon and leukocytes. A band of the same size was observed in all fetal tissues analyzed (kidney, liver, lung and brain). This transcript was also detected in human cancer cell lines, including HeLa (cervix adenocarcinoma), MOLT-4 (lymphoblastic leukaemia), and SW480 (colon adenocarcinoma). Bioinformatic analysis using different programs available at the NIX tool predicts a transcript of around 8 kb for this gene, suggesting that the upper band observed in Fig. 7 likely corresponds to a full-length polyserase-3 transcript. However, other transcripts of 5.2 kb and 4.2 kb were observed in placenta and testis, and in HeLa and MOLT-4 cells. The presence of these smaller transcripts suggests that the polyserase-3 gene could also be regulated through alternative splicing events, which may produce a protein lacking one of its serine protease domains. Such mRNA processing of a multidomain protease has also been described for polyserases-1 and -2 [3,9].