Dataset S1. The Tree in PHYLIP Format
In sequence names, the accession prefix “DDB0” is replaced with “k” to satisfy PHYLIP format constraints. Although the trees given here and in Figure 1 are topologically identical, they differ in a few regions by topologically neutral rotations about nodes.
(5 KB TXT)
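The label substitution above is easy to reverse when cross-referencing the tree with dictyBase. The Python sketch below assumes that every taxon label in the file has the form “k” followed by digits and that no other token in the file does. The function name `restore_ddb_labels` and the example labels are hypothetical.

```python
import re

# Assumption: each taxon label is "k" + digits, i.e. a DDB accession whose
# leading "DDB0" was replaced by "k" (hypothetical example: k123456 -> DDB0123456).
_K_LABEL = re.compile(r"\bk(\d+)")


def restore_ddb_labels(tree_text: str) -> str:
    """Expand every 'k<digits>' label in a tree string back to 'DDB0<digits>'."""
    return _K_LABEL.sub(r"DDB0\1", tree_text)


if __name__ == "__main__":
    # Hypothetical three-taxon tree with branch lengths and one bootstrap value.
    example = "((k123456:0.12,k234567:0.30)95:0.05,k345678:0.41);"
    print(restore_ddb_labels(example))
    # prints ((DDB0123456:0.12,DDB0234567:0.30)95:0.05,DDB0345678:0.41);
```

The word boundary in the pattern keeps the substitution from touching longer tokens that merely contain a “k”, and requiring digits after the “k” leaves ordinary words untouched.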
Figure S1. Sequence Alignment of Dictyostelium ePK Domains
An alignment of all Dictyostelium ePK domains (excluding pseudogenes and Chromosome 2 duplicates) is shown. For proteins with two kinase domains, the domains are distinguished by an “a” or “b” suffix. The amino acid position of the first residue of each kinase domain is given. Where residues have been removed from the alignment, the number removed is indicated in parentheses. Kinase names are shaded by group: yellow, AGC; purple, CAMK; pink, CMGC; blue, STE; green, TKL; orange, CK1; and gray, OTHER. The alignment is shaded to show regions of similarity within each group. Some kinases from the OTHER group are shaded with the group to which they are most closely related. Subdomain designations correspond to those used in .
(864 KB PDF)
Figure S2. Dictyostelium Text Tree
The tree from Figure 1 is presented here in text format, with bootstrap values indicated at the nodes. Nodes at which group-specific trees were grafted onto the main tree are marked with the word “fixed.”
(140 KB PDF)
Figure S3. Domain Drawing of the Dictyostelium Kinases
Matches to PFAM, SMART, and in-house HMMs, polyN and polyQ stretches (common in Dictyostelium proteins), transmembrane helices (TMDs), and signal peptides are shown. Motifs are labeled the first time they appear in each group and the first time they appear on each page. All proteins are drawn to scale; vertical lines mark 100-aa intervals. A brief description of each motif is given in the legend.
(210 KB PDF)
Table S1. Group, Family, and Subfamily Abbreviations
(33 KB XLS)
Table S2. Summary of Dictyostelium Protein Kinases
This table presents the protein sequences used in the analyses in this paper. Most correspond to a curated model at dictyBase, whose accession number (DDB#) is given. Where our version differs from the current dictyBase model, the relevant DDB# is italicized and followed by a one-letter code. A “p” indicates a pseudogene; these sequences contain internal asterisks marking stop codons and Xs marking frameshifts. An “e” indicates an edited gene model (i.e., our interpretation of the genomic data differs from the model presented at dictyBase). A “c” indicates a model based on corrected genomic DNA; these corrections were made after inspection of EST sequence data, genomic reads, or our own sequence data.
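The one-letter codes can be decoded mechanically. This is a minimal Python sketch that assumes a plain-text export in which the code letter directly follows the accession (for example “DDB0123456e”), since italics are lost in such an export. The accession values, the field layout, and the names `parse_accession` and `pseudogene_lesions` are illustrative assumptions, not part of Table S2.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Meaning of the one-letter codes, as described in the Table S2 legend.
MODEL_CODES = {
    "p": "pseudogene",
    "e": "edited gene model",
    "c": "model based on corrected genomic DNA",
}

# Assumption: the code letter directly follows the accession, e.g. "DDB0123456e".
_ACCESSION = re.compile(r"^(DDB\d+)([pec])?$")


@dataclass
class ModelInfo:
    accession: str
    code: Optional[str]
    description: str


def parse_accession(field: str) -> ModelInfo:
    """Split a DDB# field into the accession and its optional model code."""
    m = _ACCESSION.match(field.strip())
    if m is None:
        raise ValueError(f"unrecognized accession field: {field!r}")
    accession, code = m.groups()
    description = MODEL_CODES[code] if code else "matches the dictyBase model"
    return ModelInfo(accession, code, description)


def pseudogene_lesions(protein_seq: str) -> Tuple[int, int]:
    """Count internal stop codons ('*') and frameshifts ('X') in a 'p' sequence."""
    core = protein_seq.rstrip("*")  # ignore a terminal stop, if one is present
    return core.count("*"), core.count("X")
```

The `pseudogene_lesions` count is only meaningful for entries flagged “p”, because that is where the legend assigns asterisks and Xs these specific meanings.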
A rating for each ePK domain is given in the “Quality of KD” column. The rating reflects the degree of similarity to a canonical ePK and is based largely on conservation in and around the following motifs: gxGxxg in subdomain I; vaiK in subdomain II; rEi in subdomain III; HRDxxxxN in subdomain VI; DFG in subdomain VII; and Diws in subdomain IX (subdomain nomenclature of ). Sequences containing all of these motifs are rated “kd” (kinase domain); sequences lacking one or two are rated “partial_kd”; and sequences with at least one clearly recognizable kinase motif but lacking three or more others are rated “kmc” (kinase motif-containing).
In a few cases, sequences failing to match three conserved motifs were nonetheless rated partial_kd because of good alignment elsewhere or a good kinase HMM score. Specific rationales for the ratings are given in the “Notes” column. Of the 255 nonpseudogene ePK domains in Dictyostelium (nine proteins have two ePK domains), 210 are rated “kd,” 37 “partial_kd,” and three “kmc.” The remaining five were given special designations because they belong to the BUB, SCY, or SLOB families, which diverge strongly from the ePK consensus but are well conserved across species.
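The rating rules can be expressed as a small function. This Python sketch assumes motif detection has already been done upstream, so each domain is represented by the set of canonical motifs it matches. The `manual_rating` parameter is a hypothetical hook standing in for the curated exceptions described above, which were decided case by case rather than by motif count.

```python
from typing import Iterable, Optional

# The six motifs checked for the "Quality of KD" rating.
CANONICAL_MOTIFS = (
    "gxGxxg",    # subdomain I
    "vaiK",      # subdomain II
    "rEi",       # subdomain III
    "HRDxxxxN",  # subdomain VI
    "DFG",       # subdomain VII
    "Diws",      # subdomain IX
)

# Counts reported in the legend; together they account for all 255 domains.
REPORTED_COUNTS = {"kd": 210, "partial_kd": 37, "kmc": 3, "special (BUB/SCY/SLOB)": 5}
assert sum(REPORTED_COUNTS.values()) == 255


def rate_kinase_domain(motifs_present: Iterable[str],
                       manual_rating: Optional[str] = None) -> str:
    """Apply the kd / partial_kd / kmc rules to the set of matched motifs.

    manual_rating (hypothetical) overrides the motif count, e.g. for a domain
    missing three motifs but admitted as partial_kd, or a BUB/SCY/SLOB member.
    """
    if manual_rating is not None:
        return manual_rating
    present = set(motifs_present) & set(CANONICAL_MOTIFS)
    missing = len(CANONICAL_MOTIFS) - len(present)
    if missing == 0:
        return "kd"
    if missing <= 2:
        return "partial_kd"
    if present:
        return "kmc"
    raise ValueError("no recognizable kinase motif; not rated by these rules")
```

For example, a domain matching only gxGxxg, vaiK, rEi, and HRDxxxxN lacks two motifs and is rated partial_kd, while one matching only DFG is rated kmc.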
In the “Predicted Activity” column, “a” (active) and “i” (inactive) give the catalytic activity predicted as described in the text. The portions of the alignment used to make these predictions (the VAIK, HRD, and DFG motifs) are shown. A blank entry indicates that the ePK domain starts after, or ends before, that motif. An entry containing only periods indicates that the ePK domain spans that position but does not match that particular motif. In several cases the activity prediction is qualified because conserved residue G52, E91, N171, or D220 is absent, as discussed in the text; in these cases “q” is appended to the activity flag, and the qualification is described in the “Notes” column. For dual-domain kinases, the properties of the two ePK domains are separated by a slash.
(401 KB XLS)
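Reading the “Predicted Activity” and motif columns of Table S2 can likewise be sketched in Python. The encoding assumed here follows the legend (“a” or “i”, an optional trailing “q”, a slash between the two domains of a dual-domain kinase, and blank or all-period motif cells), but the exact cell contents of the spreadsheet and the function names are assumptions. The sketch only decodes the columns; it does not reproduce the activity prediction itself, which is described in the text.

```python
from typing import List, Tuple


def parse_predicted_activity(field: str) -> List[Tuple[str, bool]]:
    """Split a Predicted Activity entry into (activity, qualified) per ePK domain.

    Assumed encoding: "a" = active, "i" = inactive, a trailing "q" marks a
    qualified prediction, and "/" separates the domains of a dual-domain kinase.
    """
    results = []
    for part in field.split("/"):
        flag = part.strip()
        qualified = flag.endswith("q")
        activity = flag[:-1] if qualified else flag
        if activity not in ("a", "i"):
            raise ValueError(f"unexpected activity flag: {part!r}")
        results.append((activity, qualified))
    return results


def classify_motif_cell(cell: str) -> Tuple[str, str]:
    """Interpret one VAIK, HRD, or DFG cell as (status, aligned residues)."""
    text = cell.strip()
    if not text:
        # Blank: the domain starts after, or ends before, this motif.
        return ("outside_domain", "")
    if set(text) == {"."}:
        # Only periods: the domain spans this position but does not match the motif.
        return ("no_match", "")
    # Otherwise return the aligned residues unchanged for inspection.
    return ("aligned", text)
```

For a dual-domain kinase, an entry such as “iq/a” would decode to a qualified inactive first domain and an unqualified active second domain.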
Table S3. Species Distribution of Protein Kinase Families and Subfamilies
The numbers of kinases in each group, family, and subfamily in Dictyostelium, yeast, flies, worms, and humans are summarized, using the current classification from http://kinase.com. Pseudogenes and copies found on the Chromosome 2 duplication are not counted in the Dictyostelium numbers. Unique kinases in each group are not related to kinases from other organisms and are therefore tabulated under Sections G, H, and I.
(57 KB XLS)