A set of 70 nonredundant Mg2+ proteins was created by searching the PDB for structures that satisfy a resolution cutoff and contain bound Mg2+. The Mg2+ dataset comprises 77 binding sites in 70 proteins. Note that although most Mg2+ proteins have only one binding site, some have more than one Mg2+-binding site (PDB entries 1MXG, 1NUY, 1VCL, 1WL6, 2BJI, and 2BVC). A set of nonredundant Ca2+ proteins was created following the same procedure used to create the Mg2+ dataset. This resulted in 230 Ca2+-binding sites in 177 proteins. The PDB entries, EC codes, and amino acid residues bound to the metal ion in the 77 Mg2+ and 230 Ca2+ sites are given in Additional files 1 and 3, respectively.
The Structural Alphabet
Each metalloprotein structure was encoded into its 1D structural sequence according to the original structural alphabet defined by de Brevern and co-workers. We refer the reader to the original work for details of how this alphabet was devised, and briefly outline the procedure here. The backbone of each protein from a nonredundant protein structure database was represented by consecutive 5-residue segments, each described by a vector of 8 backbone dihedral angles, V(ψn-2, φn-1, ψn-1, φn, ψn, φn+1, ψn+1, φn+2). The dissimilarity between 2 vectors of dihedral angles, V1 and V2, was measured by the root-mean-square deviation of the dihedral angle values (rmsda), which is defined as the Euclidean distance among the 4 links.
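For reference, one form of the rmsda that is consistent with this description is sketched below. Each of the 4 links i = n-2, ..., n+1 contributes one ψ and one φ angle, giving the 8 components of V. The normalisation by 2 × 4 = 8 is an assumption based on the usual root-mean-square convention; the original structural alphabet paper should be consulted for the authoritative definition.

```latex
\mathrm{rmsda}(V_1, V_2) =
  \sqrt{
    \frac{
      \sum_{i=n-2}^{n+1}
        \left\{
          \left[ \psi_i(V_1) - \psi_i(V_2) \right]^2
          + \left[ \phi_{i+1}(V_1) - \phi_{i+1}(V_2) \right]^2
        \right\}
    }{2 \times 4}
  }
```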
Using an unsupervised cluster analyzer based on the above rmsda of the segments, 16 letters (also called protein blocks) were identified, which in turn comprise the structural 'alphabet'.
Converting 3D Structure to 1D Structural Alphabet
The 3D structures of the 70 Mg2+ and 177 Ca2+ proteins were converted into strings of structural letters using the program PBE published in ref. 9. For a given n-residue protein, n − 4 letter assignments were obtained by scanning the protein sequence with a 5-residue sliding window. The structure of each 5-residue segment was compared with that of each of the 16 letters, and the letter with the closest structure (as measured by the rmsda) was assigned to the middle residue of that segment. This process is illustrated in Figure 6: the first letter is assigned to the 3rd residue, Val, which is the middle residue of the first 5-residue segment. The structure of this segment is closest to that of the structural letter 'd'; therefore, Val 3 is assigned 'd'. Note that no letters can be assigned to the first 2 and last 2 residues of each protein.
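The sliding-window assignment described above can be sketched in a few lines of Python. This is an illustration only, not the PBE program: the function names are hypothetical, the two reference "letters" X and Y are made-up placeholders rather than the 16 published protein blocks, and wrapping angle differences into [-180°, 180°) is an assumption about how periodicity is handled.

```python
import math

def angle_diff(a, b):
    """Difference a - b in degrees, wrapped into [-180, 180) (assumed convention)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rmsda(v1, v2):
    """Root-mean-square deviation between two equal-length dihedral vectors."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(v1, v2)) / len(v1))

def segment_vector(phi, psi, n):
    """8 dihedrals of the 5-residue window centred on 0-based index n."""
    return [psi[n - 2], phi[n - 1], psi[n - 1], phi[n],
            psi[n], phi[n + 1], psi[n + 1], phi[n + 2]]

def assign_letters(phi, psi, reference):
    """Assign the closest reference letter to the middle residue of each window.

    The first 2 and last 2 residues receive None, so an n-residue chain
    yields n - 4 assignments.
    """
    n_res = len(phi)
    letters = [None] * n_res
    for n in range(2, n_res - 2):
        v = segment_vector(phi, psi, n)
        letters[n] = min(reference, key=lambda name: rmsda(v, reference[name]))
    return letters

# Placeholder reference vectors (NOT the published protein blocks).
TOY_REFERENCE = {
    "X": [-45.0, -60.0] * 4,    # helix-like (psi, phi) pairs
    "Y": [135.0, -120.0] * 4,   # strand-like (psi, phi) pairs
}

# Toy 7-residue chain; phi of the first and psi of the last residue are
# undefined and never accessed by any window.
phi = [None, -62.0, -58.0, -61.0, -63.0, -59.0, -60.0]
psi = [-43.0, -47.0, -44.0, -46.0, -45.0, -42.0, None]
print(assign_letters(phi, psi, TOY_REFERENCE))
```

Because the window for the first assignable residue starts at ψ of residue 1 and the last window ends at φ of residue n, the undefined terminal dihedrals never enter the comparison.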
Definition of 1st and 2nd-Shell Metal Ligands
Analyses of high-resolution X-ray structures of small metal complexes in the Cambridge Structural Database with crystallographic R factor ≤ 0.065 have shown that the mean 1st-shell Mg-Owater, Mg-Ocarboxylate, and Mg-Oalcohol distances do not exceed 2.11 Å, while the Ca-Owater, Ca-Ocarboxylate, Ca-Oalcohol, and Ca-Nimidazole bond distances do not exceed 2.55 Å. To account for the lower resolution of the PDB structures, a slightly larger cutoff was used to locate the 1st-shell amino acid ligands. Thus, the Mg2+ and Ca2+ ligands were defined as residues with a donor atom within 2.5 Å and 2.7 Å of the metal ion, respectively. The heavy atoms of the metal-bound amino acid residues were then selected as centers to search for the 2nd-shell protein ligands using a hydrogen-bonding cutoff of 3.5 Å. Note that water molecules in the first and second shells were not identified, as they were not used to define a structural motif.
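The two distance criteria above can be illustrated with a minimal Python sketch. The cutoffs (2.5 Å, 2.7 Å, and 3.5 Å) come from the text; everything else is an assumption made for illustration: the `Atom` record, the function names, the restriction of donor and hydrogen-bond partner atoms to N, O, and S, and the purely distance-based hydrogen-bond test with no angular criterion. The input is assumed to contain only protein heavy atoms (no waters, no hydrogens).

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    residue: tuple   # e.g. (chain, residue number, residue name)
    name: str        # atom name, e.g. "OD1"
    element: str     # element symbol, e.g. "O"
    xyz: tuple       # Cartesian coordinates in angstroms

FIRST_SHELL_CUTOFF = {"MG": 2.5, "CA": 2.7}  # angstroms, from the text
HBOND_CUTOFF = 3.5                           # angstroms, from the text
POLAR = {"N", "O", "S"}                      # assumed donor/partner elements

def first_shell(metal_element, metal_xyz, atoms):
    """Residues with a polar atom within the metal-specific cutoff."""
    cutoff = FIRST_SHELL_CUTOFF[metal_element]
    return {a.residue for a in atoms
            if a.element in POLAR and math.dist(a.xyz, metal_xyz) <= cutoff}

def second_shell(first, atoms):
    """Residues outside the first shell with a polar atom within 3.5 A
    of any heavy atom of a first-shell residue."""
    centers = [a for a in atoms if a.residue in first]
    return {b.residue for b in atoms
            if b.residue not in first and b.element in POLAR
            and any(math.dist(b.xyz, c.xyz) <= HBOND_CUTOFF for c in centers)}

# Toy coordinates with the metal at the origin (not taken from any PDB entry).
ASP10, SER20, ARG30 = ("A", 10, "ASP"), ("A", 20, "SER"), ("A", 30, "ARG")
toy_atoms = [
    Atom(ASP10, "OD1", "O", (2.1, 0.0, 0.0)),
    Atom(ASP10, "CG", "C", (3.0, 0.5, 0.0)),
    Atom(SER20, "OG", "O", (-2.6, 0.0, 0.0)),
    Atom(ARG30, "NH1", "N", (5.0, 0.0, 0.0)),
]
mg_first = first_shell("MG", (0.0, 0.0, 0.0), toy_atoms)
print(mg_first, second_shell(mg_first, toy_atoms))
```

In this toy example the serine oxygen at 2.6 Å falls outside the Mg2+ cutoff but inside the Ca2+ cutoff, which shows how the metal-specific thresholds change the first-shell assignment.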
Definition of 1st and 2nd-Shell Structural Representation/Pattern
Since the 3D structure of each metalloprotein was converted into the respective 1D letter sequence as described above, the letters that correspond to the metal-bound amino acid residues yield a structural representation of the first shell, as shown in the last columns of Additional files 1 and 3 for each metal-binding site. For example, in the case of the human/chicken estrogen receptor (1HCQ), the letters corresponding to the Zn-binding Cys residues at positions 7, 10, 24, and 27 are f, o, f, and m, respectively, yielding an f(2)o(13)f(2)m representation of the first shell for 1HCQ (see Figure 1). The number in parentheses is the number of residues separating two consecutive metal-bound residues.
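As a small illustration, the pattern string can be built from the metal-bound residue positions and their letters as follows. The function name is hypothetical, and the spacing convention (number of intervening residues) is inferred from the 1HCQ example rather than stated as a general rule; the sketch also assumes a single chain with contiguous residue numbering.

```python
def first_shell_pattern(positions, letters):
    """Build a first-shell pattern such as 'f(2)o(13)f(2)m'.

    positions: residue numbers of the metal-bound residues
    letters:   mapping from residue number to its structural letter
    """
    ordered = sorted(positions)
    if not ordered:
        return ""
    parts = [letters[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        # Number of residues lying strictly between two bound residues.
        parts.append(f"({cur - prev - 1})")
        parts.append(letters[cur])
    return "".join(parts)

# The 1HCQ example from the text: Cys 7, 10, 24, and 27.
print(first_shell_pattern([7, 10, 24, 27], {7: "f", 10: "o", 24: "f", 27: "m"}))
# prints f(2)o(13)f(2)m
```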
Definition of Structural Motifs
In previous work, all values of k between 2 and 20 were used to define a structural motif, where k is the number of first-shell structural patterns with the same structural letters and similar interletter spacing. Here, k ≥ 3 was used to define a structural motif. Thus, if 3 or more proteins possess first-shell structural patterns with the same structural letters and similar interletter spacing, these proteins are assumed to share a common structural motif. For example, transketolase (1ITZ), pyruvate oxidase (1POX), 2-oxo-acid dehydrogenase alpha subunit (1UMD), pyruvate decarboxylase (1ZPD), and pyruvate-ferredoxin oxidoreductase (2C3M) share the first-shell structural pattern k(26–29)h(1)a, which thus defines a structural motif.
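The k ≥ 3 criterion can be sketched as a simple grouping step. The text does not quantify "similar interletter spacing", so the tolerance `tol` below is a hypothetical parameter, as are the function names and the greedy clustering against the first member of each cluster. The protein identifiers and spacings in the example are toy values, not data from the Additional files.

```python
from collections import defaultdict

def format_motif(letters, gap_lists):
    """Render letters with each spacing shown as a single value or a range."""
    out = [letters[0]]
    for i, letter in enumerate(letters[1:]):
        lo = min(g[i] for g in gap_lists)
        hi = max(g[i] for g in gap_lists)
        out.append(f"({lo})" if lo == hi else f"({lo}-{hi})")
        out.append(letter)
    return "".join(out)

def find_motifs(patterns, min_k=3, tol=3):
    """Group first-shell patterns into motifs shared by >= min_k proteins.

    patterns: mapping protein id -> (letters, gaps), where gaps[i] is the
              spacing between letters[i] and letters[i + 1]
    tol:      assumed maximum spacing difference from the cluster seed
    """
    by_letters = defaultdict(list)
    for pid, (letters, gaps) in patterns.items():
        by_letters[tuple(letters)].append((tuple(gaps), pid))

    motifs = []
    for letters, members in by_letters.items():
        members.sort()
        clusters = []
        for gaps, pid in members:
            for cluster in clusters:
                seed = cluster[0][0]
                if all(abs(g - s) <= tol for g, s in zip(gaps, seed)):
                    cluster.append((gaps, pid))
                    break
            else:
                # No existing cluster is close enough: start a new one.
                clusters.append([(gaps, pid)])
        for cluster in clusters:
            if len(cluster) >= min_k:
                label = format_motif(letters, [g for g, _ in cluster])
                motifs.append((label, sorted(p for _, p in cluster)))
    return motifs

# Toy input with hypothetical protein ids P1..P5.
toy_patterns = {
    "P1": (("k", "h", "a"), (26, 1)),
    "P2": (("k", "h", "a"), (27, 1)),
    "P3": (("k", "h", "a"), (29, 1)),
    "P4": (("k", "h", "a"), (60, 1)),   # same letters, dissimilar spacing
    "P5": (("f", "o"), (2,)),           # different letters
}
print(find_motifs(toy_patterns))
# prints [('k(26-29)h(1)a', ['P1', 'P2', 'P3'])]
```

A greedy, seed-based grouping is used here only because it is short; any clustering that respects the same-letters and similar-spacing conditions would serve the same purpose.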
Authors' contributions
MD carried out all the calculations, including writing programs, and drafted the manuscript. CL conceived of the study, participated in its design and analysis/interpretation of data, and writing/revising the manuscript. Both authors have read and approved the final manuscript.
Acknowledgements
We thank anonymous reviewers for constructive comments/suggestions. We are grateful to Steven Wu, Michael J. B. Lin, and Backy Chen for assistance in the statistical analyses, and Leon Li, Todor Dudev, and Gopi Kuppuraj for literature assistance. This work was supported by NSC contract no. NSC 94-2113-M-001-018 to CL.