A protein may exist in one or more low free-energy conformational states, depending upon its interactions with other proteins. In a stable conformational state, certain regions of the protein are exposed for protein-protein or protein-DNA interactions. Since function also depends upon exposed active sites, the function of an unknown protein can be predicted by matching its 3D structure with the 3D structure of a known protein [10,71]. However, the number of 3D structures available from X-ray crystallography and NMR spectroscopy is limited. Thus there is a need for an alternative mechanism to match genes. Generally, there is a close correspondence between gene sequence and 3D structure, and in such cases sequence matching is sufficient for function annotation. However, multiple sequences often map to the same 3D structure, so a lack of amino-acid sequence similarity does not exclude the same 3D structure. In such cases, matching 2D structures [57,66], that is, patterns of alpha helices and beta sheets, and matching 3D structures are needed to verify the function of the newly sequenced protein.
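The point that dissimilar sequences can share the same structural pattern can be illustrated with a minimal sketch. It assumes two pre-aligned, equal-length proteins whose 2D structure is encoded as a string of H (helix), E (strand) and C (coil) symbols. The sequences, the structure strings and the function name `fraction_identical` are all hypothetical toy values chosen for illustration, not real proteins or an established tool.

```python
def fraction_identical(a: str, b: str) -> float:
    """Fraction of aligned positions that carry the same symbol."""
    if len(a) != len(b):
        raise ValueError("strings must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical, pre-aligned toy data (not taken from any real protein).
seq_a = "MKTAYIAKQR"
seq_b = "MRSGFLVRNK"
ss_a = "CHHHHHHEEC"  # H = alpha helix, E = beta strand, C = coil
ss_b = "CHHHHHHEEC"

seq_identity = fraction_identical(seq_a, seq_b)  # amino-acid level
ss_identity = fraction_identical(ss_a, ss_b)     # 2D-structure level

print(f"sequence identity:  {seq_identity:.2f}")
print(f"structure identity: {ss_identity:.2f}")
```

In this toy case the amino-acid sequences agree at only one position in ten, while the helix and strand patterns agree everywhere, which is the situation in which structure matching adds information beyond sequence matching.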
There are two major approaches to modeling the 3D structure of a protein: (i) sequence-homology-based prediction and (ii) the ab initio (or de novo) method. The sequence homology approach uses sequence alignment to identify, from the database, the best-matching 3D structures for the different components (the conserved portion, the loop portions and the side chains), and threads them together to predict the overall 3D structure. The ab initio method is based upon the principle of energy minimization and predicts the structure from the sequence alone. Recent advances in ab initio methods integrate biochemical and biophysical properties, such as the folding of beta sheets and information about hydrophobic regions, to achieve better accuracy.
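The energy-minimization idea behind ab initio prediction can be sketched with a deliberately simplified stand-in: the 2D hydrophobic-polar (HP) lattice model, which is a swapped-in toy and not the method used by any production predictor. The sketch assumes that each residue is either H (hydrophobic) or P (polar), that the chain is a self-avoiding walk on a square lattice, and that each pair of H residues adjacent on the lattice but not adjacent in the chain lowers the energy by 1. All function names are hypothetical.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coords(moves):
    """Return lattice coordinates for a walk, or None if it revisits a site."""
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def hp_energy(sequence, coords):
    """Energy is -1 per H-H lattice contact between non-consecutive residues."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def best_fold(sequence):
    """Exhaustively search all walks and return (lowest energy, move string)."""
    best = (1, None)  # every valid fold has energy <= 0
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = fold_coords(moves)
        if coords is None:
            continue
        e = hp_energy(sequence, coords)
        if e < best[0]:
            best = (e, "".join(moves))
    return best


energy, moves = best_fold("HPPH")
print(energy, moves)
```

For the four-residue chain `HPPH`, the lowest-energy fold bends the chain into a square so that the two H residues touch. Exhaustive enumeration grows as 4^(n-1) and is feasible only for very short chains, which hints at why practical ab initio methods rely on heuristics and additional biophysical information.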
Docking is the term used for identifying the best match between the 3D structures of two molecules (receptor and ligand) that bind to each other, by simulating the interacting surfaces and minimizing free energy at the domain level. The docking problem requires modeling the surfaces using spheres (or grids) and identifying the best match that fits the two surfaces together without excessive intersection. Often, biochemical information such as binding sites is also provided. There are three major problems in docking: (i) for multi-domain proteins, the conformation may change during docking; (ii) docking algorithms have a high computational overhead, which makes large-scale modeling quite slow; and (iii) docking algorithms suffer from overprediction, which results in a high number of false positives.
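The grid-based surface matching described above can be sketched as a toy rigid-body search. The sketch assumes a 2D grid, molecules represented as sets of occupied cells, translation only (no rotation and no conformational change), and a hypothetical score that rewards cells in contact and penalizes intersecting cells. The shapes, the penalty value and the function names are illustrative assumptions, not part of any real docking package.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def score_pose(receptor, ligand, shift, overlap_penalty=10):
    """Score one translation: surface contacts minus a penalty per overlapping cell."""
    dx, dy = shift
    placed = {(x + dx, y + dy) for x, y in ligand}
    overlap = len(placed & receptor)
    # Count (ligand cell, receptor cell) pairs that are direct grid neighbours.
    contacts = sum(
        (x + nx, y + ny) in receptor
        for x, y in placed
        for nx, ny in NEIGHBOURS
    )
    return contacts - overlap_penalty * overlap


def dock(receptor, ligand, search=range(-3, 4)):
    """Try every translation in the search window and keep the best-scoring one."""
    best_shift, best_score = None, None
    for dx in search:
        for dy in search:
            s = score_pose(receptor, ligand, (dx, dy))
            if best_score is None or s > best_score:
                best_shift, best_score = (dx, dy), s
    return best_shift, best_score


# Hypothetical U-shaped receptor pocket and a single-cell ligand.
receptor = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}
ligand = {(0, 0)}

best_shift, best_score = dock(receptor, ligand)
print(best_shift, best_score)
```

The best pose places the ligand inside the pocket, where it touches three receptor cells without intersecting any of them. Even this toy evaluates every translation in the window; adding rotations, a third dimension and flexible conformations multiplies the number of poses, which reflects the computational overhead noted in problem (ii).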