The research front in genomics and proteomics has progressively moved from metabolic pathway reconstruction to the identification of signaling pathways, and to promoter analysis that identifies the transcription factors involved in protein-DNA interactions.
There are four major approaches to studying protein-DNA interactions: (i) microarray analysis of gene expression under different stress conditions of cells, (ii) statistical analysis of the promoter regions of orthologous genes (functionally equivalent genes in different organisms, identified as best homologs), (iii) global analysis of the frequency patterns of dimers in the intergenic regions of a genome, that is, the promoter regions lying between adjacent protein-coding regions, and (iv) biochemical modeling at the atomic bond level to understand how a protein binds to nucleotides. Only microarray analysis is based upon experimental data; the other three approaches are based on mathematical modeling and sequence analysis.
Microarray analysis measures the relative change in gene expression for a stressed (or stimulated) cell, and the change in cellular expression pattern (differentiation, cell cycle, tissue remodeling, sporulation, etc.) in response to a change in stimuli. It uses a two-step process: (i) all the genes of the genome are laid out on a thin glass plate, and the genes of a healthy cell are hybridized with these etched genes to derive the regular gene expression under equilibrium conditions, and (ii) the genes of the affected cells are hybridized with the etched genes to derive the gene expression of the affected cells under equilibrium conditions. A comparative study of gene expression under normal conditions and under stimulated (or stressed) conditions identifies the affected genes. Under the assumption that auto-regulation within a gene group and any cyclic self-regulation are absent, the interaction between proteins and transcription factors is responsible for the observed increase or decrease in gene expression. The gene-expression data are analyzed using (i) cluster analysis to identify meaningful patterns of gene expression, or (ii) data mining techniques, that is, statistical techniques that associate and correlate expressed genes with different stress conditions.
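The comparison of normal and stimulated expression levels can be sketched as a log-ratio test. This is a minimal illustration, not the procedure of any particular microarray platform: the gene names, intensities, pseudocount, and two-fold threshold are all assumptions chosen for the example.

```python
import math

def log2_fold_changes(baseline, stimulated, pseudocount=1.0):
    """Return {gene: log2 ratio of stimulated to baseline intensity}.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no measurable signal.
    """
    return {
        gene: math.log2((stimulated[gene] + pseudocount) / (baseline[gene] + pseudocount))
        for gene in baseline
        if gene in stimulated
    }

def affected_genes(baseline, stimulated, threshold=1.0):
    """Split genes into up- and down-regulated lists.

    A gene counts as affected when |log2 ratio| >= threshold; threshold=1.0
    corresponds to roughly a two-fold change.
    """
    changes = log2_fold_changes(baseline, stimulated)
    up = sorted(g for g, c in changes.items() if c >= threshold)
    down = sorted(g for g, c in changes.items() if c <= -threshold)
    return up, down

# Toy intensities, invented purely for illustration.
baseline = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0}
stimulated = {"geneA": 420.0, "geneB": 240.0, "geneC": 15.0}

up, down = affected_genes(baseline, stimulated)
print("up-regulated:", up)
print("down-regulated:", down)
```

Real experiments would normalize the two hybridizations against each other and use replicates before applying any threshold; the sketch omits both steps.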
The second approach, statistical promoter analysis [30,43,44], first identifies the orthologous genes from evolutionarily close microorganisms with active pathways, using pairwise genome comparison databases (see http://www.cs.kent.edu/~arvind/intellibio/orthos.html) or the clusters of orthologous groups (COGs), groups of genes in a superfamily, archived at NCBI at NIH, that have been derived by multiple genome comparisons. In the next step, the upstream regions of the orthologous genes are identified and compared to find statistically conserved patterns. Under the assumption that functionally equivalent genes in very similar pathways of evolutionarily close organisms have a similar regulation mechanism, the transcription factors (regions of promoters involved in enhancing or repressing the expression of the associated gene) for protein-DNA interaction in the promoters of orthologous genes would also be very similar. This analysis has led to the discovery of many transcription factors.
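The comparison step can be illustrated by looking for short words shared by the upstream regions of orthologous genes. The sketch below is a deliberately simplified stand-in for the statistical methods cited above: it assumes exact matches on one strand, a fixed word length `k`, and hypothetical organism names and sequences.

```python
from collections import Counter

def conserved_kmers(upstream_regions, k=6, min_genomes=None):
    """Return the k-mers present in at least `min_genomes` upstream regions.

    upstream_regions maps an organism name to the upstream sequence of its
    ortholog. By default a k-mer must occur in every region.
    """
    if min_genomes is None:
        min_genomes = len(upstream_regions)
    presence = Counter()
    for seq in upstream_regions.values():
        seq = seq.upper()
        # A set, so each k-mer is counted once per organism.
        presence.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return sorted(kmer for kmer, n in presence.items() if n >= min_genomes)

# Hypothetical upstream regions of one orthologous gene in three organisms.
regions = {
    "organism_1": "ACGTTGACATCCGA",
    "organism_2": "GGTTGACATTAGC",
    "organism_3": "CATTGACAGGGT",
}

conserved = conserved_kmers(regions, k=6)
print(conserved)
```

A realistic analysis would also scan the reverse strand, tolerate mismatches, and score each candidate against a background model to decide whether its conservation is statistically significant.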
The third approach extracts the dimers in the intergenic regions of a whole genome, statistically analyzes them, and plots their frequency of occurrence. Dimers that occur more frequently than expected by chance are possibly involved in protein-DNA interactions.
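One way to make "more frequent than random" concrete is an observed-to-expected ratio. The sketch below reads "dimer" as a pair of adjacent nucleotides and takes the expected count from single-base frequencies; both choices are assumptions of this example, and the toy sequences are invented.

```python
from collections import Counter

BASES = set("ACGT")

def dimer_enrichment(intergenic_regions):
    """Return {dimer: observed / expected} over all intergenic regions.

    The expected count assumes the two positions are independent and uses
    the single-base frequencies of the same regions as the background.
    """
    base_counts = Counter()
    dimer_counts = Counter()
    for seq in intergenic_regions:
        seq = seq.upper()
        base_counts.update(b for b in seq if b in BASES)
        for i in range(len(seq) - 1):
            dimer = seq[i:i + 2]
            if set(dimer) <= BASES:  # skip ambiguous symbols such as N
                dimer_counts[dimer] += 1
    total_bases = sum(base_counts.values())
    total_dimers = sum(dimer_counts.values())
    ratios = {}
    for dimer, observed in dimer_counts.items():
        p_first = base_counts[dimer[0]] / total_bases
        p_second = base_counts[dimer[1]] / total_bases
        ratios[dimer] = observed / (p_first * p_second * total_dimers)
    return ratios

# Invented intergenic fragments, for illustration only.
toy_regions = ["TATAATGC", "GGTATAAT"]
ratios = dimer_enrichment(toy_regions)
for dimer, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(dimer, round(ratio, 2))
```

Ratios well above 1 mark over-represented dimers. On a real genome the counts are large enough for a formal significance test, which this sketch leaves out.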
The biochemical approach studies protein-DNA interactions at the atomic bond level by considering hydrogen bonds in amino acid-base interactions, van der Waals forces at contacts, and water-mediated bonds at different levels of proximity of the two molecules. Based upon the analysis of the bonds and the actual statistical results, it has been concluded that amino acid-base interaction plays a major role in binding, that van der Waals forces provide stabilization, and that protein-DNA interactions are complex and biased: different amino acids have preferences for certain types of bases. For example, arginine, lysine, histidine and serine have a preference for guanine.
Currently, no researcher has attempted a hybrid approach integrating the biochemical approach with the other three approaches. An integrated approach would give a better overall picture. Another complex problem is that a co-regulated gene may have more than one transcription factor; some of these transcription factors may be individually weak and may be correlated with other transcription factors. One way to identify a weak transcription factor is a two-step process: (i) identify the strong related transcription factor using one of the previous approaches, followed by (ii) a pattern search in the neighborhood of the strong pattern.
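The two-step search can be sketched as follows: locate each occurrence of the strong pattern, then count which short words recur in its flanking windows across promoters. This is only an illustration under simplifying assumptions (exact matching of the strong pattern, fixed word length `k`, fixed window size `flank`); the motif and the promoter sequences are hypothetical.

```python
from collections import Counter

def neighbourhood_kmers(promoters, strong_motif, k=4, flank=10):
    """Count, for each k-mer, the number of promoters in which it occurs
    within `flank` bases on either side of the strong motif."""
    support = Counter()
    motif_len = len(strong_motif)
    for seq in promoters:
        seq = seq.upper()
        seen = set()  # k-mers found near the motif in this promoter
        start = seq.find(strong_motif)
        while start != -1:
            left = seq[max(0, start - flank):start]
            right = seq[start + motif_len:start + motif_len + flank]
            for window in (left, right):
                seen.update(window[i:i + k] for i in range(len(window) - k + 1))
            start = seq.find(strong_motif, start + 1)
        support.update(seen)
    return support

# Step (i) is assumed already done: "TTGACA" stands in for the strong pattern.
promoters = ["ACGGATCTTGACAGC", "TTGACATAGGATCC", "GGGATATTGACAT"]

# Step (ii): search the neighborhood of the strong pattern.
support = neighbourhood_kmers(promoters, "TTGACA", k=4, flank=10)
print(support.most_common(3))
```

Words supported by many promoters are candidates for the weak pattern; whether they are meaningful still has to be judged against a background model.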
Figuring out the connectivity in protein-protein interactions to derive signaling pathways has been a long-standing challenge. In the last two years, two approaches have emerged: (1) integration of microarray analysis and entropy-based modeling to cluster the genes involved in the same regulatory pathway [2,7], and (2) a technique based upon randomized algorithms that maximize transition probability. The first approach computes the mutual information of all gene pairs, and clusters the groups whose mutual information exceeds a threshold. Mutual information is an entropy-based measure, computed from the frequencies with which the expression levels of a gene pair occur together. To derive the entropy, gene-expression values are divided into discrete histogram bins, and the mutual information between every gene pair is computed. Higher mutual information means a stronger statistical dependence between the two genes. It has been found statistically that genes belonging to the same pathway tend to group together. Using this cluster analysis, many signaling pathways have been identified in yeast-based systems. The analysis is a general-purpose technique, and can be used in prokaryotic as well as eukaryotic systems.
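The mutual-information clustering described above can be sketched in a few lines. The equal-width binning, the number of bins, the threshold, the linking of pairs into connected components, and the toy expression profiles are all assumptions made for this example; the cited work may differ in each of these choices.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, bins=2):
    """Map each expression value to an equal-width bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """I(X;Y) in bits, estimated from joint and marginal label frequencies."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def cluster_by_mi(expression, threshold, bins=2):
    """Link gene pairs whose mutual information exceeds `threshold` and
    return the resulting connected components as sorted lists."""
    labels = {g: discretize(v, bins) for g, v in expression.items()}
    parent = {g: g for g in expression}

    def find(g):  # union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(expression, 2):
        if mutual_information(labels[a], labels[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in expression:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Invented expression profiles over six conditions.
expression = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],   # rises and falls with geneA
    "geneC": [1, 2, 9, 1, 3, 10],    # unrelated to geneA after binning
}

clusters = cluster_by_mi(expression, threshold=0.5)
print(clusters)
```

With so few samples the estimate is crude; real data sets need many conditions per gene and a threshold calibrated against shuffled profiles.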
Even knowing the connectivity does not explain the transient temporal behavior of the many genes involved in the regulation and auto-regulation mechanisms of operons (co-transcribed gene groups within a pathway that share a common functionality). The transient behavior of genes cannot be captured by hybridization-based microarray analysis, since the data correspond to the equilibrium state of reactions. To understand malfunctioning cells and the cells of pathogenic bacterial strains, the overall organization and behavior, including transient behavior and stress responses, have to be studied.