Making sense of complexity
The goal of proteomics is to provide a mechanistic understanding of the exact proteome and of functional changes associated with heart failure, regardless of the triggering etiology. This understanding, when combined with insights obtained from genomic data, will help to identify overlapping and unique pathways that ultimately render the heart incapable of maintaining cardiac output. Although the exact relationship between mRNA and protein abundance is unclear because of the effects of protein processing and turnover, the two quantities are ultimately linked.
A collective understanding of the proteins involved in the genesis of heart failure will enable the design of new treatments and clinical interventions. To achieve this goal, additional information about protein localization, function and binding partners must be obtained. Proteomic experiments can reveal changes in protein expression (increase, decrease, de novo and isoform changes) and PTMs that may participate in a particular disease or process. This information will certainly lead to new hypotheses that can be tested experimentally.
The continuous cycling of discovery/proteomic science and hypothesis-driven experiments will eventually lead to a more complete understanding of heart disease. Handling the complex information that must be collected requires the application of bioinformatic techniques, the objective being to make sense of the accumulated data from the proteomic studies, the literature and other experiments carried out on a specific set of samples (review: Winslow and Boguski, in press).
The rapid pace of development of proteomics technologies and the resulting diversity and complexity of proteomics data pose special challenges. In particular, methods must be developed for structuring and searching proteomics databases to retrieve groups of proteins based upon well-known pathways, functional classifications and specific PTMs. Methods are also required for annotating PTMs so that those predicted from protein motifs by computational algorithms can be distinguished from those for which there is direct experimental evidence.
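The kind of database structuring and searching described above can be illustrated with a minimal sketch. It assumes a simple in-memory record layout; the class names, field names, accessions and example entries are all hypothetical and do not correspond to any existing proteomics database. The point is only that each PTM carries an explicit evidence label, so that predicted and experimentally observed modifications can be separated at query time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PTM:
    kind: str      # type of modification, e.g. "phosphorylation"
    residue: str   # modified position, e.g. "S23" (illustrative only)
    evidence: str  # "experimental" or "predicted" (hypothetical vocabulary)


@dataclass
class ProteinRecord:
    accession: str  # hypothetical identifier, not a real database accession
    name: str
    pathways: List[str] = field(default_factory=list)
    ptms: List[PTM] = field(default_factory=list)


def find_proteins(records, pathway: Optional[str] = None,
                  ptm_kind: Optional[str] = None,
                  evidence: Optional[str] = None):
    """Return the records that satisfy every filter that was supplied."""
    hits = []
    for rec in records:
        if pathway is not None and pathway not in rec.pathways:
            continue
        if ptm_kind is not None or evidence is not None:
            # Require a single PTM that satisfies both PTM filters at once.
            matched = any(
                (ptm_kind is None or p.kind == ptm_kind)
                and (evidence is None or p.evidence == evidence)
                for p in rec.ptms
            )
            if not matched:
                continue
        hits.append(rec)
    return hits


# Invented example entries, for illustration only.
RECORDS = [
    ProteinRecord("EX0001", "example protein A", ["contraction"],
                  [PTM("phosphorylation", "S23", "experimental")]),
    ProteinRecord("EX0002", "example protein B", ["contraction"],
                  [PTM("phosphorylation", "T10", "predicted")]),
    ProteinRecord("EX0003", "example protein C", ["metabolism"], []),
]

observed = find_proteins(RECORDS, ptm_kind="phosphorylation",
                         evidence="experimental")
print([rec.accession for rec in observed])
```

A production system would hold such records in a relational or federated database rather than in memory, but the same separation of evidence types would apply.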
Protein concentrations and other measured attributes should be compared with values determined in reference samples to enhance data quantification. With regard to protein identification based on MS, annotations must provide meaningful statistical measures of the quality of match.
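As one possible illustration of comparison against a reference sample, the sketch below expresses each measured abundance as a log2 ratio relative to the reference. The function name, the protein labels and the numerical values are invented for the example, and the choice of a log2 ratio is an assumption rather than a method prescribed here.

```python
import math


def log2_ratio_to_reference(sample, reference):
    """Express each protein's abundance as log2(sample / reference).

    Proteins that are absent from the reference, or that have a
    non-positive value in either data set, are skipped rather than guessed.
    """
    ratios = {}
    for protein, value in sample.items():
        ref = reference.get(protein)
        if ref is None or ref <= 0 or value <= 0:
            continue
        ratios[protein] = math.log2(value / ref)
    return ratios


# Invented abundances in arbitrary units.
failing = {"protein_A": 8.0, "protein_B": 1.0, "protein_C": 2.0}
reference = {"protein_A": 2.0, "protein_B": 4.0}

print(log2_ratio_to_reference(failing, reference))
```

Positive values indicate a higher abundance than in the reference and negative values a lower one; protein_C is omitted because no reference value exists for it.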
Data representation and dissemination must be facilitated by the adoption of standards for data description. As an example of such an effort, the Human Proteome Organization is promoting the development of standard formats for the representation and exchange of MS and protein–protein interaction data and annotations. These formats are derivatives of extensible markup language (XML), a language that originated as a standard for document formatting, but which is now used as a format to transfer structured data of any kind over the World Wide Web.
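To show how an XML-derived format allows structured data to be exchanged and read by software, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The element and attribute names are invented for illustration and are not those of the Human Proteome Organization formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the tag and attribute names are illustrative only.
DOC = """<interactionSet>
  <interaction id="i1" method="example-method-1">
    <participant ref="EX0001"/>
    <participant ref="EX0002"/>
  </interaction>
  <interaction id="i2" method="example-method-2">
    <participant ref="EX0002"/>
    <participant ref="EX0003"/>
  </interaction>
</interactionSet>"""


def read_interactions(xml_text):
    """Return a list of (id, method, participant refs) tuples."""
    root = ET.fromstring(xml_text)
    interactions = []
    for node in root.findall("interaction"):
        refs = tuple(p.get("ref") for p in node.findall("participant"))
        interactions.append((node.get("id"), node.get("method"), refs))
    return interactions


for interaction in read_interactions(DOC):
    print(interaction)
```

Because the structure is explicit in the markup, any program that knows the agreed element names can recover the same records without a custom parser.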
Web services, a technology building on the ability of simple object access protocol (SOAP) to support distributed network communication, also have great potential as a tool for making both data and computational algorithms transparently available to other software applications. This will facilitate the machine discovery, communication and analysis of proteomic as well as genomic data.
In addition, cell behaviour is regulated in a complex manner through a diversity of interacting gene expression, signal transduction, metabolic, and electrophysiological pathways. Pathway properties are themselves determined by factors such as the specific nature of molecular interactions, formation of multi-molecular complexes, and by subcellular localization. Representation of information on biological pathways in a form that supports complex querying and modelling is an important goal of post-genomics biology.
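One simple way to represent a pathway so that it can be queried is as a directed graph, in which each component points to the components it acts upon. The sketch below makes that assumption; the node names are generic placeholders rather than a real signalling pathway, and the query shown (all components downstream of a given node) is only one of many that such a representation would support.

```python
from collections import deque

# Hypothetical pathway: each key acts on the components in its list.
PATHWAY = {
    "receptor": ["kinase_1"],
    "kinase_1": ["kinase_2", "channel"],
    "kinase_2": ["transcription_factor"],
    "channel": [],
    "transcription_factor": [],
}


def downstream(graph, start):
    """Return every component reachable from start, in breadth-first order."""
    visited = {start}
    order = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


print(downstream(PATHWAY, "kinase_1"))
```

Richer representations would attach attributes such as subcellular localization or complex membership to the nodes and edges, which this sketch omits.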
Finally, there is currently no single software platform with which data obtained using different proteomic methods (e.g. MS data, ICAT, 2DGE and 2D/LC) can be analysed simultaneously with physiological (e.g. echocardiographs) and genomic data. Clearly, there is a need for software tools and database federation systems that can track samples and apply statistical methods to data collected by multiple means.
Today, we have only data obtained from different heart failure cohorts, collected under different circumstances and with different tools. As mentioned earlier, the inherent difficulty in dealing with such disparate data lies in experimental limitations and in the heterogeneity of human samples, which arises from ‘haziness’ around clinical diagnosis and the underlying effects of drugs and treatments.
In the best scenario, myocardial samples from the same large cohort of heart failure patients will be analysed by multiple broad-based approaches. Eventually, sufficient molecular information from a variety of proteomic techniques, as well as from genomics, will reveal why particular patient subpopulations have a better prognosis than others or respond differently to therapeutic treatments. In the future, it will be possible to individualize diagnosis and treatment on the basis of in-depth knowledge of the disease processes that produce a given disease phenotype.