
The authors describe the implementation of DataRail, an open source MATLAB-based toolbox …


- Flexible informatics for linking experimental data to mathematical models via DataRail

Figure 1. Process diagram for model-centric information management in DataRail. Measurements generated using one or more methods (left side of diagram) are processed to output new knowledge (right); hypothesis testing links modeling and measurement in an iterative cycle. Processes and entities within the red box have been implemented; those outside the box remain to be completed; dotted lines denote external processes that have been linked to DataRail. Experimental measurements are first converted into the MIDAS format using one or more routines (pink lozenges; see text for details) and then used to assemble a multi-dimensional primary data array (green). Alternatively, an empty MIDAS-compliant spreadsheet is generated using a Java utility and experimental values are then entered. Algorithms for normalization, scaling, discretization, etc. transform the data to create new data arrays (orange) that can then be modeled using internal or external routines. Finally, analysis and visualization assist in knowledge generation. The calibration of kinetic and Boolean models is not shown explicitly, although it constitutes a critical and complicated step in the overall workflow of systems biology that is as yet external to DataRail.


Figure 2. Containment hierarchy for DataRail. Individual arrays of primary or transformed data are gathered together into a MATLAB structure we call a compendium; multiple compendia are linked together into a project. Each compendium contains a unique name (UID), short textual documentation, and a set of multi-dimensional arrays. Each array is stored together with simple metadata (name, free-text information, source, algorithm, and free parameters used in array creation). The representation follows Unified Modeling Language (UML) conventions, indicating that a compendium contains one or more arrays, each of which contains one or more labels and zero or more parameters.
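
The containment hierarchy in Figure 2 can be illustrated with a minimal Python sketch. DataRail itself stores these objects as MATLAB structures; the class names, field names, and the toy arrays below are hypothetical and chosen only to mirror the caption (project, compendium, array, labels, parameters). The "one or more" constraints are enforced in `__post_init__`; the "zero or more" parameters default to an empty dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataArray:
    """One primary or transformed multi-dimensional array plus its metadata."""
    name: str
    info: str                # free-text information
    source: str              # where the array came from
    algorithm: str           # routine used to create the array
    labels: List[str]        # one or more labels
    values: Any              # the (nested) numeric data itself
    parameters: Dict[str, Any] = field(default_factory=dict)  # zero or more

    def __post_init__(self) -> None:
        if not self.labels:
            raise ValueError("an array needs at least one label")


@dataclass
class Compendium:
    """A named collection of arrays with short documentation."""
    uid: str
    documentation: str
    arrays: List[DataArray]

    def __post_init__(self) -> None:
        if not self.arrays:
            raise ValueError("a compendium needs at least one array")


@dataclass
class Project:
    """Several compendia linked together."""
    name: str
    compendia: List[Compendium] = field(default_factory=list)


# Toy usage with placeholder values (not data from the paper).
primary = DataArray(
    name="primary", info="toy example", source="MIDAS file",
    algorithm="import", labels=["time", "signal"], values=[[0, 0], [0, 0]],
)
scaled = DataArray(
    name="scaled", info="toy example", source="primary",
    algorithm="scale-to-max", labels=["time", "signal"], values=[[0, 0], [0, 0]],
    parameters={"reference": "max"},
)
project = Project(
    name="demo",
    compendia=[Compendium(uid="demo-1", documentation="toy", arrays=[primary, scaled])],
)
```

Recording `source`, `algorithm`, and `parameters` on every array is what lets a transformed array be traced back to the array and routine that produced it.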


Figure 3. Minimum information for data analysis in systems biology (MIDAS). (A) A simplified map of a multi-well experiment in which Akt phosphorylation is to be assayed at 0 and 30 min in extracts from cells treated, or not, with lipopolysaccharide (LPS) and a PI3-kinase inhibitor (PI3Ki). (B) MIDAS representation of the experiment. A column header consists of a two-letter code defining the type of column and a short column name. For clarity, headers are color-coded to match the corresponding values on the plate map. The leftmost five columns (codes ID: identity, TR: treatment, and DA: data acquisition) are experimental design parameters and would be filled in before bench work begins. The rightmost column holds measured data values (DV) that are appended as data acquisition is performed. See Supplementary Table S2 for a larger example. (C) A list of the type codes used for MIDAS columns and a few relevant SBWiki types.
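
To make the header convention in Figure 3B concrete, here is a small Python sketch that splits each header into its two-letter type code and column name and groups each row's cells by type. Several details are assumptions rather than facts from the caption: the colon separator between code and name, the comma-separated layout, the specific column names (`plate`, `well`, and so on), and the use of the DA column to hold the time point in minutes. The DV cells are left empty, as they would be in a design sheet prepared before bench work.

```python
import csv
import io

# Type codes named in the Figure 3 caption.
COLUMN_TYPES = {
    "ID": "identity",
    "TR": "treatment",
    "DA": "data acquisition",
    "DV": "data value",
}

# Hypothetical design sheet: five design columns and one (still empty) DV column.
SAMPLE = """ID:plate,ID:well,TR:LPS,TR:PI3Ki,DA:Akt,DV:Akt
P1,A1,0,0,0,
P1,A2,1,0,30,
"""


def parse_header(header, sep=":"):
    """Split 'TR:LPS' into ('TR', 'LPS'); the separator is an assumption."""
    code, _, name = header.partition(sep)
    if code not in COLUMN_TYPES or not name:
        raise ValueError(f"not a MIDAS-style header: {header!r}")
    return code, name


def read_midas(text):
    """Return one dict per row, with cells grouped by column type code."""
    reader = csv.reader(io.StringIO(text))
    headers = [parse_header(h.strip()) for h in next(reader)]
    rows = []
    for raw in reader:
        if not raw:  # skip blank lines
            continue
        row = {code: {} for code in COLUMN_TYPES}
        for (code, name), cell in zip(headers, raw):
            row[code][name] = cell.strip()
        rows.append(row)
    return rows


rows = read_midas(SAMPLE)
for row in rows:
    print(row["ID"], row["TR"], row["DA"], row["DV"])
```

Grouping by type code keeps the design columns (ID, TR, DA) separate from the measured values (DV), which matches the caption's split between what is filled in before the experiment and what is appended afterwards.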


Figure 4. Visualizing data in DataRail by exploiting data in transformed arrays. (A) Structure of the compendium used to generate this plot and the relationship of each feature to data in a transformed array. This structural map was generated using routines internal to DataRail. (B) Time courses for the phosphorylation of 17 key proteins (rows) in primary hepatocytes under 11 different conditions of cytokine stimulation (columns) and treated with seven different small-molecule drugs (subpanels within each cytokine-signal block). Curves are colored according to their dynamics (green = sustained, yellow = transient, magenta = late activation, grey = no significant signal). The intensity of the signal determines the intensity of the color. The corresponding signals from HepG2 tumor cells are plotted behind without color coding. The background is blue if the mean signal is stronger for primary cells and red if it is stronger for HepG2 cells; larger differences lead to stronger coloring. In addition, the levels of IL8 at 24 h, a measure of cellular response, are added as a heat map.


Figure 5. PLSR analysis in DataRail. Liver CSR data were imported into DataRail, with values for protein phosphorylation designated as inputs and levels of secreted cytokine as outputs. The data were not normalized with respect to total protein concentration, to avoid introducing additional experimental error. The extent of cytokine co-expression was determined using internal PCA and k-means clustering routines. This yielded a set of five tightly clustered cytokines that were used as outputs for modeling (see row 1 of Table S1 for information about the dimensionality of the data). Primary data and data scaled with respect to maximum signal were then analyzed to compute area under the curve, slope, and mean change; this generated 8 transformed arrays for both input and output data. The resulting arrays were rescaled using routines for mean-centering, variance-scaling, or both combined (auto-scaling). The resulting 24 input cubes and 24 output cubes gave rise to 576 PLSR models, which were ranked according to their goodness of fit. For the best model, the variable importance of projection (VIP) is shown as a way to assess the relative importance of different inputs for cytokine secretion.
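
The model counts in Figure 5 follow from simple combinatorics: 8 transformed arrays times 3 rescalings gives 24 cubes per side, and pairing every input cube with every output cube gives 24 × 24 = 576 candidate models. The Python sketch below reproduces that arithmetic and shows textbook versions of the three rescalings applied to a single column of numbers. It is not DataRail's implementation: the function names are hypothetical, the count of 8 is taken from the caption as given, and the use of the sample standard deviation for variance-scaling is an assumption.

```python
from itertools import product
from statistics import mean, stdev


def mean_center(column):
    """Subtract the column mean from every value."""
    m = mean(column)
    return [x - m for x in column]


def variance_scale(column):
    """Divide by the sample standard deviation (fails on a constant column)."""
    s = stdev(column)
    return [x / s for x in column]


def auto_scale(column):
    """Mean-centering followed by variance-scaling."""
    return variance_scale(mean_center(column))


SCALINGS = {
    "mean-centering": mean_center,
    "variance-scaling": variance_scale,
    "auto-scaling": auto_scale,
}

N_TRANSFORMED = 8  # transformed arrays per side, as stated in the caption

# Each cube is identified here by (array index, scaling name).
input_cubes = list(product(range(N_TRANSFORMED), SCALINGS))
output_cubes = list(product(range(N_TRANSFORMED), SCALINGS))

# Every input cube is paired with every output cube to define one PLSR model.
model_pairs = list(product(input_cubes, output_cubes))

print(len(input_cubes), len(output_cubes), len(model_pairs))
```

Enumerating the combinations explicitly, rather than hard-coding 576, makes it easy to see how the number of models to fit and rank grows if another transformation or rescaling is added.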



Updated on: 3 Nov 2008
