The database is structured in two closely related sets of tables
that have been populated using different strategies. The first
part, concerning gene and protein annotation, has been populated
using a pipeline that collects relevant information on the
components involved in mammalian and budding yeast cell cycle
processes. The second part, concerning models, has been populated
through a system developed for the storage and the simulation of
mathematical models of the cell cycle developed in a systems
biology context.
Detailed information about gene, protein and model data stored in the cell cycle database is presented in Table 1.
Source data for genes and proteins
We started by integrating data collected from KEGG (1) and Reactome (2). These resources, even if not specific to the cell cycle, represent an important starting point for information about the genes and proteins of the budding yeast S. cerevisiae and of H. sapiens. These species were chosen because they display evolutionary correlation in regulatory mechanisms such as the cell cycle (3).
The database contains human and yeast genes involved in the complete cell cycle pathway (cell growth pathway) and in the MAP kinase signalling pathway (a signal transduction pathway strictly related to the cell cycle). Moreover, the cell cycle database contains the human genes involved in the apoptosis pathway (cell death pathway) taken from KEGG, and it also integrates more specific information related to mitotic and checkpoint pathways from Reactome.
Cell cycle models primary data
The list of models has been assembled by searching the literature and browsing many specific online resources. All the models relevant to cell cycle studies have been collected in the database as XML files encoded in the Systems Biology Markup Language (SBML) (4). In particular, a number of models for which the SBML file is available in BIOMODELS (5) or from the authors' websites have been directly integrated in the cell cycle database. Published models not yet implemented in SBML have been manually encoded in SBML using JigCell Model Builder (6). All the SBML files stored in the cell cycle database have been validated through the Systems Biology Workbench (SBW) SBML validator.
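As a rough illustration of what storing a model as an SBML XML file makes possible, the sketch below reads a toy SBML document with Python's standard library and lists its species. The toy model, its identifiers and the function name `summarize_sbml` are hypothetical and are not taken from the database; the code only checks that the file can be parsed and does not reproduce the SBW validation step described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model used only for illustration; it is not one of the
# models stored in the cell cycle database.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_cell_cycle">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="cyclin" compartment="cell" initialAmount="0"/>
      <species id="cdk" compartment="cell" initialAmount="1"/>
    </listOfSpecies>
  </model>
</sbml>"""


def summarize_sbml(text):
    """Return the model id, SBML level and species ids found in an SBML string."""
    root = ET.fromstring(text)  # raises ParseError if the XML is not well formed
    if not root.tag.startswith("{"):
        raise ValueError("expected a namespaced <sbml> root element")
    # Read the namespace from the root tag so the same code works for
    # documents that declare a different SBML level or version.
    ns = {"s": root.tag[1:].split("}")[0]}
    model = root.find("s:model", ns)
    if model is None:
        raise ValueError("no <model> element found")
    species = [sp.get("id") for sp in model.findall("s:listOfSpecies/s:species", ns)]
    return {"model_id": model.get("id"), "level": root.get("level"), "species": species}


if __name__ == "__main__":
    print(summarize_sbml(TOY_SBML))
```

A summary of this kind could be used to fill the model-related tables, while the SBML file itself remains the input for simulation.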
Up to now our resource contains 26 models, as reported in Table 1; among them, 13 have the related SBML file and for 12 of them it is possible to run simulations.
The cell cycle database has been implemented as a relational database managed by a MySQL server. A ‘data warehousing’ approach has been chosen to develop the resource, using a snowflake schema (7) to organize the data. The ‘data warehousing’ approach is used to collect different types of data from external resources in a single database system: in this way all data have the same format, making the query system easier and faster.
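The benefit of bringing all data to the same format can be sketched as follows. This is a minimal Python illustration under stated assumptions: the two input layouts, the field names and the `GeneRecord` type are hypothetical and do not describe the actual export formats of KEGG or Reactome, nor the tables of the cell cycle database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Common format shared by every record in the warehouse (hypothetical)."""
    gene_id: str
    organism: str
    pathway: str
    source: str


def from_source_a(row):
    # Hypothetical layout: a dict with keys "entry", "org" and "map".
    return GeneRecord(row["entry"], row["org"], row["map"], "source_a")


def from_source_b(row):
    # Hypothetical layout: a tuple (pathway, species, identifier).
    pathway, species, identifier = row
    return GeneRecord(identifier, species, pathway, "source_b")


def genes_in_pathway(records, pathway):
    """One query works for all sources because the records share a format."""
    return sorted({r.gene_id for r in records if r.pathway == pathway})


if __name__ == "__main__":
    records = [
        from_source_a({"entry": "demo:1", "org": "yeast", "map": "cell cycle"}),
        from_source_b(("cell cycle", "human", "demo:2")),
        from_source_b(("apoptosis", "human", "demo:3")),
    ]
    print(genes_in_pathway(records, "cell cycle"))
```

Once each source has its own small converter, every downstream query is written once against the common record type.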
The cell cycle database system consists of a series of programs written in Perl that retrieve the data from several different external databases, then transform and load them into the warehouse data model. This process is possible using a ‘snowflake’ schema, a method of storing data in a relational database which presents a ‘core table’, where the main data about yeast and human genes are stored. The ‘core table’ is connected to many ‘external tables’, where auxiliary data about genes, proteins and models are stored. This schema is particularly useful for database updating: when a new entry is inserted in the core table, all the external tables are updated ‘in cascade’, whereas when a new entry is inserted in one of the external tables no inward updating occurs.
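The core/external table layout and the one-directional update can be sketched as below. This is an illustration only, under several assumptions: it uses Python and an in-memory SQLite database rather than the Perl programs and MySQL server described here, and the table names, column names and identifiers are hypothetical. The ‘in cascade’ update is modelled as a loader that writes the core row and its dependent rows in one transaction.

```python
import sqlite3


def create_schema(conn):
    """Create one core table and two external tables (hypothetical names)."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only if enabled
    conn.executescript("""
        CREATE TABLE gene (                 -- core table
            gene_id  TEXT PRIMARY KEY,
            organism TEXT NOT NULL,
            symbol   TEXT NOT NULL
        );
        CREATE TABLE protein (              -- external table
            protein_id TEXT PRIMARY KEY,
            gene_id    TEXT NOT NULL REFERENCES gene(gene_id),
            name       TEXT
        );
        CREATE TABLE pathway_membership (   -- external table
            gene_id TEXT NOT NULL REFERENCES gene(gene_id),
            pathway TEXT NOT NULL,
            source  TEXT NOT NULL,
            PRIMARY KEY (gene_id, pathway, source)
        );
    """)


def load_gene(conn, gene, proteins=(), pathways=()):
    """Insert a core entry, then fill the external tables that depend on it."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("INSERT INTO gene VALUES (?, ?, ?)", gene)
        conn.executemany(
            "INSERT INTO protein VALUES (?, ?, ?)",
            [(pid, gene[0], name) for pid, name in proteins],
        )
        conn.executemany(
            "INSERT INTO pathway_membership VALUES (?, ?, ?)",
            [(gene[0], pathway, source) for pathway, source in pathways],
        )


def count(conn, table):
    """Row count of one of the three tables above (table name is trusted)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Placeholder identifiers, not real database entries.
    load_gene(conn, ("demo:1", "yeast", "GENE_A"),
              proteins=[("prot:1", "protein A")],
              pathways=[("cell cycle", "source_a")])
    # Adding a row to an external table leaves the core table untouched.
    with conn:
        conn.execute("INSERT INTO protein VALUES ('prot:2', 'demo:1', 'isoform')")
    print(count(conn, "gene"), count(conn, "protein"))
```

Keeping the references pointed from the external tables to the core table is what makes the update one-directional: a new gene brings its annotations with it, while a new annotation never creates or modifies a gene.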
Other resources are essentially linked to our database through public IDs, in order to gain further information and to make the integration as complete as possible. The list of the integrated and linked resources is shown in Figure 1.