MetaCrop: a detailed database of crop plant metabolism
Eva Grafahrend-Belau,
Stephan Weise,
Dirk Koschützki,
Uwe Scholz,
Björn H. Junker and
Falk Schreiber*
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, D-06466 Gatersleben, Germany
* To whom correspondence should be addressed. Tel: +49 39482 5753; Fax: +49 39482 5407; Email: [email protected]
Received August 15, 2007. Revised September 21, 2007. Accepted September 21, 2007.
Nucleic Acids Research, 2008, Vol. 36, Database issue D954-D958 [Open Access]
Abstract
MetaCrop is a manually curated repository of high quality information concerning the metabolism of crop plants. This includes pathway diagrams, reactions, locations, transport processes, reaction kinetics, taxonomy and literature. MetaCrop provides detailed information on six major crop plants with high agronomical importance and initial information about several other plants. The web interface supports an easy exploration of the information from overview pathways to single reactions and therefore helps users to understand the metabolism of crop plants. It also allows model creation and automatic data export for detailed models of metabolic pathways therefore supporting systems biology approaches. The MetaCrop database is accessible at http://metacrop.ipk-gatersleben.de.
Introduction
Crop plants are the major source of human nutrition and important contributors to chemical feedstocks and renewable fuels (1–3). An in-depth understanding of plant metabolism is helpful for the improvement of crop growth and yield (4,5). Data requirements in metabolic research are quite diverse: while some experts are interested in a qualitative global view of metabolism, others need detailed information about single reactions. Additionally, researchers investigating metabolism often have to rely on databases with unclear data quality resulting from genome-based metabolic network predictions. The situation in crop plant research is further complicated by the fact that only one crop plant (Oryza sativa, rice) has been sequenced so far (6,7). An example that requires detailed metabolic information is the generation of models to quantitatively simulate complex biochemical networks, an area which is of increasing interest in systems biology. While repositories for such models exist, the collection of information necessary for model creation remains a time-consuming manual task and only very few models for crop plants exist at all.
Here we present MetaCrop, a database that contains manually curated, highly detailed information about metabolic pathways in crop plants, including location information, transport processes and reaction kinetics. The web interface supports the exploration of the information from overview pathways to single reactions, data export and the creation of detailed models of metabolic pathways. With these features MetaCrop supports crop plant research in several ways. It improves the understanding of metabolism, especially if one wants both a general overview and specific details for selected pathways. It allows crop plant-specific information to be used in other tools, for example, to investigate experimental data in the network context. Finally, it helps in creating models of metabolic processes for simulation approaches and in silico experiments.
Database description
Content
MetaCrop contains hand-curated information on about 40 major metabolic pathways in various crop plants, with special emphasis on the metabolism of agronomically important organs such as seed and tuber. Species of both monocotyledons and dicotyledons are represented. Reactions incorporate information about the enzymes involved (e.g. EC and CAS number), metabolites (e.g. CAS number, molecular weight and chemical formula), stoichiometry and detailed location (species, organ, tissue, compartment and developmental stage). Furthermore, kinetic data are available for the reactions of central metabolism (sucrose breakdown, glycolysis, TCA cycle). References and relevant PubMed IDs are given. To provide a controlled vocabulary that allows data from different sources to be compared, ontology terms were used (8,9).
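As an illustration of how kinetic parameters such as vmax and km might be used once they are retrieved, the following minimal Python sketch evaluates an irreversible Michaelis-Menten rate law. The rate law, the function name and the parameter values are assumptions chosen for illustration; they are not taken from MetaCrop, and reversible enzymes generally need a more detailed rate law.

```python
def michaelis_menten_rate(substrate_conc, vmax, km):
    """Return the reaction rate v = vmax * S / (km + S).

    All arguments are illustrative; units must be chosen consistently
    by the caller (e.g. mM for S and km).
    """
    if substrate_conc < 0:
        raise ValueError("substrate concentration must be non-negative")
    if km <= 0:
        raise ValueError("km must be positive")
    return vmax * substrate_conc / (km + substrate_conc)


# Hypothetical parameter values, NOT values from the database.
EXAMPLE_VMAX = 2.0
EXAMPLE_KM = 0.5

# At S = km the rate is half of vmax by definition of km.
half_saturation_rate = michaelis_menten_rate(EXAMPLE_KM, EXAMPLE_VMAX, EXAMPLE_KM)
```

The half-saturation check is a quick sanity test when parameters from different publications are combined: a rate evaluated at S = km should always equal vmax / 2.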
Currently the database focuses on the monocotyledon species Hordeum vulgare (barley), Triticum aestivum (wheat), Oryza sativa (rice), Zea mays (maize) and the dicotyledon species Solanum tuberosum (potato) and Brassica napus (canola). Additional data on other crop and non-crop plants are currently being added to the database. In total, about 400 enzymatic reactions, 60 transport processes, 5 compartments and 740 references are represented in MetaCrop (see Table 1, content as of July 2007). To enable the export of detailed metabolic networks for systems biology approaches, most of the data contained in the database are biochemical data (e.g. taxon-specific enzymatic information). Where biochemical information is missing, proteomic or genetic information is given for the enzymatic reaction or transport process instead.
Web interface
The web interface of the database is accessible at http://metacrop.ipk-gatersleben.de. It allows detailed browsing and searching of data, user feedback and data export. Figure 1 shows some screenshots of the MetaCrop web interface, ranging from a complete pathway (sucrose breakdown in dicotyledon species including compartmentalization, transporters and isoenzymes) to detailed information about reaction kinetics. In addition to searchable data tables, the user is guided by clickable image maps of the pathways. Entire pathways containing all available information on the respective reactions and metabolites can be downloaded in the standardized systems biology exchange format, the systems biology markup language (SBML) (10), which can be imported into modelling tools such as COPASI (11,12).
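To give an impression of what can be done with an exported SBML file outside a dedicated modelling tool, the following Python sketch lists the reactions of an SBML document together with their reactants and products, using only the standard library. The embedded SBML text is a hand-written toy example, not an actual MetaCrop export, and the function name `read_reactions` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written toy SBML (Level 2 style); NOT a real MetaCrop export.
SBML_EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_sucrose_breakdown">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="sucrose" compartment="cytosol"/>
      <species id="glucose" compartment="cytosol"/>
      <species id="fructose" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="invertase" reversible="false">
        <listOfReactants>
          <speciesReference species="sucrose"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="glucose"/>
          <speciesReference species="fructose"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def read_reactions(sbml_text):
    """Map each reaction id to a (reactants, products) pair of species ids."""
    root = ET.fromstring(sbml_text)
    # Reuse whatever namespace the root element declares, so the function
    # does not depend on one particular SBML level or version.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

    def species_ids(reaction, list_tag):
        path = f"{ns}{list_tag}/{ns}speciesReference"
        return [ref.get("species") for ref in reaction.findall(path)]

    return {
        reaction.get("id"): (
            species_ids(reaction, "listOfReactants"),
            species_ids(reaction, "listOfProducts"),
        )
        for reaction in root.iter(f"{ns}reaction")
    }


reactions = read_reactions(SBML_EXAMPLE)
```

For real models a dedicated SBML library or a tool such as COPASI is the better choice, since plain XML parsing ignores units, kinetic laws and validation rules.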
The functionality of the web interface is documented in a tutorial available on the website. It is also possible to edit entries, extend the content of MetaCrop and create user-specific models. To ensure data quality, such changes cannot be made anonymously. Users interested in these functionalities are invited to obtain an editing account for MetaCrop. Changes performed by all accounts are logged and checked by curators to guarantee consistency and quality of the inserted data. The web interface is based on the Oracle Application Express technology.
Database implementation
MetaCrop uses the information system Meta-All (13) and is based on the database management system Oracle. The database schema comprises 51 relational tables and can be divided into several parts. The main parts are conversions, substances, pathways, locations, references and versioning. Conversions and substances are the central parts of the schema. A conversion is a reaction or a translocation, which is either active or passive. Substances comprise transporters, enzymes, metabolites and macromolecules. They take part in conversions and play certain roles, such as reactant or product, modulator, catalyst, etc. All necessary information, e.g. name, formula or kinetic data, can be stored together with conversions and substances. In order to distinguish data originating from different publications, each record can be enriched by reference information. The term location describes a combination of taxonomy, developmental stage and cytology of plants in order to distinguish where and when conversions take place. For this purpose, a controlled vocabulary is used. Additionally, the database schema supports parallel versioning of data records, e.g. in case of different opinions of experimentalists. Finally, pathways are combinations of conversions taking place at a certain location.
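The relationships between conversions, substances, locations and pathways described above can be sketched as a small in-memory data model. This is only an illustration: the class names, field names and example values are assumptions, and the sketch does not reproduce the 51 relational tables of the actual schema or its versioning part.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Location:
    """Where and when a conversion takes place (field names are assumed)."""
    species: str
    organ: str
    tissue: str
    compartment: str
    developmental_stage: str


@dataclass(frozen=True)
class Substance:
    name: str
    kind: str  # "metabolite", "enzyme", "transporter" or "macromolecule"


@dataclass
class Conversion:
    name: str
    kind: str  # "reaction" or "translocation"
    # Each participant is a (substance, role) pair, e.g. role "reactant".
    participants: List[Tuple[Substance, str]] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

    def with_role(self, role):
        """Return all substances that play the given role."""
        return [s for s, r in self.participants if r == role]


@dataclass
class Pathway:
    name: str
    # A pathway combines conversions with the location where they occur.
    steps: List[Tuple[Conversion, Location]] = field(default_factory=list)


# Illustrative values only; not copied from the database.
tuber_cytosol = Location("Solanum tuberosum", "tuber", "parenchyma",
                         "cytosol", "developing")
g6p = Substance("glucose 6-phosphate", "metabolite")
f6p = Substance("fructose 6-phosphate", "metabolite")
pgi = Substance("phosphoglucose isomerase", "enzyme")

isomerisation = Conversion(
    name="glucose 6-phosphate isomerisation",
    kind="reaction",
    participants=[(g6p, "reactant"), (f6p, "product"), (pgi, "catalyst")],
)
toy_pathway = Pathway("glycolysis (toy excerpt)", [(isomerisation, tuber_cytosol)])
```

Storing roles on the link between a conversion and a substance, rather than on the substance itself, lets the same metabolite act as a reactant in one conversion and as a product or modulator in another.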
The complete information represented in MetaCrop is also available as a dump of the database, i.e. the data is available for bulk download. The dump can easily be imported into a user's instance of the open source information system Meta-All (13), thereby enabling users to run their local version of the database.
Curation, quality assurance, completeness and continuation
All information was extracted manually through an extensive survey of primary literature and online databases. Literature-based information was derived from about 800 papers in plant biochemical and physiological journals as well as from respective textbooks (e.g. (14,15)). Furthermore, some of the information was manually extracted from online databases providing pathway-related information: KEGG PATHWAY ((16), http://www.genome.jp/kegg/pathway.html), EGENES ((17), http://www.genome.jp/kegg-bin/create_kegg_menu?category=plants_egenes), AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp), MetaCyc ((19), http://metacyc.org/), RiceCyc (http://www.gramene.org/pathway/), Reactome ((20), http://www.reactome.org/); enzyme-related information: BRENDA ((21), http://www.brenda-enzymes.info/), ExPASy-ENZYME ((22), http://expasy.org/enzyme/); protein-related information: Swiss-Prot/TrEMBL ((23), http://www.expasy.org/sprot/); metabolite-related information: PubChem (http://pubchem.ncbi), KEGG LIGAND ((16a), http://www.genome.jp/kegg/ligand.html); transporter-related information: ARAMEMNON ((24), http://aramemnon.botanik.uni-koeln.de/); kinetic information: BRENDA ((21), http://www.brenda-enzymes.info/).
For quality assurance, information inferred from databases has been checked against the literature. To allow information to be traced back and to support further reading, references and corresponding PubMed IDs are given where available. Controlled vocabulary (e.g. ontology terms from Plant Ontology (8) and Gene Ontology (9)) was used to ensure consistency and to allow the comparison of data from different sources. Currently MetaCrop contains most of the pathways of central metabolism in higher plants (e.g. metabolism of carbohydrates, amino acids, lipids, energy, cofactors and nucleotides). With respect to crop plant metabolism, special emphasis is laid on pathways of seed and tuber metabolism such as the sucrose breakdown pathway. While our current focus is on updating pathways with incomplete information, we plan to extend the information stored in MetaCrop to pathways of plant secondary metabolism. The extension of MetaCrop is primarily done in-house; however, registered users can edit entries and extend the content of MetaCrop and may therefore also contribute to the extension of the database in the future.
Application of the MetaCrop database
MetaCrop can be used for a wide variety of applications in crop
plant research. It helps in understanding the metabolism at
different levels of detail, it allows the use of crop plant
specific information in other tools for further investigations,
and it supports the creation of models of metabolism for simulation
approaches. Two example applications are as follows:
Mathematical analysis of metabolic pathways
The in-depth mathematical analysis of a pathway of interest will generally consist of two main steps, which are (i) investigation of the structural properties and capabilities of the pathway with tools such as CellNetAnalyzer (25) and (ii) detailed analysis of the kinetic characteristics of the system with modelling and simulation tools such as COPASI (11). MetaCrop supports these processes at various steps. It contains all necessary information for structural pathway analysis, and for central metabolism also detailed kinetic data for kinetic pathway analysis. Furthermore, the above-mentioned tools are able to read the files exported from MetaCrop in the standardized SBML format (10). Once imported into these tools, the pathways can serve as a starting point for structural or kinetic metabolic models.
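Structural analysis typically starts from the stoichiometric matrix of a pathway. The following Python sketch builds such a matrix for a toy network and checks whether a flux distribution is balanced, i.e. whether every metabolite is produced as fast as it is consumed. The network, the reaction names and the helper functions are assumptions made for illustration; this is not the algorithm used by CellNetAnalyzer or any other tool.

```python
def stoichiometric_matrix(reactions):
    """Build the matrix N (metabolites x reactions) from stoichiometries.

    `reactions` maps a reaction id to {metabolite: coefficient}, with
    negative coefficients for consumed and positive for produced species.
    """
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    reaction_ids = list(reactions)
    matrix = [
        [reactions[r].get(m, 0) for r in reaction_ids]
        for m in metabolites
    ]
    return metabolites, reaction_ids, matrix


def net_production(matrix, fluxes):
    """Return N * v: the net production rate of every metabolite."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in matrix]


# Toy network (hypothetical): sucrose is taken up, split by invertase,
# and both hexoses are drained from the system.
TOY_NETWORK = {
    "sucrose_uptake": {"sucrose": 1},
    "invertase": {"sucrose": -1, "glucose": 1, "fructose": 1},
    "glucose_drain": {"glucose": -1},
    "fructose_drain": {"fructose": -1},
}

metabolites, reaction_ids, N = stoichiometric_matrix(TOY_NETWORK)
balanced = net_production(N, [1, 1, 1, 1])    # all fluxes equal
unbalanced = net_production(N, [2, 1, 1, 1])  # uptake exceeds invertase flux
```

With equal fluxes every metabolite is balanced, whereas doubling the uptake flux leaves sucrose accumulating; dedicated tools go further and enumerate all flux distributions that satisfy this balance.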
Investigation of -omics data in the context of metabolic networks
Network-related analysis of high-throughput data involves the mapping of experimental data onto related pathways and the investigation of this integrated data. Such functionality is provided by tools such as VANTED (26), a system for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments can be uploaded into the software and then mapped on a network that is either drawn with the tool itself or imported, for example, from an SBML file. VANTED enables users to present and analyse transcript, enzyme, proteomics and metabolite data in the context of underlying networks such as metabolic pathways from MetaCrop. Several analysis methods implemented in such software systems help in further investigation of the data.
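The mapping step described above is, at its core, a join between the node names of a network and the identifiers of the measured entities. The following Python sketch shows that idea on a toy example. It is not VANTED's implementation or API; the function name, the node names and the measurement values are hypothetical.

```python
def map_measurements(network_nodes, measurements):
    """Attach measured values to network nodes by exact name match.

    Returns three items: values mapped onto nodes, measured entities
    that have no node in the network, and nodes without any measurement.
    """
    node_set = set(network_nodes)
    mapped = {n: measurements[n] for n in network_nodes if n in measurements}
    unmapped = sorted(set(measurements) - node_set)
    unmeasured = sorted(node_set - set(measurements))
    return mapped, unmapped, unmeasured


# Hypothetical pathway nodes and metabolite levels (arbitrary units).
PATHWAY_NODES = ["sucrose", "glucose", "fructose"]
METABOLITE_LEVELS = {"sucrose": 1.8, "glucose": 0.4, "raffinose": 0.1}

mapped, unmapped, unmeasured = map_measurements(PATHWAY_NODES, METABOLITE_LEVELS)
```

Reporting the unmapped and unmeasured entries explicitly matters in practice, because name mismatches between experimental data and pathway data are a common reason why values silently fail to appear in the network view.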
Discussion
MetaCrop contains comprehensive, original, high-quality data about crop plant metabolism. While most of the existing metabolic pathway databases do not contain any plant-specific information, there exist a few multi-organism databases such as MetaCyc (19), BRENDA (21) and KEGG (16) comprising information about plant metabolism. The transcriptome-based database EGENES (17) is a multi-species plant database, which currently consists of 25 plant species (release 41.0, January 2007). The database integrates plant genomic information (EST contigs) and pathway information (pathway maps derived from KEGG reference pathways), thus offering an overview of fundamental biological processes in plants. In addition to these multi-species databases, there exist a few species-specific crop plant databases such as the pathway/genome databases RiceCyc (http://www.gramene.org/pathway/) and SolCyc (http://www.sgn.cornell.edu/tools/solcyc/). However, most of these single- and multi-species databases contain little or no hand-curated information because they rely on genome- or EST-based pathway predictions, or they do not support model creation and model export in SBML. Furthermore, highly specific information such as kinetic data, compartment-specific information or transport processes is often lacking, and most of the databases are limited to read-only access that does not allow user-specific interaction, editing and extending.
Similar to MetaCrop, the pathway databases AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp) and MetNetDB ((27), http://www.metnetdb.org) contain detailed information about plant metabolism. AraCyc is a pathway/genome database that contains enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). The Metabolic Networking Data Base (MetNetDB) contains information on metabolic and regulatory networks in Arabidopsis, which are derived from a combination of online databases and input from biologists in their area of expertise. Both databases are under continued curation and contain highly specific information such as compartment-specific information or transport processes. However, both databases currently only contain information about the model plant Arabidopsis.
Conclusion
MetaCrop is an ongoing project and currently consists largely of a collection of manually curated data about six major crop plants, interaction methods via the web interface and export functionalities. We plan to develop the database in two directions: the further curation of information and the improvement of the web interface. We plan to extend the information stored in MetaCrop to secondary pathways and to include other important crop plants such as Glycine max (soybean), Solanum lycopersicum (tomato), Helianthus annuus (sunflower) and Secale cereale (rye). For the web interface, work is underway to implement methods that take advantage of the taxonomy and localization information in MetaCrop such that, for example, if information is not available for a specific species it can be derived from information on closely related species.
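One conceivable way to derive missing information from closely related species is to pick, among the species for which data exist, the one that shares the longest taxonomic lineage with the species of interest. The Python sketch below illustrates this idea only; it is not the method under development for MetaCrop. The simplified family/tribe/genus lineages and all function names are assumptions made for this example.

```python
# Simplified lineages (family, tribe, genus); an illustrative assumption,
# not the taxonomy representation used in the database.
LINEAGE = {
    "Hordeum vulgare": ["Poaceae", "Triticeae", "Hordeum"],
    "Triticum aestivum": ["Poaceae", "Triticeae", "Triticum"],
    "Oryza sativa": ["Poaceae", "Oryzeae", "Oryza"],
    "Brassica napus": ["Brassicaceae", "Brassiceae", "Brassica"],
}


def shared_ranks(species_a, species_b):
    """Count how many leading ranks two lineages have in common."""
    depth = 0
    for rank_a, rank_b in zip(LINEAGE[species_a], LINEAGE[species_b]):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


def closest_species_with_data(target, species_with_data):
    """Return the best source species for `target`, or None if unrelated."""
    if target in species_with_data:
        return target
    # Sort so that ties are resolved deterministically (alphabetically).
    candidates = sorted(s for s in species_with_data if s in LINEAGE)
    best = max(candidates, key=lambda s: shared_ranks(target, s), default=None)
    if best is None or shared_ranks(target, best) == 0:
        return None
    return best


fallback = closest_species_with_data(
    "Hordeum vulgare", {"Triticum aestivum", "Oryza sativa", "Brassica napus"}
)
```

Any value obtained this way would need to be flagged as inferred rather than measured, so that users can distinguish it from curated, species-specific data.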
Acknowledgment
This work was partly supported by the German Federal Ministry
of Education and Research (grant 0312706A). Funding to pay the
Open Access publication charges for this article was provided
by the Leibniz Institute of Plant Genetics and Crop Plant Research
(IPK) Gatersleben.
Conflict of interest statement. None declared.
References
1. Grusak MA, DellaPenna D. Improving the nutrient composition of plants to enhance human nutrition and health. Annu. Rev. Plant Physiol. Plant Mol. Biol. (1999) 50:133–161.
2. Metzger JO, Bornscheuer U. Lipids as renewable resources: current state of chemical and biotechnological conversion and diversification. Appl. Microbiol. Biotechnol. (2006) 71:13–22.
3. Tilman D, Hill J, Lehman C. Carbon-negative biofuels from low-input high-diversity grassland biomass. Science (2006) 314:1598–1600.
4. Jenner HL. Transgenesis and yield: what are our targets? Trends Biotechnol. (2003) 21:190–192.
5. Carrari F, Urbanczyk-Wochniak E, Willmitzer L, Fernie AR. Engineering central metabolism in crop species: learning the system. Metab. Eng. (2003) 5:191–200.
6. Yu J, Hu S, Wang J, Wong G.K.-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science (2002) 296:79–92.
7. Goff SA, Ricke D, Lan T.-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science (2002) 296:92–100.
8. Jaiswal P, Avraham S, Ilic K, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM, et al. Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comp. Funct. Genomics (2005) 6:388–397.
9. Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. (2006) 34(Suppl. 1):D322–D326.
10. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003) 19:524–531.
11. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes P, et al. COPASI – a complex pathway simulator. Bioinformatics (2006) 22:3067–3074.
12. Alves R, Antunes F, Salvador A. Tools for kinetic modeling of biochemical networks. Nat. Biotechnol. (2006) 24:667–672.
13. Weise S, Grosse I, Klukas C, Koschützki D, Scholz U, Schreiber F, Junker BH. Meta-All: a system for managing metabolic pathway information. BMC Bioinformatics (2006) 7:e465.
14. Buchanan BB, Gruissem W, Jones RL. Biochemistry & Molecular Biology of Plants. (2000) Rockville, MD: American Society of Plant Physiologists.
15. Bewley JD, Black M. Seeds: Physiology of Development and Germination. (1994) 2nd edn. New York, USA: Plenum Press.
16. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita K, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34(Suppl. 1):D354–D357.
17. Masoudi-Nejad A, Goto S, Jauregui R, Ito M, Kawashima S, Moriya Y, Endo T, Kanehisa M. EGENES: transcriptome-based plant database of genes with metabolic pathway information and expressed sequence tag indices in KEGG. Plant Physiol. (2007) 144:857–866.
18. Rhee SY, Zhang P, Foerster H, Tissier C. AraCyc: overview of an Arabidopsis metabolism database and its applications for plant research. In: Saito K, Dixon RA, Willmitzer L, eds. Plant Metabolomics. (2006) Heidelberg: Springer Berlin. 141–154.
19. Caspi R, Foerster H, Fulcher C, Hopkinson R, Ingraham J, Kaipa P, Krummenacker M, Paley S, Pick J, et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. (2006) 34(Suppl. 1):D511–D516.
20. Vastrik I, D’Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, et al. Reactome: a knowledge base of biologic pathways and processes. Genome Biol. (2007) 8:R39.
21. Barthelmes J, Ebeling C, Chang A, Schomburg I, Schomburg D. BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res. (2007) 35(Suppl. 1):D511–D514.
22. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. (2000) 28:304–305.
23. Boeckmann B, Bairoch A, Apweiler R, Blatter M.-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. (2003) 31:365–370.
24. Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flügge U.-I, Kunze R. ARAMEMNON: a novel database for Arabidopsis integral membrane proteins. Plant Physiol. (2003) 131:16–26.
25. Klamt S, Saez-Rodriguez J, Gilles ED. Structural and functional analysis of cellular networks with CellNetAnalyzer. BMC Syst. Biol. (2007) 1:e2.
26. Junker BH, Klukas C, Schreiber F. VANTED: a system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics (2006) 7:e109.
27. Wurtele ES, Li J, Diao L, Zhang H, Foster CM, Fatland B, Dickerson J, Brown A, Cox Z, et al. MetNet: software to build and model the biogenetic lattice of Arabidopsis. Comp. Funct. Genomics (2003) 4:239–245.
Table 1. Information contained in MetaCrop
| | Hordeum vulgare | Triticum aestivum | Oryza sativa | Zea mays | Solanum tuberosum | Brassica napus | Total (a) |
|---|---|---|---|---|---|---|---|
| Pathways | 36 | 33 | 34 | 34 | 34 | 26 | 38 |
| Enzymatic reactions | 291 | 271 | 278 | 273 | 207 | 168 | 392 |
| Transport processes | 7 | 6 | 9 | 27 | 14 | 7 | 59 |
| Compartments | 4 | 4 | 4 | 3 | 3 | 3 | 5 |
| References | 382 | 347 | 340 | 346 | 252 | 204 | 734 |
(a) Including other plants; pathways, reactions and other information occurring in more than one plant are only listed once.
Figure 1. Screenshots of the web interface of MetaCrop. (a) A pathway (sucrose breakdown in dicotyledon species, showing compartmentalization, transporters and isoenzymes); (b) information connected to pathways: conversion details (cytosolic phosphoglucose isomerase): stoichiometry, catalyst, metabolites, conversion location, subset of taxon-specific kinetic parameters (vmax, km) given for cytosolic phosphoglucose isomerase.