MetaCrop: a detailed database of crop plant metabolism


Eva Grafahrend-Belau, Stephan Weise, Dirk Koschützki, Uwe Scholz, Björn H. Junker and Falk Schreiber*

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, D-06466 Gatersleben, Germany

* To whom correspondence should be addressed. Tel: +49 39482 5753; Fax: +49 39482 5407; Email: [email protected]

Received August 15, 2007. Revised September 21, 2007. Accepted September 21, 2007.

Nucleic Acids Research, 2008, Vol. 36, Database issue D954-D958 [Open Access]

Abstract

MetaCrop is a manually curated repository of high-quality information concerning the metabolism of crop plants. This includes pathway diagrams, reactions, locations, transport processes, reaction kinetics, taxonomy and literature. MetaCrop provides detailed information on six major crop plants with high agronomical importance and initial information about several other plants. The web interface supports an easy exploration of the information from overview pathways to single reactions and therefore helps users to understand the metabolism of crop plants. It also allows model creation and automatic data export for detailed models of metabolic pathways, therefore supporting systems biology approaches. The MetaCrop database is accessible at http://metacrop.ipk-gatersleben.de.


Introduction

Crop plants are the major source of human nutrition and important contributors to chemical feedstocks and renewable fuels (1–3). An in-depth understanding of plant metabolism is helpful for improving growth and yield (4,5). Data requirements in metabolic research are quite diverse: while some experts are interested in a qualitative global view of metabolism, others need detailed information about single reactions. Additionally, researchers investigating metabolism often have to rely on databases with unclear data quality resulting from genome-based metabolic network predictions. The situation in crop plant research is further complicated by the fact that only one crop plant (Oryza sativa, rice) has been sequenced so far (6,7). An example that requires detailed metabolic information is the generation of models to quantitatively simulate complex biochemical networks, an area of increasing interest in systems biology. While repositories for such models exist, the collection of information necessary for model creation remains a time-consuming manual task, and only very few models for crop plants exist at all.

Here we present MetaCrop, a database that contains manually curated, highly detailed information about metabolic pathways in crop plants, including location information, transport processes and reaction kinetics. The web interface supports the exploration of the information from overview pathways to single reactions, data export and the creation of detailed models of metabolic pathways. With these features MetaCrop supports crop plant research in several ways. It improves the understanding of metabolism, especially if one wants both a general overview and specific details for selected pathways. It allows the use of crop plant-specific information in other tools, for example, to investigate experimental data in the network context. And it helps in creating models of metabolic processes for simulation approaches and in silico experiments.

Database description

Content
MetaCrop contains hand-curated information on about 40 major metabolic pathways in various crop plants, with special emphasis on the metabolism of agronomically important organs such as seed and tuber. Species of both monocotyledons and dicotyledons are represented. Reactions incorporate information about involved enzymes (e.g. EC and CAS number), metabolites (e.g. CAS number, molecular weight and chemical formula), stoichiometry and detailed location (species, organ, tissue, compartment and developmental stage). Furthermore, for central metabolism (sucrose breakdown, glycolysis, TCA cycle) kinetic data are available for the reactions. References and relevant PubMed IDs are given. To provide a controlled vocabulary that allows the comparison of data from different sources, ontology terms were used (8,9).

Currently the database focuses on the monocotyledon species Hordeum vulgare (barley), Triticum aestivum (wheat), Oryza sativa (rice) and Zea mays (maize), and the dicotyledon species Solanum tuberosum (potato) and Brassica napus (canola). Additional data on other crop and non-crop plants are currently being added to the database. In total, about 400 enzymatic reactions, 60 transport processes, 5 compartments and 740 references are represented in MetaCrop (see Table 1, content as of July 2007). To enable the export of detailed metabolic networks for systems biology approaches, most of the data contained in the database are biochemical data (e.g. taxon-specific enzymatic information). Where biochemical information is missing, proteomic or genetic information is given for the enzymatic reaction or transport process.

Web interface
The web interface of the database is accessible at http://metacrop.ipk-gatersleben.de. It allows detailed browsing and searching of data, user feedback and data export. Figure 1 shows some screenshots of the MetaCrop web interface, from a complete pathway (sucrose breakdown in dicotyledon species including compartmentalization, transporters and isoenzymes) to detailed information about reaction kinetics. In addition to searchable data tables, the user is guided by clickable image maps of the pathways. Entire pathways containing all available information on the respective reactions and metabolites can be downloaded in the standardized systems biology exchange format, the systems biology markup language (SBML) (10), which can be imported into modelling tools such as COPASI (11,12).
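As a rough illustration of what working with such an SBML export could look like outside a dedicated modelling tool, the following sketch lists the reactions in an SBML document using only the Python standard library. The embedded SBML fragment is hand-written for this example and is an assumption, not an actual MetaCrop download; the function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written SBML Level 2 fragment for illustration only;
# it is NOT an actual MetaCrop export.
EXAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="example_pathway">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="G6P" name="glucose 6-phosphate" compartment="cytosol"/>
      <species id="F6P" name="fructose 6-phosphate" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="PGI" name="phosphoglucose isomerase" reversible="true">
        <listOfReactants>
          <speciesReference species="G6P"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="F6P"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_reactions(sbml_text):
    """Return {reaction id: (reactant ids, product ids)} from SBML text."""
    root = ET.fromstring(sbml_text)
    reactions = {}
    for elem in root.iter():
        if local_name(elem.tag) != "reaction":
            continue
        reactants, products = [], []
        for child in elem:
            kind = local_name(child.tag)
            refs = [ref.get("species") for ref in child]
            if kind == "listOfReactants":
                reactants = refs
            elif kind == "listOfProducts":
                products = refs
        reactions[elem.get("id")] = (reactants, products)
    return reactions


if __name__ == "__main__":
    for rid, (substrates, products) in read_reactions(EXAMPLE_SBML).items():
        print(rid, substrates, "->", products)
```

Matching on local tag names rather than a fixed namespace keeps the sketch independent of the SBML level and version of a given file. For real modelling work, a dedicated SBML library or a tool such as COPASI is the more robust choice.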

The functionality of the web interface is documented in a tutorial available on the website. It is also possible to edit entries, extend the content of MetaCrop and create user-specific models. To ensure data quality, such changes cannot be made anonymously. Users interested in these functionalities are invited to obtain an editing account for MetaCrop. Changes performed by all accounts are logged and checked by curators to guarantee consistency and quality of the inserted data. The web interface is based on the Oracle Application Express technology.

Database implementation
MetaCrop uses the information system Meta-All (13) and is based on the database management system Oracle. The database schema comprises 51 relational tables and can be divided into several parts. The main parts are conversions, substances, pathways, locations, references and versioning. Conversions and substances are the central parts of the schema. A conversion is a reaction or a translocation, which is either active or passive. Substances comprise transporters, enzymes, metabolites and macromolecules. They take part in conversions and play certain roles, such as reactant or product, modulator, catalyst, etc. All necessary information, e.g. name, formula or kinetic data, can be stored together with conversions and substances. In order to distinguish data originating from different publications, each record can be enriched by reference information. The term location describes a combination of taxonomy, developmental stage and cytology of plants in order to distinguish where and when conversions take place. For this purpose, a controlled vocabulary is used. Additionally, the database schema supports parallel versioning of data records, e.g. in case of different opinions of experimentalists. Finally, pathways are combinations of conversions taking place at a certain location.
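The relationships between the main schema parts can be pictured with a small in-memory sketch. This is not the Meta-All relational schema (which comprises 51 tables); all class and field names below are assumptions chosen for illustration, and versioning is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Substance:
    name: str
    kind: str  # "metabolite", "enzyme", "transporter" or "macromolecule"


@dataclass(frozen=True)
class Location:
    # Combination of taxonomy, cytology and developmental stage.
    species: str
    organ: str
    compartment: str
    developmental_stage: Optional[str] = None


@dataclass(frozen=True)
class Participant:
    substance: Substance
    role: str  # e.g. "reactant", "product", "catalyst", "modulator"
    stoichiometry: float = 1.0


@dataclass
class Conversion:
    name: str
    kind: str  # "reaction" or "translocation"
    location: Location
    participants: List[Participant] = field(default_factory=list)
    pubmed_ids: List[int] = field(default_factory=list)  # reference information

    def with_role(self, role):
        """Names of all substances that play the given role."""
        return [p.substance.name for p in self.participants if p.role == role]


@dataclass
class Pathway:
    name: str
    conversions: List[Conversion] = field(default_factory=list)


# Hypothetical example record (location values chosen for illustration).
g6p = Substance("glucose 6-phosphate", "metabolite")
f6p = Substance("fructose 6-phosphate", "metabolite")
pgi_enzyme = Substance("phosphoglucose isomerase", "enzyme")
seed_cytosol = Location("Hordeum vulgare", "seed", "cytosol")

pgi = Conversion(
    name="phosphoglucose isomerase reaction",
    kind="reaction",
    location=seed_cytosol,
    participants=[
        Participant(g6p, "reactant"),
        Participant(f6p, "product"),
        Participant(pgi_enzyme, "catalyst"),
    ],
)
glycolysis = Pathway("glycolysis (excerpt)", [pgi])
```

Modelling the role as a property of the participation, rather than of the substance, mirrors the description above: the same substance can be a reactant in one conversion and a modulator in another.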

The complete information represented in MetaCrop is also available as a dump of the database, i.e. the data is available for bulk download. The dump can easily be imported into a user's instance of the open source information system Meta-All (13), therefore enabling users to run their local version of the database.

 

 

 


Curation, quality assurance, completeness and continuation

All information was extracted manually through an extensive survey of primary literature and online databases. Literature-based information was derived from about 800 papers in plant biochemical and physiological journals as well as from respective textbooks (e.g. (14,15)). Furthermore, some of the information was manually extracted from online databases providing pathway-related information: KEGG PATHWAY ((16), http://www.genome.jp/kegg/pathway.html), EGENES ((17), http://www.genome.jp/kegg-bin/create_kegg_menu?category=plants_egenes), AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp), MetaCyc ((19), http://metacyc.org/), RiceCyc (http://www.gramene.org/pathway/), Reactome ((20), http://www.reactome.org/); enzyme-related information: BRENDA ((21), http://www.brenda-enzymes.info/), ExPASy-ENZYME ((22), http://expasy.org/enzyme/); protein-related information: Swiss-Prot/TrEMBL ((23), http://www.expasy.org/sprot/); metabolite-related information: PubChem (http://pubchem.ncbi), KEGG LIGAND ((16a), http://www.genome.jp/kegg/ligand.html); transporter-related information: ARAMEMNON ((24), http://aramemnon.botanik.uni-koeln.de/); kinetic information: BRENDA ((21), http://www.brenda-enzymes.info/).

For quality assurance, information inferred from databases has been checked against literature. To enable the tracing back of information and further reading, references and corresponding PubMed IDs are given where available. Controlled vocabulary (e.g. ontology terms from Plant Ontology (8) and Gene Ontology (9)) was used to ensure consistency and to allow the comparison of data from different sources.

Currently MetaCrop contains most of the pathways of central metabolism in higher plants (e.g. metabolism of carbohydrates, amino acids, lipids, energy, cofactors and nucleotides). With respect to crop plant metabolism, special emphasis is laid on pathways of seed and tuber metabolism such as the sucrose breakdown pathway. While our current focus is on updating pathways with incomplete information, we plan to extend the information stored in MetaCrop to pathways of plant secondary metabolism. The extension of MetaCrop is primarily done in-house; however, registered users can edit entries and extend the content of MetaCrop and may therefore also contribute to the extension of the database in the future.

Application of the MetaCrop database

MetaCrop can be used for a wide variety of applications in crop plant research. It helps in understanding the metabolism at different levels of detail, it allows the use of crop plant-specific information in other tools for further investigations, and it supports the creation of models of metabolism for simulation approaches. Two example applications are as follows:

Mathematical analysis of metabolic pathways
The in-depth mathematical analysis of a pathway of interest will generally consist of two main steps, which are (i) investigation of the structural properties and capabilities of the pathway with tools such as CellNetAnalyzer (25) and (ii) detailed analysis of the kinetic characteristics of the system with modelling and simulation tools such as COPASI (11). MetaCrop supports these processes at various steps. It contains all necessary information for structural pathway analysis, and for central metabolism also detailed kinetic data for kinetic pathway analysis. Furthermore, the above-mentioned tools are able to read the files exported from MetaCrop in the standardized SBML format (10). Once imported into these tools, the pathways can serve as a starting point for structural or kinetic metabolic models.
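To make the two steps more concrete, the sketch below builds a stoichiometric matrix, the usual starting point of structural analysis, and evaluates an irreversible Michaelis-Menten rate law as a minimal stand-in for kinetic analysis. It does not use CellNetAnalyzer or COPASI; the three-reaction excerpt, the function names and all parameter values are assumptions made for illustration.

```python
# Toy excerpt of hexose metabolism; stoichiometric coefficients are negative
# for consumed and positive for produced metabolites.
REACTIONS = {
    "invertase": {"sucrose": -1, "H2O": -1, "glucose": 1, "fructose": 1},
    "hexokinase": {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1},
    "PGI": {"G6P": -1, "F6P": 1},
}


def stoichiometric_matrix(reactions):
    """Return (metabolite names, reaction names, matrix N).

    N[i][j] is the coefficient of metabolite i in reaction j.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


def net_change(matrix, fluxes):
    """Compute N * v, the net production rate of every metabolite."""
    return [sum(n * v for n, v in zip(row, fluxes)) for row in matrix]


def michaelis_menten(substrate, vmax, km):
    """Irreversible Michaelis-Menten rate: v = vmax * S / (km + S)."""
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    metabolites, names, matrix = stoichiometric_matrix(REACTIONS)
    # With equal flux through all three reactions, the intermediates
    # glucose and G6P are balanced (net change 0).
    change = dict(zip(metabolites, net_change(matrix, [1.0, 1.0, 1.0])))
    print(change["glucose"], change["G6P"])
    # Made-up parameters: at S = km the rate is half of vmax.
    print(michaelis_menten(substrate=2.0, vmax=10.0, km=2.0))
```

Structural analysis only needs the matrix, whereas the rate law additionally requires kinetic parameters such as vmax and km, which is why kinetic data are the more demanding part of model building.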

Investigation of -omics data in the context of metabolic networks
Network-related analysis of high-throughput data involves the mapping of experimental data onto related pathways and the investigation of this integrated data. Such functionality is provided by tools such as VANTED (26), a system for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments can be uploaded into the software and then mapped on a network that is either drawn with the tool itself or imported, for example, from an SBML file. VANTED enables users to present and analyse transcript, enzyme, proteomics and metabolite data in the context of underlying networks such as metabolic pathways from MetaCrop. Several analysis methods implemented in such software systems help in further investigation of the data.
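The core of such a mapping step is a join between measured identifiers and the node names of a pathway. The sketch below shows that idea in plain Python; it does not reflect VANTED's interface, and the pathway excerpt, the measurement values and the function name are all hypothetical.

```python
def map_measurements(pathway_metabolites, measurements):
    """Attach measurements to pathway nodes by name.

    Returns (mapped, unmapped): a dict of pathway metabolites that have
    data, and a sorted list of measured names not found in the pathway.
    """
    mapped = {m: measurements[m] for m in pathway_metabolites if m in measurements}
    unmapped = sorted(set(measurements) - set(pathway_metabolites))
    return mapped, unmapped


# Hypothetical pathway excerpt and made-up measurement values
# (two conditions per metabolite, arbitrary units).
PATHWAY = ["sucrose", "glucose", "fructose", "G6P", "F6P"]
MEASUREMENTS = {
    "sucrose": [12.1, 9.8],
    "glucose": [3.4, 4.0],
    "citrate": [1.1, 1.3],
}

if __name__ == "__main__":
    mapped, unmapped = map_measurements(PATHWAY, MEASUREMENTS)
    print(mapped)
    print(unmapped)
```

Reporting the unmapped identifiers matters in practice, because naming differences between an experiment and a database are a common reason for data silently dropping out of a network view; this is one motivation for the controlled vocabulary used in MetaCrop.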

Discussion

MetaCrop contains comprehensive, original, high-quality data about crop plant metabolism. While most of the existing metabolic pathway databases do not contain any plant-specific information, there exist a few multi-organism databases such as MetaCyc (19), BRENDA (21) and KEGG (16) comprising information about plant metabolism. The transcriptome-based database EGENES (17) is a multi-species plant database, which currently consists of 25 plant species (release 41.0, January 2007). The database integrates plant genomic information (EST contigs) and pathway information (pathway maps derived from KEGG reference pathways), thus offering an overview of fundamental biological processes in plants. In addition to these multi-species databases, there exist a few species-specific crop plant databases such as the pathway/genome databases RiceCyc (http://www.gramene.org/pathway/) and SolCyc (http://www.sgn.cornell.edu/tools/solcyc/). However, most of these single- and multi-species databases contain little or no hand-curated information because they rely on genome- or EST-based pathway predictions, or they do not support model creation and model export in SBML. Furthermore, highly specific information such as kinetic data, compartment-specific information or transport processes is often lacking, and most of the databases are limited to read-only access, not allowing for user-specific interaction, editing and extending.

Similar to MetaCrop, the pathway databases AraCyc ((18), http://www.arabidopsis.org/biocyc/index.jsp) and MetNetDB ((27), http://www.metnetdb.org) contain detailed information about plant metabolism. AraCyc is a pathway/genome database that contains enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). The Metabolic Networking Data Base (MetNetDB) contains information on metabolic and regulatory networks in Arabidopsis, which are derived from a combination of online databases and input from biologists in their area of expertise. Both databases are under continued curation and contain highly specific information such as compartment-specific information or transport processes. However, both databases currently only contain information about the model plant Arabidopsis.

Conclusion

MetaCrop is an ongoing project and currently consists largely of a collection of manually curated data about six major crop plants, methods for interactive exploration via the web interface and export functionalities. We plan to develop the database in two directions: the further curation of information and the improvement of the web interface. We plan to extend the information stored in MetaCrop to secondary pathways and to include other important crop plants such as Glycine max (soybean), Solanum lycopersicum (tomato), Helianthus annuus (sunflower) and Secale cereale (rye). For the web interface, work is underway to implement methods that take advantage of the taxonomy and localization information in MetaCrop such that, for example, if information is not available for a specific species, it can be derived from information on closely related species.
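One way to picture the planned fallback to closely related species is a lookup that ranks candidate species by how many taxonomic ranks they share with the species of interest. The sketch below is only an illustration of that idea and not the authors' implementation; the lineages are simplified to (family, subfamily, tribe), and the kinetic values and function names are made up.

```python
# Simplified lineages (family, subfamily, tribe), for illustration only.
LINEAGES = {
    "Secale cereale": ("Poaceae", "Pooideae", "Triticeae"),
    "Triticum aestivum": ("Poaceae", "Pooideae", "Triticeae"),
    "Oryza sativa": ("Poaceae", "Oryzoideae", "Oryzeae"),
    "Solanum tuberosum": ("Solanaceae", "Solanoideae", "Solaneae"),
}

# Made-up Km values (mM) for one hypothetical enzyme; rye has no entry.
KM_VALUES = {
    "Triticum aestivum": 0.5,
    "Oryza sativa": 0.8,
    "Solanum tuberosum": 1.2,
}


def shared_rank_count(lineage_a, lineage_b):
    """Number of leading taxonomic ranks two lineages have in common."""
    count = 0
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        count += 1
    return count


def closest_species_with_data(target, lineages, values):
    """Return the species with data that shares most ranks with target.

    Ties are broken alphabetically; returns None if no candidate exists.
    """
    candidates = [s for s in values if s != target and s in lineages]
    if not candidates:
        return None
    return max(
        sorted(candidates),
        key=lambda s: shared_rank_count(lineages[target], lineages[s]),
    )


if __name__ == "__main__":
    donor = closest_species_with_data("Secale cereale", LINEAGES, KM_VALUES)
    print(donor, KM_VALUES[donor])
```

Any value borrowed in this way is an estimate and would need to be flagged as inferred rather than measured, so that it is not confused with curated, species-specific data.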

 

 


Acknowledgment

This work was partly supported by the German Federal Ministry of Education and Research (grant 0312706A). Funding to pay the Open Access publication charges for this article was provided by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben.

Conflict of interest statement. None declared.

References

  1. Grusak MA, DellaPenna D. Improving the nutrient composition of plants to enhance human nutrition and health. Annu. Rev. Plant Physiol. Plant Mol. Biol. (1999) 50:133–161

  2. Metzger JO, Bornscheuer U. Lipids as renewable resources: current state of chemical and biotechnological conversion and diversification. Appl. Microbiol. Biotechnol. (2006) 71:13–22

  3. Tilman D, Hill J, Lehman C. Carbon-negative biofuels from low-input high-diversity grassland biomass. Science (2006) 314:1598–1600

  4. Jenner HL. Transgenesis and yield: what are our targets? Trends Biotechnol. (2003) 21:190–192.

  5. Carrari F, Urbanczyk-Wochniak E, Willmitzer L, Fernie AR. Engineering central metabolism in crop species: learning the system. Metab. Eng. (2003) 5:191–200

  6. Yu J, Hu S, Wang J, Wong G.K.-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science (2002) 296:79–92.

  7. Goff SA, Ricke D, Lan T.-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science (2002) 296:92–100

  8. Jaiswal P, Avraham S, Ilic K, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM, et al. Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comp. Funct. Genomics (2005) 6:388–397

  9. Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. (2006) 34(Suppl. 1):D322–D326

  10. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics (2003) 19:524–531

  11. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes PU, et al. COPASI—a complex pathway simulator. Bioinformatics (2006) 22:3067–3074

  12. Alves R, Antunes F, Salvador A. Tools for kinetic modeling of biochemical networks. Nat. Biotechnol. (2006) 24:667–672

  13. Weise S, Grosse I, Klukas C, Koschützki D, Scholz U, Schreiber F, Junker BH. Meta-All: a system for managing metabolic pathway information. BMC Bioinformatics (2006) 7:e465

  14. Buchanan BB, Gruissem W, Jones RL. Biochemistry & Molecular Biology of Plants. (2000) Rockville, MD: American Society of Plant Physiologists.

  15. Bewley JD, Black M. Seeds: Physiology of Development and Germination. (1994) 2nd edn. New York, USA: Plenum Press.

  16. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita K, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34(Suppl. 1):D354–D357

  17. Masoudi-Nejad A, Goto S, Jauregui R, Ito M, Kawashima S, Moriya Y, Endo T, Kanehisa M. EGENES: transcriptome-based plant database of genes with metabolic pathway information and expressed sequence tag indices in KEGG. Plant Physiol. (2007) 144:857–866

  18. Rhee SY, Zhang P, Foerster H, Tissier C. AraCyc: overview of an Arabidopsis metabolism database and its applications for plant research. In: Saito K, Dixon RA, Willmitzer L, eds. Plant Metabolomics. (2006) Heidelberg: Springer Berlin. 141–154.

  19. Caspi R, Foerster H, Fulcher C, Hopkinson R, Ingraham J, Kaipa P, Krummenacker M, Paley S, Pick J, et al. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. (2006) 34(Suppl. 1):D511–D516

  20. Vastrik I, D’Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, et al. Reactome: a knowledge base of biologic pathways and processes. Genome Biol. (2007) 8:R39

  21. Barthelmes J, Ebeling C, Chang A, Schomburg I, Schomburg D. BRENDA, AMENDA and FRENDA: the enzyme information system in 2007. Nucleic Acids Res. (2007) 35(Suppl. 1):D511–D514

  22. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. (2000) 28:304–305.

  23. Boeckmann B, Bairoch A, Apweiler R, Blatter M.-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, et al. The SWISS-PROT protein knowledgebase and its supplement trEMBL in 2003. Nucleic Acids Res. (2003) 31:365–370.

  24. Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flügge U.-I, Kunze R. ARAMEMNON: a novel database for Arabidopsis integral membrane proteins. Plant Physiol. (2003) 131:16–26

  25. Klamt S, Saez-Rodriguez J, Gilles ED. Structural and functional analysis of cellular networks with CellNetAnalyzer. BMC Syst. Biol. (2007) 1:e2.

  26. Junker BH, Klukas C, Schreiber F. VANTED: a system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics (2006) 7:e109.

  27. Wurtele ES, Li J, Diao L, Zhang H, Foster CM, Fatland B, Dickerson J, Brown A, Cox Z, et al. MetNet: software to build and model the biogenetic lattice of Arabidopsis. Comp. Funct. Genomics (2003) 4:239–245.

Table 1. Information contained in MetaCrop

Content type         Hordeum vulgare  Triticum aestivum  Oryza sativa  Zea mays  Solanum tuberosum  Brassica napus  Total (a)
Pathways             36               33                 34            34        34                 26              38
Enzymatic reactions  291              271                278           273       207                168             392
Transport processes  7                6                  9             27        14                 7               59
Compartments         4                4                  4             3         3                  3               5
References           382              347                340           346       252                204             734

(a) Including other plants; pathways, reactions and other information occurring in more than one plant are listed only once.


Figure

Figure 1. Screenshots of the web interface of MetaCrop. (a) A pathway (sucrose breakdown in dicotyledon species, showing compartmentalization, transporters and isoenzymes). (b) Information connected to pathways: conversion details for cytosolic phosphoglucose isomerase, including stoichiometry, catalyst, metabolites, conversion location and a subset of taxon-specific kinetic parameters (vmax, km).


 

