A new versatile database created for geneticists and breeders to link molecular and phenotypic data in perennial crops: the AppleBreed DataBase


A. Antofie 1, M. Lateur 1, R. Oger 1,*, A. Patocchi 2, C. E. Durel 3 and W. E. Van de Weg 4

1Walloon Agricultural Research Centre (CRA-W), Gembloux, Liroux 9, B-5030, Belgium, 2Agroscope Changins-Wädenswil (ACW), Phytopathologie, P.O. Box 185, Schloss CH-8820 Wädenswil, Switzerland, 3Genetics and Horticulture – GenHort, National Institute for Agricultural Research, INRA, BP 60057, F-49071 Beaucouzé Cedex, France and 4Plant Research International (PRI), P.O. Box 16, 6700 AA Wageningen, The Netherlands


Open Access article from Bioinformatics 2007 23(7):882-891.




Objective: AppleBreed DataBase (DB) aims to store genotypic and phenotypic data from multiple pedigree-verified plant populations (crosses, breeding selections and commercial cultivars) so that they are easily accessible to geneticists and breeders. It will help in elucidating the genetics of economically important traits, in identifying molecular markers associated with agronomic traits, in allele mining and in choosing the best parental cultivars for breeding. It also provides high traceability of data over generations, years and localities. AppleBreed DB could serve as a generic database design for other perennial crops with long economic lifespans, long juvenile periods and clonal propagation.

Results: AppleBreed DB is organized as a relational database. The core element is the GENOTYPE entity, which has two sub-classes at the physical level: TREE and DNA-SAMPLE. This approach facilitates all links between plant material, phenotypic and molecular data. The entities TREE, DNA-SAMPLE, PHENOTYPE and MOLECULAR DATA allow multi-annual observations to be stored as individual samples of individual trees, even if the nature of these observations differs greatly (e.g. molecular data on parts of the apple genome, physico-chemical measurements of fruit quality traits, and evaluation of disease resistance). AppleBreed DB also includes synonyms for cultivars and pedigrees. Finally, it can be loaded and explored through the web, and comes with tools to present basic statistical overviews and with validation procedures for phenotypic and marker data to certify data quality.

AppleBreed DB was developed initially as a tool for scientists involved in apple genetics within the framework of the European project ‘High-quality Disease Resistance in Apples for Sustainable Agriculture’ (HiDRAS), but it is also applicable to many other perennial crops.



Breeding cultivated plants and, in particular, apple trees (Malus × domestica Borkh.), has always been an important activity at both the amateur and professional level. Compared with annual crops, breeding perennial crops is complex, long-lasting and time-consuming due to their long juvenile phase and long economic lifespan. Cultivars of perennial crops are grown across large geographical areas and consequently new cultivars and breeding selections have to be evaluated over many successive years in various localities. The age of the trees has to be taken into account when phenotypically characterizing material because phenotypic characteristics change with age. Other characteristics give perennial crops some advantages in genetic research, such as their suitability for vegetative propagation. This means that many old cultivars and breeding selections still exist, which allows an identical genotype to be tested in various localities and in various years. In addition, the simultaneous presence of various successive generations within a single experiment is possible. All these specific characteristics affect breeding procedures and genetic experiments and put high demands on the storage of phenotypic data.

The demand for the storage of genotyping data is also increasing tremendously due to the pace at which high numbers of PCR-based molecular markers are being developed. Initially, studies on marker-trait associations were limited in size, usually involving just a single cross. The use of a single cross suffices as long as the genetic basis of a trait is extremely simple (only one locus with one + allele). In all other cases, multiple crosses are needed if sound conclusions are to be reached on the number of loci, alleles and mode of action of genes (intra- and/or inter-locus interactions). Studies on multiple crosses therefore demand high quality and good data management facilities.

In the perennial apple crop, a new concept of gene and QTL identification was initiated called Pedigree Genotyping (Van de Weg et al., 2004). This approach aims to identify marker-gene associations, functional allelic diversity and both intra- and inter-locus interactions by the integrated analysis of multiple plant populations (crosses, breeding selections and commercial cultivars) that are genetically related by their pedigree. The European project ‘High-quality Disease Resistance in Apples for Sustainable Agriculture’ (HiDRAS) (Gianfranceschi and Soglio, 2004) was initiated to test the concept. In this study, more than 2000 genotypes are being extensively phenotyped and genotyped, delivering more than 1 million data points. Each phenotypic data point is associated with its own descriptors for tree, year, sample and locality. Each genetic data point is associated with its own descriptors for DNA sample, tree, genotype, marker and map position. To meet the needs for the storage and accessibility of these data, a database was needed. There are already several databases managing both genomic and phenotypic information for the plant kingdom. The MaizeGDB database (Lawrence et al., 2004), for instance, is a repository for maize sequence, stock, phenotype, genotypic and karyotypic variation, as well as chromosomal mapping data. The GrainGenes database (Matthews et al., 2003) focuses on grasses and cereals, storing both genetic and phenotypic information. It holds, amongst others, the genealogy and allelic constitution of markers and genes from 69 632 wheat accessions. Other databases have been developed for managing genome molecular information (Rhee et al., 2003; Schoof et al., 2002) or for storing genes and protein information for Arabidopsis thaliana (ABRC, NASC, MATDB).

All these databases focus on annual plants and most of them manage genomic or phenotypic information separately. None of them allows the management of pluri-annual data on the same individual plants (Reiser et al., 2002; Sakata et al., 2000). As none of the existing public databases were able to support extensive studies on marker-trait associations in pedigreed populations of perennial crops, AppleBreed DB was developed. In the context of database construction, apples could serve as a model for perennial crops. Apples are a woody perennial and have a 3–7 year juvenile phase, which is a significant handicap in combining high fruit quality and durable disease resistance by classical breeding. Apples are self-incompatible due to a gametophytic incompatibility system, and therefore inbreeding methods are not applied (Lespinasse, 1992). Apples are vegetatively propagated, have an economic lifespan of about 15 years during which they produce 13 crops, are economically important and are highly rated among consumers, being ranked third in a fresh fruit consumption survey after banana and citrus (Pollack, 2001). Currently, there are more than 10 000 apple cultivars (Morgan and Richardson, 2002; Way et al., 1991) throughout the world. Nevertheless, world apple production is based on a handful of cultivars that are grown in commercial orchards. The most important commercial cultivars are highly susceptible to the most important apple diseases (scab, powdery mildew and European canker), and most of the resistant cultivars do not yet meet the quality demands of consumers. The most important objective of worldwide apple breeding programmes is therefore to combine high fruit quality with good disease resistance. To achieve this aim, breeders need a better understanding of the genetic basis of fruit quality traits and disease resistance, and access to molecular markers for the most important genes controlling these traits.

AppleBreed DB supports breeders and geneticists in their genetic studies and in their exploration of germplasm collections. Structured information stored in the database should help them not only to elucidate the genetics of complex traits and to assess marker-trait associations, but also to choose more easily and more quickly the most interesting genitors to cross with (e.g. with good disease resistance, a particular taste, or a skin colour preferred by consumers). In this way, it is expected that breeders will more easily be able to create new cultivars meeting consumer preferences and allowing sustainable production systems. This article describes the database model of AppleBreed DB. AppleBreed DB is sufficiently generic to allow it to be used as a model database for many other perennial crops.


AppleBreed DB is intended to be a functional tool for geneticists and breeders. Its data model combines conceptual and logical data models. The aim is to represent the implementation level of information. The conceptual data model (CDM) was the first step in processing the database design, followed by the logical data model (LDM). The CDM includes the main entities/structures and the relationships among them, without specifying attributes or primary keys. Here, the highest levels in the relationships among the different entities are identified. The CDM is divided into super-classes, classes and sub-classes. By definition, a class includes entities characterized by the same profile (e.g. the class marker includes the entities SSR, RFLP, SCAR and AFLP). A super-class is a generalization of the classes. For example, the molecular markers, alleles and map classes give different information on the genome and can be grouped at a superior level into a super-class called ‘Molecular data’. The same principle is used to define the sub-classes. The LDM specifies all the entities and links defined in the model, as well as the attributes and the primary and foreign keys (keys identifying the link established between different entities). In line with examples of data models described in the literature and with our objectives, the relational data model was chosen to ensure traceability of the collected data. To facilitate access to the database by users, a web interface had to be developed. Consequently, database management applications were chosen that would be accessible through the web (e.g. MySQL) and preferably based on open source programmes and operating systems (e.g. Linux).


Conceptual data model (CDM)

The CDM includes six main super-classes: GENOTYPE, PHENOTYPE DATA, MOLECULAR DATA, GROWTHSITE, ORGANIZATION and REFERENCE. As shown in Figure 1, all entities are structured around the super-class GENOTYPE, which is the core element of the model. It covers all plant material by individual trees and DNA samples, which can come from any kind of material (cultivars, breeding selections, segregating populations and gene bank accessions). GENOTYPE is subdivided into three classes: PLANT MATERIAL, PASSPORT and SYNONYMS. PLANT MATERIAL includes the two main sub-classes TREE and DNA-SAMPLE, PASSPORT includes the PEDIGREE and ACCESSION main sub-classes, and SYNONYMS includes the SYNONYM and PATRONYM sub-classes.

TREE and DNA-SAMPLE hold the identity descriptors for each individual tree, DNA sample and genotype name. TREE also includes descriptors for the precise location where trees were grown (institute, plot, row and position in row) and their origin (origin of bud wood, year of sowing, planting and grafting, and rootstock). DNA-SAMPLE also includes the origin of each sample (tree from which the sample was derived, date of isolation and position on micro-titre plates of the original sample as well as of its sub-samples, etc.).

ACCESSION is used to identify and characterize the plant material (cultivars, breeding selection, segregating population and gene bank accession information). PEDIGREE describes the parentage of each accession up to the founder level and therefore facilitates ‘Pedigree Genotyping’, a new pedigree-based approach of QTL identification and allele mining in pedigreed populations (Van de Weg et al., 2004). The class SYNONYMS holds the known synonyms and patronyms of each genotype, and accounts for the most frequently occurring typing errors.

Figure 1 shows the relationships between GENOTYPE and other elements of the database. Each genotype is localized in one or more specific trial plots (GROWTHSITE) and each institution (ORGANIZATION) supervises its trial plots. Genotypes are evaluated for their fruit quality and disease resistance (PHENOTYPE DATA). The procedures and results of the genotype DNA analyses are stored in MOLECULAR DATA. Each genotype listed in the database is referenced according to the literature references (Silbereisen et al., 1996; Smith, 1971) in the REFERENCE super-class. Table 1 summarizes the information included in each super-class and the corresponding main classes.

Most classes are further divided into one or more generations of sub-classes, until the desired level of detail is reached. All these entities have been converted into tables at the LDM level. A class or sub-class may include one or several tables. The most important tables of the database are listed in Table 2.

As stated earlier, GENOTYPE holds information that identifies genotypes (names of cultivars, breeding selections, crosses and gene bank accession characteristics) and the tangible part of the plant material (trees and DNA samples). Phenotypic information concerns fruit quality and disease resistance. Finally, molecular information relates to the molecular markers used to construct genetic linkage maps and to information on allele mining, loci and the pedigree of alleles. Each genotype listed in the database is considered to be a central key for the traceability of the information stored in AppleBreed DB.


Logical data model (LDM)

The LDM describes entities defined within each super-class and their relationships with other entities defined above. The database diagrams (see Figs 2–4) give an external view of the AppleBreed DB data content. The consistency of data is automatically checked by the database management system itself, at a superior level, according to the rules and the relationships defined when the schema is implemented. The LDM is presented in more detail for the super-classes (i) GENOTYPE, (ii) PHENOTYPE and (iii) MOLECULAR DATA, specifying their primary and secondary keys.


GENOTYPE super-class

In the GENOTYPE super-class (Fig. 2) the GT_TREE table and GT_DNA_SAMPLE table are the most important because they allow the individual genotype for the phenotype assessment and the molecular data analysis to be set up. Because of the high importance of plant material identification and certification for genetic studies, the emphasis was put on tracking and tracing aspects for the definition of the structure of these tables. Their detailed content is presented in Tables 3 and 4. The link between them is made through the TREE_LABEL primary key.

The GT_ACCESSION table is used to store information that is assigned to an accession when it is entered into a collection. The key element of this table is the accession number, which is unique in the collection. Once assigned, this number can never be reassigned to another accession. The GT_PEDIGREE table allows a user to determine whether a relationship exists between phenotypic characteristics and genomic results from genitors and their progenies.
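As a rough illustration of the link through TREE_LABEL, the following Python sketch builds two tables named after GT_TREE and GT_DNA_SAMPLE and joins them on that key. It uses an in-memory SQLite database in place of the MySQL system on which AppleBreed DB runs. Only the table names, TREE_LABEL and DNA_SAMPLE_ID are taken from the article; every other column and all stored values are hypothetical, and the real tables (Tables 3 and 4) hold many more descriptors.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server used by AppleBreed DB.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request

conn.executescript("""
CREATE TABLE GT_TREE (
    TREE_LABEL      TEXT PRIMARY KEY,  -- key named in the article
    GENOTYPE_NAME   TEXT NOT NULL,     -- hypothetical column
    PLOT            TEXT,              -- hypothetical column
    ORCHARD_ROW     INTEGER,           -- hypothetical column
    POSITION_IN_ROW INTEGER            -- hypothetical column
);
CREATE TABLE GT_DNA_SAMPLE (
    DNA_SAMPLE_ID  TEXT PRIMARY KEY,   -- identifier named in the article
    TREE_LABEL     TEXT NOT NULL REFERENCES GT_TREE (TREE_LABEL),
    ISOLATION_DATE TEXT                -- hypothetical column
);
""")

# Hypothetical records: one tree and two DNA samples taken from it.
conn.execute("INSERT INTO GT_TREE VALUES ('T-0001', 'Genotype_A', 'Plot_1', 3, 12)")
conn.executemany(
    "INSERT INTO GT_DNA_SAMPLE VALUES (?, ?, ?)",
    [("D-0001", "T-0001", "2005-06-01"), ("D-0002", "T-0001", "2005-06-15")],
)

# Trace every DNA sample back to its tree and genotype through TREE_LABEL.
rows = conn.execute("""
    SELECT d.DNA_SAMPLE_ID, t.TREE_LABEL, t.GENOTYPE_NAME
    FROM GT_DNA_SAMPLE AS d
    JOIN GT_TREE AS t ON t.TREE_LABEL = d.TREE_LABEL
    ORDER BY d.DNA_SAMPLE_ID
""").fetchall()
print(rows)
```

Keeping the tree label as the shared key means that any molecular result recorded for a DNA sample can be traced to one physical tree and, through it, to a genotype.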
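The article does not give the column layout of GT_PEDIGREE, so the sketch below is only one plausible way to follow parentage up to the founder level. It assumes a hypothetical two-column layout (GENOTYPE, PARENT) with one row per parent, uses invented genotype names, and runs on in-memory SQLite rather than MySQL. A recursive query collects all ancestors; founders are taken to be ancestors with no recorded parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (genotype, parent) pair.
conn.execute("CREATE TABLE GT_PEDIGREE (GENOTYPE TEXT NOT NULL, PARENT TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO GT_PEDIGREE VALUES (?, ?)",
    [
        ("Selection_X", "Cultivar_A"),
        ("Selection_X", "Cultivar_B"),
        ("Cultivar_A", "Founder_1"),
        ("Cultivar_A", "Founder_2"),
    ],
)


def ancestors(conn, genotype):
    """Return every recorded ancestor of a genotype, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE anc(name) AS (
            SELECT PARENT FROM GT_PEDIGREE WHERE GENOTYPE = ?
            UNION
            SELECT p.PARENT FROM GT_PEDIGREE AS p JOIN anc ON p.GENOTYPE = anc.name
        )
        SELECT name FROM anc ORDER BY name
        """,
        (genotype,),
    )
    return [row[0] for row in rows]


def founders(conn, genotype):
    """Return the ancestors that have no recorded parent of their own."""
    result = []
    for name in ancestors(conn, genotype):
        has_parent = conn.execute(
            "SELECT 1 FROM GT_PEDIGREE WHERE GENOTYPE = ? LIMIT 1", (name,)
        ).fetchone()
        if has_parent is None:
            result.append(name)
    return result


print(ancestors(conn, "Selection_X"))
print(founders(conn, "Selection_X"))
```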

The high number of synonyms for cultivars is a recurrent problem for breeders, geneticists and managers of gene banks; they impair the efficient management and exploitation of the collections.

This is especially true for old genotypes received or collected in different places and times.

For example, Cox's Orange Pippin has more than 40 synonyms. In addition, very modern cultivars often have both a cultivar and a trademark name. Finally, for widely grown cultivars there are often many mutants, each with its own name. For the old cultivars, there are many sources of synonyms. One is translation or transliteration of original names into local languages. There are also spelling errors due to the ‘appropriation’, over time, of introduced foreign genotypes in local traditions, resulting in new local names adapted to the language or dialect (Oger and Lateur, 2004). This problem can lead to major disappointments. For example, geneticists might believe they are working on different genotypes, but after obtaining their results they realize they are working on the same genotype with different names. The database model addresses this problem by using the SYNONYMS main class. The first appellation found in the literature has, in most cases, to be considered as the patronymic name. This name is filled out in the identifier field PATRONYM_NAME in the GT_PATRONYM table, as displayed in Figure 2, and a link between the patronymic name of a genotype and its synonyms is assured through the PATRONYM_ID.
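The patronym link can be pictured with a small lookup, sketched here in Python on in-memory SQLite rather than MySQL. GT_PATRONYM, PATRONYM_NAME and PATRONYM_ID are named in the article; the GT_SYNONYM table, its SYNONYM_NAME column, the case-insensitive matching and the example spellings are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GT_PATRONYM (
    PATRONYM_ID   INTEGER PRIMARY KEY,
    PATRONYM_NAME TEXT NOT NULL
);
CREATE TABLE GT_SYNONYM (              -- hypothetical table and column names
    SYNONYM_NAME TEXT NOT NULL,
    PATRONYM_ID  INTEGER NOT NULL REFERENCES GT_PATRONYM (PATRONYM_ID)
);
""")
conn.execute("INSERT INTO GT_PATRONYM VALUES (1, 'Cox''s Orange Pippin')")
conn.executemany(
    "INSERT INTO GT_SYNONYM VALUES (?, ?)",
    [
        ("Cox Orange", 1),          # illustrative synonym
        ("Coxs Orange Pipin", 1),   # illustrative typing error
    ],
)


def resolve(conn, name):
    """Map a patronym or any of its synonyms to the patronymic name."""
    row = conn.execute(
        """
        SELECT p.PATRONYM_NAME
        FROM GT_PATRONYM AS p
        LEFT JOIN GT_SYNONYM AS s ON s.PATRONYM_ID = p.PATRONYM_ID
        WHERE lower(p.PATRONYM_NAME) = lower(?) OR lower(s.SYNONYM_NAME) = lower(?)
        LIMIT 1
        """,
        (name, name),
    ).fetchone()
    return row[0] if row else None


print(resolve(conn, "cox orange"))
print(resolve(conn, "Coxs Orange Pipin"))
print(resolve(conn, "Unlisted name"))
```

Resolving every incoming name to a single PATRONYM_ID before analysis is what prevents the same genotype from being treated as two different ones.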

PHENOTYPE DATA super-class

The PHENOTYPE DATA super-class (Fig. 3) includes two main classes: FRUIT QUALITY and DISEASE RESISTANCE. Each genotype is studied for several traits (Gianfranceschi and Soglio, 2004), such as: fruit external characteristics (shape, ground colour, overall colour, fruit size, etc.), fruit internal quality (sugar content, starch index, acidity, etc.), the sensorial evaluations of expert panels to determine the quality of the fruits (sourness, juiciness, firmness, etc.) and the disease levels under natural conditions in the orchard as well as in specially designed greenhouse tests. Data are encoded for each individual assessment, which can be made for a series of individual apples (e.g. firmness data) for different dates (e.g. 0, 2 and 4 months after harvest), localities and years.

Figure 3 also displays the relationships among the main tables of this super-class as well as the relationships with other tables included in other entities, such as GENOTYPE and GROWTHSITE. With regard to the sensorial, instrumental, external and panel expert observations, a composite primary key identifies each observation. This key includes an identifier for the sample, an identifier for harvest times (a date), an identifier for the applied method of assessment and an identifier for the institution making the observations.

This kind of primary key structure gives each institution the possibility of marking its own samples (there is a unique sample code number for each institution).

Each genotype is linked to fruit quality assessment tables (PH_INSTRUMENTAL_ANALYSIS, PH_SENSORIAL_ANALYSIS, PH_EXPERT_PANEL, PH_SENSORIAL ANALYSIS) by the successive tables GT_TREE PK_TREE_LABEL and the table PH_SAMPLE. The PH_SAMPLE_ID field links all the information from instrumental, sensorial and disease observations to trees, and thereby to genotypes.

Each tree is assessed individually, making it possible to connect phenotypic observations with molecular marker data by means of the genotype. This structure allows users to select, for example, a genotype with fruits that have the same level of sugar content and the same starch index, or are similar or dissimilar for other important characteristics. The primary key for PH_DISEASE_ASSESSMENT (the table is included in the DISEASE RESISTANCE class) is also a composite key. This key includes the identifiers of each individual tree, observation date, disease identifier and observed plant organ, as well as an identifier for the applied method of assessment.
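A composite key of this kind can be sketched as follows in Python, using in-memory SQLite instead of MySQL. The table name PH_INSTRUMENTAL_ANALYSIS and the field PH_SAMPLE_ID appear in the article; the remaining column names and all values are hypothetical. The example shows that two institutions may reuse the same sample code, while an exact repeat of the four key parts is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PH_INSTRUMENTAL_ANALYSIS (
        PH_SAMPLE_ID   TEXT NOT NULL,  -- sample identifier (named in the article)
        HARVEST_DATE   TEXT NOT NULL,  -- hypothetical column name
        METHOD_ID      TEXT NOT NULL,  -- hypothetical column name
        INSTITUTION_ID TEXT NOT NULL,  -- hypothetical column name
        FIRMNESS       REAL,           -- hypothetical measured value
        PRIMARY KEY (PH_SAMPLE_ID, HARVEST_DATE, METHOD_ID, INSTITUTION_ID)
    )
""")

INSERT_SQL = "INSERT INTO PH_INSTRUMENTAL_ANALYSIS VALUES (?, ?, ?, ?, ?)"

# Two institutions use the same sample code; the institution identifier
# keeps the two observations apart.
conn.execute(INSERT_SQL, ("S001", "2005-09-20", "M1", "INST_A", 7.2))
conn.execute(INSERT_SQL, ("S001", "2005-09-20", "M1", "INST_B", 6.8))

# Repeating all four key parts violates the composite primary key.
try:
    conn.execute(INSERT_SQL, ("S001", "2005-09-20", "M1", "INST_A", 7.5))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM PH_INSTRUMENTAL_ANALYSIS").fetchone()[0]
print(duplicate_rejected, row_count)
```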

MOLECULAR DATA super-class

The objective of the HiDRAS project is to molecularly characterize all the individuals belonging to a selected pedigree using highly informative markers. Families and their connected progenies are chosen for being representative of apple breeding material and differentiated for fruit quality and disease resistance.

One aspect of the project concerns the development of new highly informative molecular markers to fill the gaps in the available apple linkage maps. The origin of all alleles of each marker/genotype combination is assessed in terms of the alleles of the founding cultivars (identity by descent) by analysing marker data.

The MOLECULAR DATA super-class (Fig. 4) is one of the most important components of the model. Its data describe the genetic constitution of each genotype (allelic composition of molecular markers and major genes) and must allow alleles to be traced over generations. Starting from the genotype, all information is linked in the database as a chain. The molecular information is linked to the genotype by the GT_DNA_SAMPLE table and the DNA_SAMPLE_ID. In the MOLECULAR DATA super-class, the MOL_DNA_LINK_LOCUS and the LOCUS_ID make the link with MOL_LOCUS, MOL_ALLELE, MOL_MAPS and MOL_MARKER tables.

The content of the MOL_LOCUS and MOL_MARKER tables is described in Table 5.

Due to the links between all the tables, AppleBreed DB can easily provide input data for QTL software to search for combinations of certain molecular markers and fruit quality traits (e.g. skin colour, shape or global taste).
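To make the chain concrete, the Python sketch below joins a tree to its DNA samples, marker alleles and phenotypic samples in one query. It runs on in-memory SQLite rather than MySQL. The table names and the keys TREE_LABEL, DNA_SAMPLE_ID, LOCUS_ID and PH_SAMPLE_ID come from the article; all other columns, the locus name and every stored value are invented for illustration, and the real tables are considerably richer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GT_TREE (TREE_LABEL TEXT PRIMARY KEY, GENOTYPE_NAME TEXT);
CREATE TABLE GT_DNA_SAMPLE (DNA_SAMPLE_ID TEXT PRIMARY KEY, TREE_LABEL TEXT);
CREATE TABLE MOL_LOCUS (LOCUS_ID INTEGER PRIMARY KEY, LOCUS_NAME TEXT);
CREATE TABLE MOL_DNA_LINK_LOCUS (DNA_SAMPLE_ID TEXT, LOCUS_ID INTEGER, ALLELE_SIZE INTEGER);
CREATE TABLE PH_SAMPLE (PH_SAMPLE_ID TEXT PRIMARY KEY, TREE_LABEL TEXT);
CREATE TABLE PH_INSTRUMENTAL_ANALYSIS (PH_SAMPLE_ID TEXT, TRAIT TEXT, TRAIT_VALUE REAL);

-- Invented example data: two trees of two genotypes.
INSERT INTO GT_TREE VALUES ('T1', 'G1'), ('T2', 'G2');
INSERT INTO GT_DNA_SAMPLE VALUES ('D1', 'T1'), ('D2', 'T2');
INSERT INTO MOL_LOCUS VALUES (1, 'SSR_demo');
INSERT INTO MOL_DNA_LINK_LOCUS VALUES
    ('D1', 1, 150), ('D1', 1, 154), ('D2', 1, 154), ('D2', 1, 160);
INSERT INTO PH_SAMPLE VALUES ('S1', 'T1'), ('S2', 'T2');
INSERT INTO PH_INSTRUMENTAL_ANALYSIS VALUES
    ('S1', 'firmness', 7.0), ('S1', 'firmness', 9.0), ('S2', 'firmness', 6.0);
""")

# One row per genotype and allele, with the mean firmness of that genotype:
# the kind of flat table a marker-trait analysis could start from.
rows = conn.execute("""
    SELECT t.GENOTYPE_NAME, l.LOCUS_NAME, m.ALLELE_SIZE, AVG(a.TRAIT_VALUE)
    FROM GT_TREE AS t
    JOIN GT_DNA_SAMPLE AS d ON d.TREE_LABEL = t.TREE_LABEL
    JOIN MOL_DNA_LINK_LOCUS AS m ON m.DNA_SAMPLE_ID = d.DNA_SAMPLE_ID
    JOIN MOL_LOCUS AS l ON l.LOCUS_ID = m.LOCUS_ID
    JOIN PH_SAMPLE AS s ON s.TREE_LABEL = t.TREE_LABEL
    JOIN PH_INSTRUMENTAL_ANALYSIS AS a ON a.PH_SAMPLE_ID = s.PH_SAMPLE_ID
    WHERE a.TRAIT = 'firmness'
    GROUP BY t.GENOTYPE_NAME, l.LOCUS_NAME, m.ALLELE_SIZE
    ORDER BY t.GENOTYPE_NAME, m.ALLELE_SIZE
""").fetchall()
for row in rows:
    print(row)
```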


Database implementation

AppleBreed DB was implemented within a MySQL database system and a Linux environment. A web interface was developed in the PHP language. Figure 5 illustrates the data management system adopted for the submission and validation of data. Users send their data to the database administrator via specific, standardized templates (Excel files). Data quality control involves three steps: (1) the structure of the encoding templates (templates created to collect data include controls on the allowed numeric values or class evaluations), (2) the quality check by the database manager and (3) the constraints existing in the database structure itself (a journal with the error values is generated). Once these checks are completed, the results regarding suspicious data are sent back to users for validation. After re-submission, the administrator carries out the transfer and integration of data into the database structure. Finally, users can visualize and upload both the raw and interpreted results by accessing specific web pages. Simple SQL queries allow on-line access to the database through the Internet. Various real-time query tools have been developed, including specific multiple-choice questionnaires for different views of the requested information. Data output formats can be generated ‘à la carte’, making output directly compatible with a wide range of software packages, including packages for QTL mapping.
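The first quality-control step, checking submitted values against allowed numeric ranges or classes, might look roughly like the Python sketch below. The trait names, bounds, allowed classes and record layout are all hypothetical, and the actual templates and checks used for AppleBreed DB are not described in enough detail here to reproduce them. The function returns the accepted records together with a journal of rejected lines that could be sent back to the submitting user.

```python
# Hypothetical rules: numeric traits carry an allowed range,
# class-scored traits carry a set of allowed classes.
RULES = {
    "sugar_content": {"kind": "numeric", "min": 0.0, "max": 30.0},
    "ground_colour": {"kind": "class", "allowed": {1, 2, 3, 4, 5}},
}


def validate(records, rules=RULES):
    """Split records into accepted ones and a journal of (line, reason) errors."""
    accepted, journal = [], []
    for line_no, record in enumerate(records, start=1):
        rule = rules.get(record["trait"])
        value = record["value"]
        if rule is None:
            journal.append((line_no, "unknown trait"))
        elif rule["kind"] == "numeric" and not (rule["min"] <= value <= rule["max"]):
            journal.append((line_no, "value out of range"))
        elif rule["kind"] == "class" and value not in rule["allowed"]:
            journal.append((line_no, "class not allowed"))
        else:
            accepted.append(record)
    return accepted, journal


# Invented submission: lines 2 and 3 should end up in the journal.
submitted = [
    {"tree": "T-0001", "trait": "sugar_content", "value": 12.5},
    {"tree": "T-0001", "trait": "sugar_content", "value": 125.0},
    {"tree": "T-0002", "trait": "ground_colour", "value": 7},
    {"tree": "T-0002", "trait": "ground_colour", "value": 3},
]
accepted, journal = validate(submitted)
print(len(accepted), journal)
```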




AppleBreed DB is, as far as the authors know, the first database to store both genetic and phenotypic data up to the level of individual observations. This combination of data makes AppleBreed DB a powerful tool for extensive genetic studies directed at the assessment of marker-trait associations, for candidate gene validation and for allele mining. AppleBreed DB takes into account the particularities of perennials such as: (1) vegetative propagation, allowing the same genotype to be present at various localities, (2) long juvenile phase, (3) multi-annual crop, (4) long economic lifespan and (5) simultaneous availability of successive generations in the same plot of breeding programs, experimental stations and gene banks. These aims and particularities determined the general structure of the database, and have resulted in a framework quite distinct from models in use for annual crops, such as the ZmDB database (Dong et al., 2002; Du et al., 2003; Gai et al., 2000) or the MaizeGDB database (Lawrence et al., 2004).

AppleBreed DB is built on a relational model. The structure of its conceptual model allows for the flexible addition of new entities. In other words, the AppleBreed DB structure allows data with new characteristics to be easily and quickly integrated into the database, at least as long as the database integrity rules are respected. The ability to encode new data into the database is checked by the database structure itself.

Due to the relational structure of the database, users’ queries are easily handled through SQL requests. Other potential real-time query tools can be easily added, such as specific multiple-choice questionnaires for different views of the requested information. Modules to export data in ‘à la carte’ output formats are also under development, making data directly compatible with a wide range of software packages, including packages for QTL mapping. An interesting point for geneticists and breeders is that it is possible to manage traceability of plant material, a genotype or a family and to follow the parents and their descendants. In addition, the flexibility of the data model makes it possible to adapt this system for other multi-annual botanical species. Unfortunately, one characteristic of relational databases might represent an inconvenience. Direct encoding of results is not allowed, for example, for new genotypes or markers. It is always necessary to insert new data in a particular and logical order and according to a specific and defined format.

AppleBreed DB can store phenotypic data at the level on which they were originally assessed, including at the level of individual samples. In addition, the position of trees in the orchard and the genetic relationships among genotypes are documented. Together, this allows in-depth analysis of the data because experimental design, position effects, genetic relationships and experimental variation can be taken into account.

This not only allows in-depth classical analysis of the phenotypic data itself, such as heritability estimates and the effect of different cultivation practices and environments, but also ensures a high-power detection of marker-trait associations. As it stands, AppleBreed DB will be a powerful tool for resolving the genetic base of horticulturally important traits. In addition, it has the potential to support valorization of EST and genome sequencing projects, since its phenotypic and genetic data can be helpful in the identification of the candidate genes validated by geneticists.

Currently, there are various public databases for perennial crops that are related to different aspects of genetics and breeding. The USDA-ARS Germplasm Resources Information Network (GRIN, http://www.ars-grin.gov/npgs/) is a database which stores information about clonal germplasm in the USDA system, including various tree species such as apples, pears, stone fruits, grapes, etc. The Genome Database for Rosaceae (GDR, http://www.mainlab.clemson.edu/gdr/) is a curated and integrated web-based relational database. GDR contains data on physical and linkage maps, annotated EST sequences and all publicly available Rosaceae sequences. Although this database started as a database for Prunus, it is now extending to other families of the Rosaceae. Various databases for the management of genetic resources were created by the European Cooperative Programme for Plant Genetic Resources Networks (ECP/GR). These databases are crop specific and include Apple (http://www.ecpgr.cgiar.org/databases/Crops/Malus.htm; Maggioni et al., 2002), Pear (http://pyrus.cra.wallonie.be/) and various stone fruits (http://www.bordeaux.inra.fr/urefv/base/). The HiDRAS SSRdb (http://www.hidras.unimi.it/) contains detailed information on more than 300 SSR markers that have been mapped in apple. AppleBreed DB is currently uploading the HiDRAS data, most of which are likely to become public. All these databases are relational, curated and web based. They are continuously extending in content and functionality. Much synergism could be obtained by tuning into their policies, content and formats, and much added value could be obtained if private databases such as the HortResearch Apple EST Database (Crowhurst et al., 2005) became part of the network.
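The constraint on insertion order follows from the foreign keys of a relational schema, as the minimal Python sketch below illustrates on in-memory SQLite (not the MySQL system of AppleBreed DB). The table names and keys are those named in the article; the reduced column set and the labels are hypothetical. A DNA sample cannot be stored before the tree it refers to exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.execute("CREATE TABLE GT_TREE (TREE_LABEL TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE GT_DNA_SAMPLE (
        DNA_SAMPLE_ID TEXT PRIMARY KEY,
        TREE_LABEL    TEXT NOT NULL REFERENCES GT_TREE (TREE_LABEL)
    )
""")


def try_insert_sample(sample_id, tree_label):
    """Return True if the sample was stored, False if a constraint refused it."""
    try:
        conn.execute("INSERT INTO GT_DNA_SAMPLE VALUES (?, ?)", (sample_id, tree_label))
        return True
    except sqlite3.IntegrityError:
        return False


# Wrong order: the tree is not yet known, so the sample is refused.
wrong_order = try_insert_sample("D-0001", "T-0001")

# Right order: register the tree first, then the sample derived from it.
conn.execute("INSERT INTO GT_TREE VALUES ('T-0001')")
right_order = try_insert_sample("D-0001", "T-0001")

print(wrong_order, right_order)
```

The same reasoning would apply to new genotypes or markers: parent records have to be loaded before the results that refer to them.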

Concluding remarks

The AppleBreed DB model provides a unique tool specifically adapted for geneticists and breeders working on perennial crops with a long economic lifespan, especially when the aim is to combine phenotypic and molecular marker data. It supports pedigree-based analysis of the data, including ‘Pedigree Genotyping’ (Van de Weg et al., 2004). This database could be useful in intercontinental collaboration on marker-trait associations, validation of candidate genes and functional allelic diversity. It can be directly applied to apple, and its structure forms a firm foundation on which other users can build their own applications. It can be easily extended to include various crops, thus forming a base for a RosaceaeBreed DB. Links to other databases such as GRIN, NCBI (National Center for Biotechnology Information), SINGER (System-wide Information Network for Genetic Resources) and EMBL (Nucleotide Sequence Database) can also be investigated.



This research was carried out with financial support from the Commission of the European Communities (Contract No. QLK5-CT-2002-01492), Directorate-General Research, Quality of Life and Management of Living Resources Program. This manuscript does not necessarily reflect the Commission's views and in no way anticipates its future policy in this area. Its content is the sole responsibility of the authors. The authors are deeply indebted to all participants of the HiDRAS project for their involvement, collaboration and support in the development of the conceptual data model of AppleBreed DB. Funding to pay the Open Access publication charges was provided by Agricultural Walloon Research Centre of Gembloux (Belgium).

Conflict of Interest: none declared.

Associate Editor: Chris Stoeckert

Received on November 20, 2006; revised on January 11, 2007; accepted on January 12, 2007.


Crowhurst RN, et al. The HortResearch apple EST database – a resource for apple genetics and functional genomics. ( (2005) ) Proceedings of Plant & Animal Genomes XIII Conference. http://www.intl-pag.org/13/abstracts/PAG13_P499.html..

Dong Q, et al. ZmDB, an integrated database for maize genome research. Nucleic Acids Res, ( (2002) ) 31, : 244–247.

Du C, et al. Development of a maize molecular evolutionary genomic database. Comp. Funct. Genomics, ( (2003) ) 4, : 246–249.

Gai X, et al. Gene discovery using the maize genome database ZmDB. Nucleic Acids Res, ( (2000) ) 28, : 94–96.

Gianfranceschi L, Soglio V. The European 589 project HiDRAS: innovative multidisciplinary approaches to breeding high quality disease resistant apples. Acta Hortic, ( (2004) ) 663, : 327–330..

Lawrence CJ, et al. MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res, ( (2004) ) 32, : D393–D397.

Lespinasse Y. Le pommier. In: Amélioration des espèces végétales cultivées., —Gallais A, Bannerot H, eds. ( (1992) ) Paris: INRA Editions. 579–594..

Maggioni L, et al. Report of a working group on Malus/Pyrus compilers. ( (2002) ) Second Meeting, 2–4 May 2002: Dresden-Pillnitz, Germany. Rome: IPGRI – ECP/GR..

Matthews DE, et al. GrainGenes, the genome database for small-grain crops. Nucleic Acids Res, ( (2003) ) 31, : 183–186

Morgan J, Richardson A. The New Book of Apples., ( (2002) ) London: Ebury Press..

Oger R, Lateur M. Development of a specific software for the management of the recurrent synonymous problem of cultivars inside plant genetic resources databases: the case of the European EUROPEAN ECP/GR Pyrus database. Acta Hortic, ( (2004) ) 663, : 593–596..

Pollack S. Consumer demand for fruit and vegetables: the U.S. example. In: Regmi A, ed. USDA Economic Research Service Agriculture and Trade Report WRS-01-1. (2001) Washington, DC, USA: USDA.

Reiser L, et al. Surviving in a sea of data: a survey of plant genome data resources and issues in building data management systems. Plant Mol. Biol (2002) 48: 59–74.

Rhee SY, et al. The Arabidopsis information resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res (2003) 31: 224–228.

Sakata K, et al. INE: a rice genome database with an integrated map view. Nucleic Acids Res (2000) 28: 97–101.

Schoof H, et al. MIPS Arabidopsis thaliana database (MatDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res (2002) 30: 91–93.

Silbereisen R, et al. Obstsorten-Atlas. (1996) Stuttgart: Eugen Ulmer GmbH & Co.

Smith MWG. National Apple Register of the United Kingdom. (1971) London: Ministry of Agriculture, Fisheries and Food.

Van de Weg WE, et al. Pedigree genotyping: a new pedigree-based approach of QTL identification and allele mining. Acta Hortic (2004) 663: 45–50.

Way RD, et al. Apple (Malus). Acta Hortic (1991) 290: 3–46.

Table 1. Content of the super-classes in AppleBreed DB

| Super-class | Acronym | Content | Main classes |
|---|---|---|---|
| GENOTYPE | GT | Information on the material that represents the genotype (tree, DNA sample), passport data of the genotype (accession, pedigree) and synonyms | Plant material, passport, synonyms |
| PHENOTYPE DATA | PH | Fruit quality results (external, sensorial and instrumental evaluations, and expert panel results) and disease resistance evaluations | Fruit quality, disease resistance |
| MOLECULAR DATA | MOL | Information on all results related to markers, linkage groups and allelic forms of the markers, and all information needed to build maps with markers of a specific genotype. Marker information includes sizes of observed bands, PCR protocols, the date and laboratory at which the data were generated, primer sequences, and the gDNA or EST sequences from which the markers were derived | Allele, markers, locus, linkage group, maps |
| GROWTHSITE | GRO | Information on location of trees and orchards | Site, trial plot |
| ORGANIZATION | ORG | Information about institutions supervising the site and the trial plot | Institution |
| REFERENCES | REF | Information on literature references used to describe the genotypes and the evaluation procedure | Reference |

Table 2. Content and definition of the main tables corresponding to each super-class in AppleBreed DB

| Super-class | Main class | Main table | Content |
|---|---|---|---|
| GENOTYPE | PLANT MATERIAL | GT_TREE | Trees traced in the model |
| | | GT_DNA_SAMPLE | Information on DNA samples used in the model |
| | PASSPORT | GT_ACCESSION | Type of material (cultivar, breeding selection, segregating population, gene bank accession) and their names, including synonyms |
| | | GT_PEDIGREE | Parents of each accession, if known |
| | SYNONYMS | GT_SYNONYM | List of synonyms for each patronym |
| | | GT_PATRONYM | Patronym names with their literature references |
| PHENOTYPE DATA | FRUIT QUALITY | PH_INSTRUMENTAL_ANALYSIS | Instrumental analysis made during the observation period |
| | | PH_EXTERNAL_ANALYSIS | External analysis made during the observation period |
| | | PH_SENSORIAL_ANALYSIS | Sensorial analysis made during the observation period |
| | | PH_EXPERT_PANEL | Expert panel evaluation |
| | | PH_SAMPLE | Sampling of the fruit to facilitate the traceability of the information |
| | DISEASE RESISTANCE | PH_DISEASE | Information on diseases |
| | | PH_DISEASE_ASSESSEMENT | Observations made over several years |
| | | PH_DISEASE_ORGAN | Information on the plant organ evaluated |
| MOLECULAR DATA | MARKERS | MOL_MARKERS | Markers used in the molecular analyses |
| | LOCUS | MOL_LOCUS | Locus names and other information about each locus |
| | LINKAGE GROUP | MOL_LG | Linkage group information: link between genotype, loci and allele |
| | MAPS | MOL_MAPS | Description of the maps |
| GROWTHSITE | SITE | GRO_SITE | Sites used to locate the genotype |
| | TRIAL PLOT | GRO_TRIAL_PLOT | Trial plots used to locate the genotype |
| | | GRO_TP_FERTILITY | Soil fertility classes |
| | | GRO_TP_DRAINAGE | Soil drainage classes |
| | | GRO_TP_ORGANIC_MATTER | Soil organic matter classes |
| | | GRO_TP_TEXTURE | Soil texture classes |
| ORGANIZATION | INSTITUTION | ORG_INSTITUTIONS | Institutions that supervised a growth site |
| REFERENCES | REFERENCES | REF_REFERENCES | Information on references used to describe the genotypes or the analysis methods |
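
The table names in Table 2 carry the super-class acronym of Table 1 as a prefix (GT_, PH_, MOL_, GRO_, ORG_, REF_). As a minimal sketch of how that convention could be used, the Python snippet below groups table names by super-class. The function names `super_class_of` and `group_by_super_class` are hypothetical helpers for illustration and are not part of AppleBreed DB.

```python
# Super-class acronyms as listed in Table 1.
SUPER_CLASS_BY_PREFIX = {
    "GT": "GENOTYPE",
    "PH": "PHENOTYPE DATA",
    "MOL": "MOLECULAR DATA",
    "GRO": "GROWTHSITE",
    "ORG": "ORGANIZATION",
    "REF": "REFERENCES",
}


def super_class_of(table_name: str) -> str:
    """Return the super-class of a table name such as 'GT_TREE'.

    Hypothetical helper: it relies only on the prefix before the
    first underscore and raises KeyError for an unknown prefix.
    """
    prefix = table_name.split("_", 1)[0]
    return SUPER_CLASS_BY_PREFIX[prefix]


def group_by_super_class(table_names):
    """Group table names by super-class, keeping the input order."""
    groups = {}
    for name in table_names:
        groups.setdefault(super_class_of(name), []).append(name)
    return groups
```

For instance, `group_by_super_class(["GT_TREE", "MOL_LOCUS", "GT_PEDIGREE"])` places the two GT_ tables under GENOTYPE and MOL_LOCUS under MOLECULAR DATA.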

Table 3. Content and definition of the GT_TREE table

| Field | Definition | Remarks |
|---|---|---|
| TREE_LABEL | Unique identifier of the tree in the DB | PK |
| TRIAL_PLOT_ID | Identifier of the trial plot inside the institution | FK to GRO_TRIAL_PLOT |
| ACCESSION_ID | Accession number in the institution's collection | FK to GT_ACCESSION |
| ROW_NUMBER | Row number of the tree in the orchard | |
| POSITION_IN_ROW | Tree position in the row of the orchard | |
| PLANTING_YEAR | Planting year of the tree | |
| PLANTING_PERIOD | Planting period in the year | |
| MULTIPLICATION_TYPE | Type of multiplication used to produce the plant material | |
| REMARKS | Any further remarks on the plant material or the planting conditions | |

PK: primary key; FK: foreign key.
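
The key structure of Table 3 can be illustrated with a short Python sketch using the standard `sqlite3` module. The field names and the PK/FK roles come from Table 3; everything else is an assumption made for illustration: the use of SQLite, the column types, the NOT NULL constraints, the stub parent tables reduced to their key column, and the helper names `create_tree_schema` and `insert_tree`. The paper does not specify these details.

```python
import sqlite3


def create_tree_schema(conn: sqlite3.Connection) -> None:
    """Create GT_TREE plus two stub parent tables (types are assumed)."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Stub parents: only the key columns referenced by GT_TREE.
        CREATE TABLE GRO_TRIAL_PLOT (TRIAL_PLOT_ID TEXT PRIMARY KEY);
        CREATE TABLE GT_ACCESSION   (ACCESSION_ID  TEXT PRIMARY KEY);

        CREATE TABLE GT_TREE (
            TREE_LABEL          TEXT PRIMARY KEY,
            TRIAL_PLOT_ID       TEXT NOT NULL
                                REFERENCES GRO_TRIAL_PLOT (TRIAL_PLOT_ID),
            ACCESSION_ID        TEXT NOT NULL
                                REFERENCES GT_ACCESSION (ACCESSION_ID),
            ROW_NUMBER          INTEGER,
            POSITION_IN_ROW     INTEGER,
            PLANTING_YEAR       INTEGER,
            PLANTING_PERIOD     TEXT,
            MULTIPLICATION_TYPE TEXT,
            REMARKS             TEXT
        );
        """
    )


def insert_tree(conn, tree_label, trial_plot_id, accession_id):
    """Insert one tree; raises sqlite3.IntegrityError on a broken FK."""
    conn.execute(
        "INSERT INTO GT_TREE (TREE_LABEL, TRIAL_PLOT_ID, ACCESSION_ID) "
        "VALUES (?, ?, ?)",
        (tree_label, trial_plot_id, accession_id),
    )
```

With this layout, a tree can only be stored once its trial plot and accession exist, which is one way to obtain the tree-level traceability described for the GENOTYPE super-class.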

Table 4. Content and definition of the GT_DNA_SAMPLE table

| Field | Definition | Remarks |
|---|---|---|
| DNA_SAMPLE_ID | DNA sample identifier | PK |
| DNA_SAMPLE_NAME | Name of the sample | |
| TREE_LABEL | Label of the tree from which the DNA sample was collected | FK to GT_TREE |
| DATE_LEAF_COL | Date of leaves (sample) collection | |
| COLLECTOR_NAME | Name of the person who collected the sample | |
| ISOLATION_DATE | Date of the DNA sample isolation | |
| ISOLATOR_NAME | Name of the person who isolated the DNA sample | |
| DNA_REF_PROTOCOL | DNA protocol used (file name) | Link to a protocol file |
| DATA_ENCODE_DATE | Date of data encoding | |
| DATA_ENCODE_NAME | Person who encoded the data | |
| ORIG_PLATE_NB | Original plate identifier | |
| ORIG_PLATE_ROW | Row of the micro-titre plate | |
| ORIG_PLATE_LINE | Line of the micro-titre plate | |
| ORIG_INSTITUTION | Institution that provided the sample | FK to ORG_INSTITUTION |
| NEW_PLATE_NB | New plate identifier | |
| NEW_PLATE_ROW | New micro-titre plate row | Link to an image file |
| NEW_PLATE_LINE | New micro-titre plate line | |
| NEW_INSTITUTION | Institution that conducted the sample analysis | FK to ORG_INSTITUTION |
| REMARKS | Any further remarks on the DNA sample collection and isolation conditions | |

PK: primary key; FK: foreign key.
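
Because GT_DNA_SAMPLE refers to GT_TREE through TREE_LABEL, and GT_TREE refers to GT_ACCESSION through ACCESSION_ID, a DNA sample can be traced back to its accession with a join. The sketch below shows this with Python's `sqlite3` module, assuming SQLite and reduced versions of the two tables. The column types, the sample rows (labels such as `T001` and `ACC-A`) and the function names `build_demo` and `samples_for_accession` are invented for illustration only.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create reduced GT_TREE / GT_DNA_SAMPLE tables with made-up rows."""
    conn.executescript(
        """
        CREATE TABLE GT_TREE (
            TREE_LABEL   TEXT PRIMARY KEY,
            ACCESSION_ID TEXT NOT NULL
        );
        CREATE TABLE GT_DNA_SAMPLE (
            DNA_SAMPLE_ID   INTEGER PRIMARY KEY,
            DNA_SAMPLE_NAME TEXT,
            TREE_LABEL      TEXT NOT NULL REFERENCES GT_TREE (TREE_LABEL),
            ISOLATION_DATE  TEXT
        );
        """
    )
    # Hypothetical example data, not taken from the database itself.
    conn.executemany(
        "INSERT INTO GT_TREE VALUES (?, ?)",
        [("T001", "ACC-A"), ("T002", "ACC-A"), ("T003", "ACC-B")],
    )
    conn.executemany(
        "INSERT INTO GT_DNA_SAMPLE VALUES (?, ?, ?, ?)",
        [
            (1, "S1", "T001", "2005-05-02"),
            (2, "S2", "T002", "2005-05-03"),
            (3, "S3", "T003", "2005-05-03"),
        ],
    )


def samples_for_accession(conn, accession_id):
    """Return (sample name, tree label) pairs for one accession."""
    return conn.execute(
        """
        SELECT s.DNA_SAMPLE_NAME, s.TREE_LABEL
        FROM GT_DNA_SAMPLE AS s
        JOIN GT_TREE AS t ON t.TREE_LABEL = s.TREE_LABEL
        WHERE t.ACCESSION_ID = ?
        ORDER BY s.DNA_SAMPLE_ID
        """,
        (accession_id,),
    ).fetchall()
```

Keeping the tree label on every DNA sample, rather than only the accession, is what lets molecular results be tied to an individual tree as well as to its genotype.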

Table 5. Content and definition of the MOL_LOCUS and MOL_MARKERS tables

MOL_LOCUS table

| Field | Definition | Remarks |
|---|---|---|
| LOCUS_ID | Locus identifier | PK |
| LOCUS_NAME | Locus name | |
| MARKER_ID | Marker identifier | FK to MOL_MARKERS |
| PUB_REF | Published reference | FK to REF_REFERENCE |
| ORIG_SET_REF_CV | Original set reference cultivar | |
| ADD_INFO_LG | Additional information on the linkage group | |
| NEW_ALLELE_VAL | New allele value | |
| DATA_ENCODE_DATE | Date of data input | |
| DATA_ENCODE_NAME | Person who encoded the data | |

MOL_MARKERS table

| Field | Definition | Remarks |
|---|---|---|
| MARKER_ID | Marker identifier | PK |
| MARKER_NAME | Marker name | |
| MARKER_TYPE | Marker type | |
| FORWARD_PRIMER | Forward primer sequence | |
| REVERSE_PRIMER | Reverse primer sequence | |
| GEL_IMG_REF | Reference to a gel image | Link to image file |
| REQ_TEMP | Temperature used | |
| LOCI_NUMBER | Number of loci found for the marker | |
| UPDATE_SHEET | Date of update | |
| LOCUS_STATUS | Locus status | |
| RESEARCHER | Researcher name | |
| PUB_REF | Published reference | FK to REF_REFERENCE |
| DATA_ENCODE_DATE | Date of data input | |
| DATA_ENCODE_NAME | Person who encoded the data | |
| ORIGIN_SEQ | Original sequence | |
| ORIGIN_FORWARD_SEQ | Original forward primer sequence | |
| ORIGIN_REVERSE_SEQ | Original reverse primer sequence | |
| PCR_PROTOCOL | PCR protocol used | Link to protocol file |

PK: primary key; FK: foreign key.
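
Table 5 implies a one-to-many link from MOL_MARKERS to MOL_LOCUS through MARKER_ID, and MOL_MARKERS also stores LOCI_NUMBER. One consistency check this structure allows is to compare the stored LOCI_NUMBER with the number of loci actually recorded for each marker. The Python sketch below illustrates the idea with `sqlite3`. It is not the validation procedure of AppleBreed DB: the interpretation of LOCI_NUMBER as an expected row count, the reduced table layout, the column types and the function names `create_marker_tables` and `markers_with_locus_mismatch` are all assumptions.

```python
import sqlite3


def create_marker_tables(conn: sqlite3.Connection) -> None:
    """Create reduced MOL_MARKERS / MOL_LOCUS tables (types are assumed)."""
    conn.executescript(
        """
        CREATE TABLE MOL_MARKERS (
            MARKER_ID   INTEGER PRIMARY KEY,
            MARKER_NAME TEXT NOT NULL,
            LOCI_NUMBER INTEGER
        );
        CREATE TABLE MOL_LOCUS (
            LOCUS_ID   INTEGER PRIMARY KEY,
            LOCUS_NAME TEXT NOT NULL,
            MARKER_ID  INTEGER NOT NULL REFERENCES MOL_MARKERS (MARKER_ID)
        );
        """
    )


def markers_with_locus_mismatch(conn):
    """List markers whose LOCI_NUMBER differs from their recorded loci.

    Returns (marker name, stored LOCI_NUMBER, loci found) tuples.
    The LEFT JOIN keeps markers that have no locus row at all.
    """
    return conn.execute(
        """
        SELECT m.MARKER_NAME, m.LOCI_NUMBER, COUNT(l.LOCUS_ID)
        FROM MOL_MARKERS AS m
        LEFT JOIN MOL_LOCUS AS l ON l.MARKER_ID = m.MARKER_ID
        GROUP BY m.MARKER_ID
        HAVING COUNT(l.LOCUS_ID) <> m.LOCI_NUMBER
        ORDER BY m.MARKER_ID
        """
    ).fetchall()
```

A marker reported here is not necessarily wrong; the loci may simply not have been entered yet, so the result is better read as a list of records to review.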



Figure 1. Conceptual data model of AppleBreed DB and existing links between various super-classes.

Figure 2. Detailed structure of the GENOTYPE super-class.

Figure 3. Detailed structure of the PHENOTYPE DATA super-class.

Figure 4. Detailed structure of the MOLECULAR DATA super-class.

Figure 5. Data flow setup within the framework of the model.