Several plant databases now use the PSO as the main ontology
for annotating gene expression data and for describing phenotypes.
TAIR and Gramene retired their species-specific anatomical ontologies
for Arabidopsis and cereals, respectively, and have been using
the PSO exclusively. In MaizeGDB, the original maize vocabulary
(Vincent et al., 2003) has been partially integrated with PSO
terms, and the goal is to complete this integration in the near
future. At this time, both sets of terms are used for annotations,
which can then be queried using PSO or maize-specific term names.
The PSO is currently implemented in several genomic databases
and is displayed at TAIR (www.arabidopsis.org), Gramene (www.gramene.org),
Nottingham Arabidopsis Stock Centre (NASC), the European Arabidopsis
center (www.arabidopsis.info), BRENDA, the comprehensive enzyme
information system (www.brenda.uni-koeln.de), and MaizeGDB.
The POC database is set up as a portal through which the data curated using PO for different plant organisms, such as Arabidopsis, rice, and maize, can be easily accessed at one site. Information from one hierarchical level in the ontology is propagated up to the next level (i.e. annotation to any given term with is_a or part_of relationship type implies automatic annotation to all ancestors of that term). Therefore, users can make inferences and perform queries at different levels in the PSO. For example, all Arabidopsis, rice, and maize genes expressed in the flower and phenotypes with altered floral development can be retrieved not only by a search using the term flower, but also by a search using the term inflorescence, of which the flower is a part. Also, a search with the term flower should retrieve all genes expressed in stamens, pistils, petals, or sepals. To elucidate the primary application of the PO, the annotation process in contributing databases is described below, followed by specific examples of how scientists can efficiently use the PSO in their research.
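The propagation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the POC database's implementation: the parent table holds just the part_of links named in the text, the gene names (geneA to geneD) are hypothetical placeholders, and the function names are invented for the example.

```python
# Toy ontology fragment: child term -> list of (relationship, parent term).
# Only the part_of links mentioned in the text are included; the real PSO
# is far larger and also uses is_a relationships.
PARENTS = {
    "stamen": [("part_of", "flower")],
    "pistil": [("part_of", "flower")],
    "petal": [("part_of", "flower")],
    "sepal": [("part_of", "flower")],
    "flower": [("part_of", "inflorescence")],
    "inflorescence": [],
}

# Hypothetical direct annotations (gene, term); the gene names are placeholders.
DIRECT_ANNOTATIONS = [
    ("geneA", "stamen"),
    ("geneB", "petal"),
    ("geneC", "flower"),
    ("geneD", "inflorescence"),
]


def ancestors(term):
    """Return every term reachable from `term` via is_a or part_of links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term):
    """Genes annotated to `term` directly or to any of its descendants."""
    return sorted(
        gene
        for gene, annotated_term in DIRECT_ANNOTATIONS
        if annotated_term == term or term in ancestors(annotated_term)
    )


print(genes_for("flower"))         # genes annotated to flower or its parts
print(genes_for("inflorescence"))  # also includes everything under flower
```

In this sketch a query for flower returns the genes annotated to stamen and petal as well, and a query for inflorescence returns all four placeholder genes, which mirrors the behavior described for the PO browser.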
Annotations to the PSO in Participating Databases
A user interested in genes involved in leaf vascular development can query the PSO by entering an appropriate term, for example, leaf vein, in the PO browser and retrieve all annotations to this term and its children terms (midvein and secondary vein) in Arabidopsis, rice, and maize. The list includes genes that are expressed in leaf veins as well as phenotypes with altered leaf vein development. Annotations to this term are contributed by TAIR, NASC, and Gramene (Fig. 3A). The user can obtain more information about each gene or germplasm by clicking on the name of the contributing database (Source), as shown for the Arabidopsis YELLOW STRIPE LIKE 1 (YSL1) gene, annotated by TAIR (Fig. 3B).
Functional annotation of a gene, which is an association between a gene and a term in an ontology, summarizes information about its function at the molecular level, its biological roles, protein localization patterns, and spatial/temporal expression patterns (Berardini et al., 2004). Generally, annotation tasks are carried out at genomic databases, by manual or computational methods. All annotations contributed to the POC are composed manually by curators (biologists with an advanced degree) who either extract the information from published literature and generate concise statements by creating gene-to-term associations (Berardini et al., 2004; Clark et al., 2005) or record phenotype descriptions directly by observing plants (natural variants and mutants) in greenhouses or in the field. Literature curation is usually conducted at species-specific genomic databases (TAIR, Gramene, and MaizeGDB). Curators at plant stock centers, such as NASC, Arabidopsis Biological Resource Center, and Maize Genetics Cooperation Stock Center, often combine their in-house description of germplasms, based on greenhouse observations and/or stock donor information, with information available from the literature. Each gene-to-term association is a separate annotation entry and a gene can be annotated with several ontology terms. For instance, the YSL1 gene in Arabidopsis is annotated to multiple PO terms in TAIR (Fig. 3, B and C). YSL1 is expressed in male gametophyte, fruit, shoot, filament, sepal, petal, and leaf vein, with evidence codes inferred from expression pattern (IEP) and inferred from direct assay (IDA), extracted from the publication by Jean et al. (2005). Evidence codes are defined types of evidence, which are used to support the annotation. The most commonly used evidence codes for annotating gene expression data and phenotypes are IDA, IEP, and inferred by mutant phenotype (IMP). In addition to the evidence code, TAIR provides evidence description, which depicts more specific assay types for supporting the annotations. For instance, YSL1 is expressed in the shoot, with evidence code IEP and evidence description transcript levels (e.g. northerns; Fig. 3C). Details on evidence codes and evidence descriptions can also be found online (http://www.plantontology.org/docs/otherdocs/evidence_codes.html). More details on literature curation using controlled vocabulary and components of annotations can be found elsewhere (Berardini et al., 2004; Clark et al., 2005). Each contributing database has developed its own annotation interface and has taken different approaches to displaying gene and phenotype annotations. However, association files contributed to the POC Concurrent Versions System repository are uniformly formatted and are compliant with POC standards.
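The components of an annotation described above (gene, ontology term, evidence code, evidence description, reference, and contributing database) could be modeled roughly as follows. This is a hedged sketch: the class and field names are assumptions made for illustration and do not reproduce the POC association file format. The first record restates the YSL1 example from the text; the second is a clearly hypothetical placeholder.

```python
from dataclasses import dataclass

# Evidence codes named in the text, with their expansions.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred by mutant phenotype",
}


@dataclass(frozen=True)
class Annotation:
    """One gene-to-term association; field names are illustrative only."""
    gene: str
    term: str
    evidence_code: str
    evidence_description: str
    reference: str
    source: str

    def __post_init__(self):
        # Reject codes outside the small set listed above.
        if self.evidence_code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence_code}")


annotations = [
    # From the text: YSL1 in shoot, IEP, transcript levels, annotated by TAIR.
    Annotation("YSL1", "shoot", "IEP", "transcript levels (e.g. northerns)",
               "Jean et al. (2005)", "TAIR"),
    # Hypothetical placeholder record, for illustration only.
    Annotation("geneX", "leaf vein", "IMP", "placeholder description",
               "placeholder reference", "placeholder database"),
]


def by_evidence(records, code):
    """Return the records supported by the given evidence code."""
    return [r for r in records if r.evidence_code == code]


for record in by_evidence(annotations, "IEP"):
    print(record.gene, record.term, EVIDENCE_CODES[record.evidence_code])
```

Keeping each gene-to-term association as its own record matches the statement that every association is a separate annotation entry, so one gene can appear in several records.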
Use of the PSO in Gene Expression and Protein Localization Experiments
Besides gene annotations, another common application of the PSO is in categorizing experiments and describing biological samples. For example, databases containing large-scale gene expression profiling data, such as GENEVESTIGATOR (Zimmermann et al., 2004) and NASCArrays (Craigon et al., 2004), are using the PSO to show genes that are expressed in certain plant structures and to describe microarray experiments, respectively. The Plant Expression Database (Shen et al., 2005) is currently incorporating PSO terms in their microarray experiment sample description and also in their data submission forms (R. Wise, personal communication). Similarly, ArrayExpress plans to implement PSO terms in the near future (H. Parkinson, personal communication). NASCArrays uses PSO terms to describe tissue sample sources used in microarray experiments (as BioSource Information; Supplemental Fig. S1).
Researchers can, and are encouraged to, use the PSO for describing tissue samples for various transcript analyses (e.g. northern blot/reverse transcription-PCR, β-glucuronidase/green fluorescent protein, in situ mRNA hybridization), protein localization experiments (e.g. immunolabeling, proteomic data), and gene expression assays from microarray experiments or laser-capture microdissection experiments in their publications and Web sites. Descriptions of other expression data, such as expressed sequence tags (ESTs) and cDNA libraries, can be enhanced by using proper botanical terms and accession numbers from the PSO. These datasets are submitted to dbEST at the National Center for Biotechnology Information (NCBI) and consistent use of standardized anatomical terms can greatly improve cross-species comparison. For instance, a user interested in finding all ESTs from EST libraries generated from pollen grains across plant taxa could query the NCBI GenBank using the unique ID for the PSO term male gametophyte (synonym: pollen grain), PO:0020091, and retrieve all ESTs generated from pollen tissue samples. Currently, such a query is not feasible at the NCBI; instead, a search for the words pollen AND plant retrieves all EST entries in which both words, pollen and plant, appear anywhere in the text. The Gramene database has already started using the PSO for tissue-type description of 201 EST and cDNA libraries for cereals obtained from dbEST. The list of libraries and the links to the PSO terms can be viewed at http://www.gramene.org/db/ontology/association_report?id=PO:0009011&object_type=Marker%20library.
In summary, the consistent use of PSO terms across different plant species and use of available annotations of gene expression data and phenotype descriptions are valuable aids to bench scientists and can facilitate new discoveries. Researchers involved in large-scale expression profiling projects or those who generated mutant collections and are creating their own databases to store phenotypic data are encouraged to use the PSO. The POC has already been contacted by a number of such laboratories with questions on how to use the ontologies for describing tissue samples in EST collections, laser-capture microdissection experiments, microarray experiments, and mutant phenotype collections. We are continuously making an effort to reach out to our prospective users and to meet the particular annotation needs of the collaborating databases, as well as the needs of the broader plant research community. Users are encouraged to contact the POC to get help, contribute their feedback, and suggest new ontology terms by writing to [email protected].
Comparison of Gene Expression and Phenotype Data in Arabidopsis, Rice, and Maize
The data curated using the PSO, contributed by participating plant databases, can be easily accessed by performing one-stop queries in the POC database. As of August 31, 2006, the database has over 4,400 unique genes and nearly 1,900 germplasms annotated with PSO terms, with a total of over 10,000 associations, contributed by TAIR, Gramene, MaizeGDB, and NASC. Annotations are displayed and can be queried using the PO browser tool (http://www.plantontology.org/amigo/go.cgi), a modified AmiGO tool (see "Materials and Methods"). A user interested in genes involved in inflorescence development and their comparison between grasses (rice and maize) and Arabidopsis can search for the term inflorescence (PO:0009049) and retrieve all gene annotations and phenotypic descriptions associated with this term. Direct annotations to the PSO term and annotations to all its children terms are displayed on the term detail page. Hyperlinks to the original publications from which annotations were extracted provide quick access to the original experimental data and methodology, which, combined with a direct link to the gene and locus detail pages at contributing databases, leads to quick access to deposited DNA and protein sequences. Also, on the gene detail pages at Gramene and TAIR, functional annotations with GO terms are displayed and hyperlinked to the GO, providing access from the PO to the GO through these links.
The gene expression data available at the POC Web site combined with sequence similarity and phylogenetic analysis can facilitate comparative structural and functional studies of related plant genes. Although it is yet to be experimentally verified that the evolutionary conservation among plant genomes is manifested by functional similarity, such as distinct overlapping expression patterns of orthologous genes, available annotations of gene expression can be used as a starting point in such studies. This approach can be particularly useful for orthologs in maize and rice, considering their evolutionary relatedness (i.e. their monophyletic origin) and, to some degree, also for comparison to their putative orthologs in Arabidopsis. A known example is the study of functional complementation and overlapping expression patterns of the vp1 gene in maize and its Arabidopsis ortholog ABI3, both genes involved in seed maturation and germination (Suzuki et al., 2001). ABI3 is expressed in the Arabidopsis embryo and seed coat (TAIR), whereas germplasm of the maize vp1 mutant is annotated to the PSO term fruit (MaizeGDB). Thus, the query for the term fruit, of which the seed is a part, using species-specific filters for Arabidopsis and maize (available on the PO browser) would retrieve all genes/germplasms annotated in these two species, including vp1 and ABI3. Although the PO database does not yet have tools to address orthology or even sequence similarity in rice, maize, and Arabidopsis, annotation data available at the POC Web site can be used as a starting point for detailed studies of the function and expression of putative orthologous genes in rice and maize and their corresponding homologs in Arabidopsis. Web sites such as InParanoid provide orthology information for sequenced eukaryote genomes (O'Brien et al., 2005) and could be used in combination with the POC to address these questions.
Extended Annotation of Mutant Phenotypes Using Controlled Vocabularies
Describing a phenotype is a complex task; to capture relevant biological information about an entire set of characteristics of an organism, one needs to consider all observable (measurable) traits, qualitative and quantitative, the type of assays, and specific experimental conditions in which interaction of genotype and environment occurs. Traditionally, curators at plant genomic databases have relied on the free-text description (usually as a short summary), often combined with images of mutant phenotypes and natural variants. This approach largely limits data manipulation and searches and prevents easy comparison across species.
The PSO is an essential ontology for moving toward more systematic annotation of phenotypes. However, it depicts the plant structures only during normal development of a plant. It does not include terms that describe morphological variations of cells, tissues, and organs in mutated plants (e.g. fasciated ear) or qualitative and quantitative descriptors (e.g. type of branching, trichome shape, spikelet density). Thus, additional ontologies are required for capturing relevant biological information about phenotypes fully. If used exclusively, the PSO would be insufficient to capture all of the details of a phenotype in a controlled vocabulary format.
Recently, the NASC, Gramene, and MaizeGDB moved toward combining PO terms with other ontologies to annotate mutant phenotypes and natural variants to allow computation and more efficient cross-species comparison. At Gramene, PO terms are used in conjunction with Trait Ontology terms (Yamazaki and Jaiswal, 2005) to describe phenotypes. As an example, the phenotype description of the allele cg.1 of the cigar shape panicle (cg) gene in rice (Seetharaman and Srivastava, 1969; Prasad and Seetharaman, 1991) is shown in Supplemental Figure S2A. This mutation affects the morphology of a panicle, rachis, and grain (see the text description in Supplemental Fig. S2); thus, the annotations were made to PSO terms inflorescence (PO:0009049), stem (PO:0009047), and seed (PO:0009010). In addition to PO terms, curators from the Gramene database chose terms from another ontology, Trait Ontology (Yamazaki and Jaiswal, 2005), to annotate the cg.1 allele in rice: panicle type (TO:0000089), seed length (TO:0000146), seed size (TO:0000391), and stem length (TO:0000576).
A different approach has been taken by the NASC database for describing mutant phenotypes and natural variants in Arabidopsis. In addition to a free-text description, short statements, referred to as an entity, attribute, value (EAV) description, are composed by combining terms from orthogonal (i.e. nonoverlapping) ontologies. This model has been tested in pilot projects at a few model organism databases, namely, ZFIN (Sprague et al., 2003) and FlyBase (FlyBase Consortium, 2002). The EAV model relies on the Phenotype and Trait Ontology (PATO), a species-independent controlled vocabulary created as a schema in which the qualitative phenotypic data are represented as nouns and phrases (Gkoutos et al., 2005). The core of the PATO is composed of a set of attribute and value terms (such as color, shape, and size; green, serrate, and dwarf), which were recently converted to a single hierarchy of qualities (G. Gkoutos, personal communication). At the NASC database, the allele ckh1-1 (in Landsberg erecta background), a mutation of the CYTOKININ-HYPERSENSITIVE 1 gene in Arabidopsis, is annotated to the PO term inflorescence (PO:0009049) and to the PATO term ShortHeight-Value (PATO:0000569), creating the following syntax: inflorescence:short:height. An additional annotation to primary root (PO:0020127) is followed by ShortLength-Value (PATO:0000574), creating the syntax primary root:short:length. Thus, multiple controlled vocabulary statements can be created for any germplasm/seed stock.
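The way such statements are assembled from an entity term and a PATO quality can be illustrated with a short Python sketch. The term names and identifiers are those quoted above for the ckh1-1 allele; the class names, field names, and the function are assumptions introduced for the example and do not describe the NASC software.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """An anatomical entity taken from the PO (illustrative structure)."""
    po_id: str
    name: str


class Quality(NamedTuple):
    """A PATO term split into its value and attribute parts (illustrative)."""
    pato_id: str
    value: str
    attribute: str


def eav_statement(entity, quality):
    """Compose the colon-separated form shown in the text."""
    return f"{entity.name}:{quality.value}:{quality.attribute}"


# Terms and identifiers as quoted in the text for the ckh1-1 allele.
ckh1_1 = [
    (Entity("PO:0009049", "inflorescence"),
     Quality("PATO:0000569", "short", "height")),
    (Entity("PO:0020127", "primary root"),
     Quality("PATO:0000574", "short", "length")),
]

statements = [eav_statement(entity, quality) for entity, quality in ckh1_1]
print(statements)
```

Because the entity and the quality come from nonoverlapping ontologies, each can be queried independently, for example all stocks with any annotation to inflorescence, or all stocks with a short value regardless of the structure affected.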
Presently, the POC database and ontology browser are not set
up to display annotations to multiple ontologies. Therefore,
controlled vocabulary annotations to ontologies other than the
PO can be viewed on gene/germplasm/stock detail pages at contributing
databases, which can be accessed by clicking on the appropriate
database link (Supplemental Fig. S2B). More details on using
the Trait Ontology and the PATO and EAV model can be found at
the Gramene and NASC Web sites, respectively. Whereas the Trait
Ontology is plant specific and was created for the purpose of
annotating mutants in rice and other cereal crops, the PATO
is species independent and intended for description of mutant
phenotypes across kingdoms. PATO terms can be used in combination
with a wide range of other ontologies that describe entities,
such as GO, Cell Ontology, and anatomical and developmental
stage ontologies, among others.