An ontology is a concise and unambiguous description of relevant
entities and their explicit relations to each other (Schulze-Kremer,
). Entities (terms) are linked by specific relationships,
with paths from more specific to more general terms upward in
the ontology tree. Thus, the information from one hierarchical
level can be propagated up to the next level, allowing users
to make inferences and to perform queries at different levels.
Each term in the ontology has a textual definition, an accession
number (identifier or ontology ID), and a specific relationship
to at least one parental term (Bard and Rhee, 2004
). The unique
identifier and relationships in the ontology are interpretable
by a computer, which enables computational processing
and retrieval of information associated with each term. For
example, lists of genes annotated to the terms in an ontology
can be compared and terms that are overrepresented in one list
over another can be determined using statistical tests. This
underscores their main appeal in the field of biology and, to
a large extent, explains the recent increase in the number of
bioontologies (Blake, 2004).
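The upward propagation of annotations and the comparison of gene lists can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PO or GO software. The toy hierarchy, the gene names, and the choice of a one-sided hypergeometric test (one common option for overrepresentation analysis) are all hypothetical. Each gene's direct annotations are expanded to every ancestor term, so a gene annotated to leaf abaxial epidermis also counts for leaf epidermis and leaf. The upper tail of the hypergeometric distribution then gives the probability of seeing at least as many annotated genes in the study list by chance.

```python
from math import comb

# Hypothetical fragment of a plant anatomy hierarchy: child -> list of parents.
# Names are illustrative only; relation types (is_a, part_of) are not modeled.
PARENTS = {
    "leaf abaxial epidermis": ["leaf epidermis"],
    "leaf adaxial epidermis": ["leaf epidermis"],
    "leaf epidermis": ["leaf"],
    "leaf": ["shoot system"],
    "petal": ["flower"],
    "flower": ["shoot system"],
    "shoot system": [],
}


def ancestors(term):
    """Return the term itself plus every term reachable by following parents upward."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def propagate(direct):
    """Expand each gene's direct annotations to all implied (ancestor) terms."""
    return {gene: set().union(*(ancestors(t) for t in terms))
            for gene, terms in direct.items()}


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K annotated, n drawn)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical direct annotations (gene -> terms a curator assigned).
direct = {
    "g1": ["leaf abaxial epidermis"],
    "g2": ["leaf adaxial epidermis"],
    "g3": ["petal"],
    "g4": ["leaf"],
}
annotated = propagate(direct)

# Is "leaf epidermis" overrepresented in the study list relative to all genes?
term = "leaf epidermis"
study, population = {"g1", "g2"}, set(direct)
k = sum(term in annotated[g] for g in study)
K = sum(term in annotated[g] for g in population)
p = hypergeom_upper_tail(k, len(study), K, len(population))
print(sorted(annotated["g1"]))
print(f"{term}: {k}/{len(study)} in study, {K}/{len(population)} overall, p = {p:.3f}")
# leaf epidermis: 2/2 in study, 2/4 overall, p = 0.167
```

Propagation runs only upward: g4, annotated to leaf, is not counted for leaf epidermis. When many terms are tested at once, the resulting p-values would also need a multiple-testing correction.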
The best-known bioontology, the Gene Ontology (GO), was the first to offer a practical solution for describing gene products in a human- and computer-comprehensible manner spanning diverse taxonomic groups (Gene Ontology Consortium, 2006; http://www.geneontology.org). The GO consists of three mutually independent ontologies, which describe the cellular components, biological processes, and molecular functions, respectively, that occur in organisms. Over the years, the GO has become a standard for describing functional aspects of gene products in a consistent way in various genomic databases. Following the GO paradigm and embracing the idea of generic, standardized terminology that can be used across diverse taxonomic groups, the POC has largely adopted the ontology design model and rules established by the GO consortium. However, the PSO is conceptually different and is governed independently from the GO. Some important differences between the PO and GO are discussed in more detail below.
The PSO is the first multispecies ontology of plant anatomy and morphology. Its main purpose is to provide a standardized set of terms describing plant structures, to serve as a tool for annotating gene expression patterns and phenotypes of germplasm across angiosperms. Hence, this vocabulary is intended for a broad plant research community, including curators in genomic databases, bioinformaticians, and bench scientists. The PSO initially integrated existing species-specific ontologies for Arabidopsis, maize, and rice; however, it is not intended only for a few model plant organisms. Rather, we envision it as a continuously expanding ontology that will gradually encompass crop and woody species. Recently, the ontology has been expanded to include terms for Fabaceae, Solanaceae, additional cereal crops (wheat [Triticum aestivum], oat [Avena sativa], barley [Hordeum vulgare]), and poplar (Populus spp.), a model plant organism for woody species.
A common set of criteria was established to ensure that the PSO would be biologically accurate and adequately meet practical requirements for annotation. Analysis of the three original species-specific plant ontologies, the predecessors of the PSO, greatly influenced our decisions on the rationale and design for the PSO. Foremost, we defined the scope of this ontology to be limited to anatomical and morphological structures pertinent to flowering plants during their normal course of development. Botanical terms, from the cellular to the whole-organism level, are entities (i.e. terms [in italics in this article]) in the PSO. Besides this main criterion for creating a term, in some cases (following annotation requirements), we have considered derivation (i.e. origin of plant parts and cell lineages), as well as spatial/positional organization of tissues, organs, and organ systems of a flowering plant (e.g. leaf abaxial epidermis and leaf adaxial epidermis).
We established general rules for deciding when not to add terms to the ontology. To a great extent, qualifiers (or attributes) of the terms are avoided, and the ontology makes only very limited use of attributes. Thus, the term corolla is included, but the terms "sympetalous corolla" and "apopetalous corolla" are not. Attributes that are specific for describing mutant plants (e.g. wrinkled seed) are also excluded. Because it does not include attributes, the PSO is neither sufficient nor intended to serve as a taxonomic vocabulary on its own, and it does not address the phylogeny of angiosperms. Moreover, the most granular terminology in the PSO is at the cell-type level. Therefore, terms for subcellular compartments are not included in the PSO; these terms are handled by the GO Cellular Component ontology. In addition, temporal landmarks (i.e. morphological and anatomical changes that occur via developmental progression of organs and organ systems) are excluded from the PSO; this aspect is part of the Plant Growth and Developmental Stages Ontology (Pujar et al., 2006). Nonetheless, some temporal aspects are indirectly present in the PSO. Unlike in animal systems, most plant organs develop in the postembryonic phase of the life cycle. Many plant structures develop continually, whereas others exist only temporarily, that is, at a particular time during the life cycle. Even structures that exist for only a very short period of time, such as a leaf primordium, are included as terms in the PSO. For example, terms such as apical hook (defined as a hook-like structure that develops at the apical part of the hypocotyl in dark-grown seedlings in dicots) and leaf primordium (defined as an organized group of cells that will differentiate into a leaf that emerges as an outgrowth in the shoot apex) exist in the PSO. A leaf primordium is merely the first visible appearance of a leaf; therefore, both terms, leaf and leaf primordium, describe the same entity (leaf) at different time points in development.
There are genes that are expressed in organ primordia, such as the JAGGED and FILAMENTOUS FLOWER genes in Arabidopsis (both expressed in leaf, sepal, petal, stamen, and carpel primordia), with expression levels declining in the developing or adult organs (Dinneny et al., 2004). To accurately annotate the expression patterns of such genes, we created separate terms for each primordium structure. Currently, the PSO has 11 such terms.
To integrate terms from different species, we extensively used
synonymy wherever feasible. This allows users to search existing
plant databases using either a generic term or its taxon-specific
synonyms. For example, silique (Arabidopsis), caryopsis (rice),
and kernel (maize) are included as synonyms of the term fruit.
Therefore, a search for fruit in the PO database would retrieve
all genes expressed in the silique of Arabidopsis, caryopsis of
rice, and kernel of maize. In reality, silique, caryopsis, and
kernel are types (classes) of fruit rather than strict synonyms.
However, for the purpose of this ontology, specific types of a
few high-level terms (e.g. fruit and stem) are included only as
synonyms.
Thus, we intentionally set aside the enormous morphological
diversity of flowering plants in favor of cross-species comparisons,
generic searches, and intuitive ontology browsing. Consequently,
synonyms in the PSO can be taxon-specific morphological forms
of a generic structure. In addition, an entity in the PSO can
be either a term or a synonym, but not both. In the few cases
where synonymy was not a suitable option, we created new terms
as specific classes. Typical examples are the terms tassel and
ear, the staminate and pistillate inflorescences, respectively,
specific to the genus Zea.
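The term-or-synonym rule and the synonym-based search described above can be sketched together in Python. This is a minimal illustration, not PO software: the names, the gene identifiers, and the `build_lookup` and `search` helpers are hypothetical. Every term name and synonym is mapped case-insensitively to a single term, and building the index fails if any name is claimed twice, as would happen if a string were used both as a term and as a synonym. A query for a taxon-specific form such as silique therefore resolves to the generic term fruit and returns the same genes as a query for fruit.

```python
# Hypothetical index: generic term -> taxon-specific synonyms (illustrative only).
SYNONYMS = {
    "fruit": ["silique", "caryopsis", "kernel"],  # Arabidopsis, rice, maize forms
    "inflorescence": [],
    "tassel": [],  # a Zea-specific class kept as its own term, not as a synonym
}

# Hypothetical gene sets: expression seen in a silique, caryopsis, or kernel
# is recorded against the generic term "fruit".
ANNOTATIONS = {
    "fruit": {"arabidopsis_gene_1", "rice_gene_1", "maize_gene_1"},
}


def build_lookup(synonyms):
    """Map every term name and synonym (case-insensitive) to exactly one term.

    Raises ValueError if a name is claimed twice, e.g. used both as a term and
    as a synonym, which would violate the term-or-synonym rule.
    """
    lookup = {}
    for term, syns in synonyms.items():
        for name in [term, *syns]:
            key = name.lower()
            if key in lookup:
                raise ValueError(f"{name!r} is already assigned to {lookup[key]!r}")
            lookup[key] = term
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def search(query):
    """Resolve a generic or taxon-specific name and return (term, genes)."""
    term = LOOKUP.get(query.strip().lower())
    if term is None:
        return None, set()
    return term, ANNOTATIONS.get(term, set())


print(search("Silique")[0])                # fruit
print(search("kernel") == search("fruit"))  # True

try:
    build_lookup({"fruit": ["silique"], "silique": []})
except ValueError as err:
    print("rejected:", err)
```

Rejecting a duplicate name at build time, rather than silently overwriting it, keeps every query unambiguous: a string can only ever resolve to one term.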
In addition to the synonyms described above, the PSO contains
a number of terms that have authentic (exact) synonyms. Examples
include the terms male gametophyte (synonym: pollen grain),
female gametophyte (synonym: embryo sac), and perisperm. Extensive
use of synonymy in the PSO reduced granularity (i.e. the degree
of detail in the ontology) and emphasized generic aspects of
the ontology. As a rule, we limited granularity in the PSO because
we strove to keep the ontology relatively simple, yet sufficiently
broad and generic to encompass a number of flowering plants.
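One way to keep exact synonyms distinguishable from taxon-specific forms is the synonym scope used in the OBO flat-file format, where each synonym line carries one of EXACT, BROAD, NARROW, or RELATED. The Python sketch below is hypothetical. It assumes that taxon-specific forms such as silique would be tagged NARROW and true equivalents such as pollen grain EXACT, which is an illustrative convention, not a statement about how the PSO files are encoded. The IDs are placeholders, and the parser handles only the few tags shown, not the full OBO specification.

```python
import re

# Hypothetical OBO-style stanzas; EX:... IDs are placeholders, not real accessions.
OBO_TEXT = '''
[Term]
id: EX:0000001
name: fruit
synonym: "silique" NARROW []
synonym: "caryopsis" NARROW []
synonym: "kernel" NARROW []

[Term]
id: EX:0000002
name: male gametophyte
synonym: "pollen grain" EXACT []
'''

# Captures the quoted synonym text and its scope keyword.
SYN_RE = re.compile(r'^synonym:\s+"([^"]+)"\s+(EXACT|BROAD|NARROW|RELATED)\b')


def parse_terms(text):
    """Collect id, name, and (synonym, scope) pairs from [Term] stanzas."""
    terms, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = {"id": None, "name": None, "synonyms": []}
            terms.append(current)
        elif current is None or not line:
            continue  # skip blank lines and anything before the first stanza
        elif line.startswith("id:"):
            current["id"] = line[3:].strip()
        elif line.startswith("name:"):
            current["name"] = line[5:].strip()
        else:
            match = SYN_RE.match(line)
            if match:
                current["synonyms"].append((match.group(1), match.group(2)))
    return terms


def exact_synonyms(terms):
    """Keep only synonyms whose scope marks them as true equivalents."""
    return {t["name"]: [s for s, scope in t["synonyms"] if scope == "EXACT"]
            for t in terms}


terms = parse_terms(OBO_TEXT)
print(exact_synonyms(terms))  # {'fruit': [], 'male gametophyte': ['pollen grain']}
```

Under this assumed convention, a search tool could still resolve every synonym to its term, while an export that needs strict equivalence could keep only the EXACT ones.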