Compared to the GO and other anatomical ontologies, the PSO is a rather small ontology. The top-level term (also called the root node), plant structure (PO:0009011), has 726 descendant terms (release PO_0906; Table I). Of these, 384 (53%) are leaf terms, also called terminal nodes (the most specific terms, with no child terms below them), and 342 (47%) are interior nodes (terms with children). In addition, the PSO currently has 304 synonyms assigned to 149 terms. The relatively small size of the PSO reflects the generic nature of the ontology: the most granular terms are often specific to particular taxonomic groups and are included only when necessary (i.e. to retain biological accuracy and to comply with annotation requirements). Having reached a balance between breadth and granularity, the PSO is a stable and inclusive vocabulary. All of the top nodes, with the exception of infructescence, are populated with the terms needed to describe the phenotypes and gene expression data in angiosperms that are currently being annotated.
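The leaf/interior split reported above follows from a simple rule: a term with no child terms is a leaf, and every other term below the root is interior. The Python sketch below applies that rule to a small, hypothetical ontology fragment given as (child, parent) edges. The term names and edges are illustrative and are not taken from a PO release, and treating all relationship types as plain parent links is an assumption.

```python
from collections import defaultdict

# Hypothetical ontology fragment as (child, parent) edges; names are illustrative only.
EDGES = [
    ("whole plant", "plant structure"),
    ("plant organ", "plant structure"),
    ("leaf", "plant organ"),
    ("flower", "plant organ"),
    ("petal", "flower"),
    ("stamen", "flower"),
    ("leaf lamina", "leaf"),
]

def leaf_and_interior(edges, root):
    """Split every term below `root` into leaf terms (no children) and interior terms."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    # Depth-first traversal collecting all descendants of the root (root excluded).
    descendants, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in descendants:
                descendants.add(child)
                stack.append(child)
    leaves = {term for term in descendants if not children[term]}
    return leaves, descendants - leaves

leaves, interior = leaf_and_interior(EDGES, "plant structure")
print(f"{len(leaves)} leaf terms, {len(interior)} interior terms")  # prints "4 leaf terms, 3 interior terms"
```

Dividing each set size by the total gives the percentages; with the release counts quoted above, 384/726 and 342/726 round to 53% and 47%.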
We analyzed the structure of the PSO and the distribution of annotations to PSO terms to assess the breadth, depth, and current usage of the ontology. The depth of a term was defined as the number of nodes in the longest path from the root to that term. The distribution of term depths in the PSO is shown in Figure 2A. The mean and mode of the depth were 6.5 and 5, respectively, indicating that the majority of the terms are fairly granular. The maximum depth was 15, and the majority of the leaf terms (86%) have a depth between three and 10 (Fig. 2A). To some extent, this variability is due to the nature of the domain that the PSO describes (i.e. the anatomy and morphology of an angiosperm). Certain morphological structures of an angiosperm, such as the flower or leaf, are more complex and therefore have deeper branches, whereas others, such as the male gametophyte and female gametophyte, are much simpler. The distribution pattern for terminal terms was similar to that for interior terms.
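Because a PSO term can have more than one parent, the depth defined above is a longest-path computation on a directed acyclic graph, which memoized recursion over a term's parents handles directly. The sketch below is hypothetical: the toy terms are invented, and it assumes the root itself is not counted, so that a direct child of the root (such as whole plant) has depth 1. The mean and mode are then taken over all term depths, as in the analysis above.

```python
from functools import lru_cache
from statistics import mean, mode

ROOT = "plant structure"
# Hypothetical toy DAG mapping each term to its parents; a term may have several.
PARENTS = {
    "whole plant": [ROOT],
    "plant organ": [ROOT],
    "flower": ["plant organ"],
    "androecium": ["flower"],
    "stamen": ["flower", "androecium"],  # two paths to the root, of different lengths
    "leaf": ["plant organ"],
    "petal": ["flower"],
    "leaf lamina": ["leaf"],
}

@lru_cache(maxsize=None)
def depth(term):
    """Longest-path depth from ROOT to `term`, not counting ROOT itself (assumption)."""
    if term == ROOT:
        return 0
    # Taking the max over all parents selects the longest path, not the shortest.
    return 1 + max(depth(parent) for parent in PARENTS[term])

depths = {term: depth(term) for term in PARENTS}
values = list(depths.values())
print(depths["stamen"], max(values), mean(values), mode(values))  # prints: 4 4 2.375 3
```

Here stamen gets depth 4 via the longer route through androecium, even though it is also a direct child of flower (depth 2).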
The number and distribution of annotations at different depths of the ontology are a measure of its usage, indicating how adequate its depth is for annotating gene expression data and phenotypic descriptions. Because annotation to the most granular terms is the ultimate curation goal, we analyzed the current distribution of direct annotations across the PSO and the distribution of annotations to leaf terms (Fig. 2B). The majority of direct annotations (83%) are made to nodes with a depth between two and five, indicating that more granular terms (with a path depth of seven or more nodes) are used less frequently for direct annotation. Direct annotations to leaf terms are distributed among terms with depths between four and 11, with the exception of 405 annotations to the top-level term whole plant (Fig. 2B). Because this term has no children, it appears as a terminal term at the first node of the PSO. However, it is not a granular term and was excluded from further analysis. Only 155 leaf nodes, or 41% of all leaf nodes (excluding the whole plant node), have direct annotations (1,075 annotations), accounting for 11% of all annotations to PSO terms (Table I). Close to 90% of the annotations are made to nonleaf terms, and the majority of the leaf terms are not currently used in annotations. This suggests that the ontology is sufficiently granular for most of its branches. These data may also indicate the extent of current knowledge of gene expression and phenotypes, and could be analyzed further to determine which aspects of the ontology are less well studied than others. It is also possible that the distribution of annotations reflects the extent of curation efforts in the contributing databases, in which case it could help set priorities for future curation. Finally, it may also reflect the current state of the technology used to generate gene expression data.
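The annotation statistics in this paragraph reduce to two counts: direct annotations per term depth, and the share of leaf terms (excluding whole plant) that carry at least one direct annotation. A minimal Python sketch follows, using hypothetical precomputed depths, a hypothetical leaf set, and invented gene annotations; none of the identifiers or values come from the PSO or its contributing databases.

```python
from collections import Counter

# Hypothetical inputs: precomputed depths and leaf set for a toy ontology,
# plus direct annotations as (gene_id, term) pairs. All values are illustrative.
DEPTH = {"whole plant": 1, "plant organ": 1, "flower": 2, "leaf": 2,
         "petal": 3, "stamen": 4, "leaf lamina": 3}
LEAVES = {"whole plant", "petal", "stamen", "leaf lamina"}
ANNOTATIONS = [
    ("g1", "flower"), ("g2", "flower"), ("g3", "leaf"), ("g4", "whole plant"),
    ("g5", "stamen"), ("g6", "plant organ"), ("g7", "leaf"), ("g8", "flower"),
]
EXCLUDED = {"whole plant"}  # childless top-level term, not a granular leaf

def annotations_by_depth(annotations, depth):
    """Count direct annotations at each term depth."""
    return Counter(depth[term] for _, term in annotations)

def share_in_range(by_depth, lo, hi):
    """Fraction of direct annotations made to terms with lo <= depth <= hi."""
    total = sum(by_depth.values())
    return sum(n for d, n in by_depth.items() if lo <= d <= hi) / total

def leaf_usage(annotations, leaves, excluded):
    """Fraction of non-excluded leaf terms with at least one direct annotation."""
    candidates = leaves - excluded
    used = {term for _, term in annotations if term in candidates}
    return len(used) / len(candidates)

by_depth = annotations_by_depth(ANNOTATIONS, DEPTH)
print(sorted(by_depth.items()))        # prints [(1, 2), (2, 5), (4, 1)]
print(share_in_range(by_depth, 2, 5))  # prints 0.75
print(leaf_usage(ANNOTATIONS, LEAVES, EXCLUDED))
```

Excluding whole plant before computing leaf usage mirrors the choice made in the text: a childless top-level term would otherwise be counted as a leaf despite not being granular.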
Commonly available technologies for measuring gene expression (e.g. microarrays, northern blots, and reverse transcription-PCR) are most frequently applied to organs and organ systems, which correspond to high-level terms in the ontology. This is not necessarily true of in-depth analyses of mutant phenotypes, although a large number of phenotypic descriptions are generated in greenhouses or in the field, where observations are made using limited tools. As new technologies such as laser-capture microdissection, which allows specific cells to be procured from nearly any plant tissue, become more available to plant researchers, more granular PSO terms will likely be used for annotation.