The GSO is meant to link genetic and molecular information along the ontogenetic trajectory of plant growth, from germination to senescence in developmental time and space. Development is the execution of the genetic program for the construction of a given organism. The morphological structure is the product of many hundreds or thousands of genes that must be expressed in an orchestrated fashion to create any given tissue, body part, or multicellular structure (Davidson, 2001). Development is thus the outcome of a vast network of genes whose expression is regulated both spatially and temporally. Some suites of genes are expressed only at specific times in the life cycle of a plant, while other genes are turned on and off intermittently throughout the life cycle. Effective annotation of growth stage-specific gene markers in plant genome databases requires the development and use of ontologies, such as the GSO described here. Many genetic and developmental studies are initially conducted using a specific model system that is rich in genomic resources, but validation of hypotheses often depends on investigation of multiple plant systems (Cullis, 2004). Incorporation of information from multiple sources requires integration and synthesis of data across species and database boundaries. The use of common terminology to describe homologous features in diverse species is the first step. Inclusion of synonyms for the growth stages of every plant species offers an effective solution for the immediate term, but may become unwieldy in the future. This approach is analogous to that taken by the WORDNET project, which defines words using sets of synonyms and currently covers 150,000 English words (Fellbaum, 1998).
We are working with our software developers to provide tools that will categorize synonyms. These tools should eventually help the user community find the GSO terms that qualify as growth stage terms for the plant species of their choice, and should automate the process of identifying derivative synonyms that can be queried in multiple ways. For example, a user may want to query on the terms sixth leaf/six leaves/6 leaves, all of which are derivatives of each other. Improvements in developer tools will help prevent the ontology from becoming unwieldy and will greatly improve the efficiency of searches.
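As an illustration of what identifying derivative synonyms could involve, the following minimal Python sketch collapses the query variants mentioned above onto a single key. It is not the POC software: the function name `normalize_leaf_stage`, the lookup tables, and the key format "6-leaf" are assumptions made for this example only, and a real tool would map the key to an actual GSO term.

```python
import re
from typing import Optional

# Hypothetical lookup tables for this sketch; not taken from any POC tool.
CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
             "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}

def normalize_leaf_stage(query: str) -> Optional[str]:
    """Reduce a free-text leaf-stage query to a key such as '6-leaf'.

    Returns None when the query has no leaf count or no leaf word.
    """
    # Split into runs of letters or digits, so "6th" yields "6" and "th".
    tokens = re.findall(r"[a-z]+|\d+", query.lower())
    count = None
    mentions_leaf = False
    for token in tokens:
        if token.isdigit():
            count = int(token)
        elif token in CARDINALS:
            count = CARDINALS[token]
        elif token in ORDINALS:
            count = ORDINALS[token]
        elif token in ("leaf", "leaves"):
            mentions_leaf = True
    if count is None or not mentions_leaf:
        return None
    return f"{count}-leaf"

# The three variants from the text resolve to the same key.
keys = {normalize_leaf_stage(q) for q in ("sixth leaf", "six leaves", "6 leaves")}
```

Under these assumptions, a search interface could index synonyms by the normalized key instead of storing every spelling variant as a separate synonym.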
The GSO will also be valuable in describing high-throughput experimental designs, where plant development is typically analyzed using global patterns of gene expression at defined developmental stages (Schnable et al., 2004). We further anticipate that the design of an experiment is likely to influence the potential to conduct comparative analyses. For example, a problem may arise when a normalized set of tissue samples, e.g. from leaf tissue harvested at the three-, six-, and 10-leaf stages, is used to isolate a protein sample for a proteomics experiment, or mRNA for either a microarray experiment or the construction of an expressed sequence tag/cDNA library. Unless each sequence from the library is associated with a particular source tissue and growth stage, it is very difficult to ascertain the actual growth stage at which the mRNA was expressed. Furthermore, in PSO and GSO annotations a gene need not be associated with only one plant structure and one growth stage description. There can be multiple annotations to accommodate the necessary information about an expression profile, e.g. an expressed sequence tag accession can be expressed in leaf tissue at both the three- and 10-leaf stages but may not be detected at the six-leaf stage. Hence, a well-defined GSO would be extremely useful in providing a framework for comparing gene expression patterns analyzed at different stages within and across species.
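The many-to-many character of such annotations can be sketched as a small data structure. The Python example below is only an illustration under stated assumptions: the accession "EST0001", the record layout, and the stage labels are placeholders, not actual database entries or GSO term names.

```python
from collections import defaultdict
from typing import NamedTuple

class Annotation(NamedTuple):
    accession: str  # gene or EST identifier (placeholder values below)
    structure: str  # plant structure label, standing in for a PSO term
    stage: str      # growth stage label, standing in for a GSO term

# One accession carries two annotations: detected at the three- and
# 10-leaf stages, with no record for the six-leaf stage.
annotations = [
    Annotation("EST0001", "leaf", "three-leaf stage"),
    Annotation("EST0001", "leaf", "10-leaf stage"),
]

def index_by_accession(records):
    """Group (structure, stage) pairs under each accession."""
    index = defaultdict(set)
    for record in records:
        index[record.accession].add((record.structure, record.stage))
    return index

index = index_by_accession(annotations)
detected_at_six = ("leaf", "six-leaf stage") in index["EST0001"]
```

Keeping one record per accession, structure, and stage combination lets an expression profile be compared stage by stage, which is the kind of comparison the text describes.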
The generic design of the GSO aims to facilitate the process of integrating genomic information from diverse plant systems to deepen our understanding of plant form and function. Adoption of the ontology will contribute to its continued improvement and development and will promote an increasingly global view of plant biology. Members of the POC have used the emerging GSO to annotate genes and phenotypes in plants. As proof of concept, data associations from TAIR and Gramene are already available, and users can now search over 600 annotated genes, updated on a monthly basis. The Gramene database (Jaiswal et al., 2006) will display the cereal GSO together with the GSO and eventually retire the cereal GSO, giving its users time to familiarize themselves with the new terms. A similar approach will be taken by TAIR (Rhee et al., 2003), and MaizeGDB (Lawrence et al., 2005) is currently testing its annotations. Initially, the emphasis was on the core databases, but the expanding use of the ontology by Soybase collaborators Rex Nelson and Randy Shoemaker and SGN collaborators Naama Menda and Lukas Mueller highlights its utility for comparative genomics. Soybase has adapted the GSO for the description of soybean (Glycine max) data. SGN has adapted the GSO for taxonomic family-wide description of Solanaceous plants and is currently testing it for the description of tomato mutants. In subsequent releases, associations to maize and tomato will become available in the PO database, followed by soybean.
As our understanding grows of the gene networks and molecular details underlying the origin and diversification of complex pathways such as flowering time, the challenge is to place this knowledge into a framework that can accommodate the information as it emerges and set it in an appropriate comparative context. Similarly, our current understanding of genetics and evolution in plants raises many questions about orthology, paralogy, and coorthology in diverse species (Malcomber et al., 2006). The functional relationships among these genes and gene families will be reflected in databases that annotate such information using precise morphological terms from the GSO and the PSO. The effective use of controlled vocabularies also helps identify problems and gaps in knowledge related to the curation of genes in different species where the evolutionary relationships are not entirely clear. Drawing on the experience of its core databases, the POC will address these issues in the future by preparing and sharing annotation standards that can be used by other member databases to the benefit of the larger plant science community.
The current GSO design is based on annual plants; therefore, discussions are underway with collaborators representing the poplar (Populus spp.) and citrus research communities to expand it to include perennials. We also hope that future software developments will allow us to hardwire temporal relationships into the ontology. We encourage databases and individual researchers to contact us by e-mail at [email protected] if they wish to suggest new terms, modifications of existing definitions, or term-to-term relationships, or if they are interested in joining the POC by contributing associations to their genes and mutant phenotypes. More information about joining the POC can be found online (http://www.plantontology.org/docs/otherdocs/charter.html).