Whole-Plant Growth Stage Ontology for Angiosperms and Its Application in Plant Biology
Results
The GSO was developed over a period of two years (2004–2006) by a team of plant biologists comprising systematists, molecular biologists, agronomists, plant breeders, and bioinformaticians. We worked to develop a set of terms to describe plant development from germination to senescence that would be valid across a range of morphologically distinct and evolutionarily distant species. Although the rate of addition of new terms to the GSO has slowed since its initial stages of development, it is still under active development as we refine the ontology in response to user input and feedback from database curators.
Currently, the GSO has a total of 112 active terms, each organized hierarchically (Figs. 1 and 2A) and associated with a human-readable definition. Although we started with existing systems such as the BBCH scale (Meier, 1997), as well as the controlled vocabularies developed by TAIR for Arabidopsis, by the Gramene database for rice, the Triticeae (wheat, oat, and barley), and sorghum, and by MaizeGDB for maize (Zea mays), the current version of the GSO is quite distinct from its predecessors. We first discuss the major design issues we dealt with during the development of the GSO, and then describe the structure of the GSO and its applications to real-world problems. The ontology terms, database, and gene annotation statistics provided here are based on the April 2006 release of the POC database.
We chose to describe the GSO using the data model originally developed for the Gene Ontology (GO). This data model uses a directed acyclic graph (DAG) to organize a hierarchy of terms such that the most general terms are located toward the top of the hierarchy and the most specific ones toward the bottom (Figs. 1 and 2, A and B). Each parent term has one or more children, and the relationship between a parent and one of its children is named either IS_A, to indicate that the child term is a specific type of the parent term, or PART_OF, to indicate that the child term is a component of the parent term (Smith, 2004). For example, the reproductive growth and flowering terms are related by IS_A, because flowering is a type of reproductive growth. On the other hand, seedling growth is related to its child terms radicle emergence and shoot emergence by PART_OF, because seedling growth comprises the two processes of radicle emergence and shoot emergence (Figs. 1 and 2A).
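To make the two relationship types concrete, here is a minimal Python sketch of a GO-style DAG in which every edge runs from a child term up to a parent term and carries an IS_A or PART_OF label. The class and method names are hypothetical, terms are keyed by name rather than by PO accession, and the IS_A link from main shoot growth to vegetative growth is an assumption, since the type of that relationship is not stated here.

```python
from collections import defaultdict

IS_A = "IS_A"
PART_OF = "PART_OF"


class StageDAG:
    """Minimal GO-style DAG (hypothetical): edges run from a child term up to a parent."""

    def __init__(self):
        # child term -> list of (parent term, relationship type)
        self.parents = defaultdict(list)

    def add_edge(self, child, relation, parent):
        if relation not in (IS_A, PART_OF):
            raise ValueError(f"unknown relationship type: {relation}")
        # An edge child -> parent closes a cycle if child already sits above parent.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child!r} -> {parent!r} would create a cycle")
        self.parents[child].append((parent, relation))

    def ancestors(self, term):
        """Every term reachable by walking upward toward more general terms."""
        seen = set()
        stack = [p for p, _ in self.parents.get(term, [])]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(p for p, _ in self.parents.get(current, []))
        return seen


dag = StageDAG()
dag.add_edge("flowering", IS_A, "reproductive growth")
dag.add_edge("radicle emergence", PART_OF, "seedling growth")
dag.add_edge("shoot emergence", PART_OF, "seedling growth")
dag.add_edge("leaf production", IS_A, "main shoot growth")
dag.add_edge("stem elongation", IS_A, "main shoot growth")
dag.add_edge("main shoot growth", IS_A, "vegetative growth")  # relationship type assumed
```

The cycle check is what keeps the graph acyclic: asking `dag.ancestors("leaf production")` walks upward and returns only the more general terms above it.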
Each term is given a unique accession number of the form PO:XXXXXXX, where XXXXXXX is a seven-digit number (Fig. 2A). Accession numbers are never reused, even when the term is retired or superseded. Instead, obsolete terms are moved to a location in the hierarchy underneath a term named obsolete_growth_and_developmental_stage (Fig. 2A). This ensures that there is never any confusion about which term an accession number refers to. Each term also has a human-readable name like seedling growth, a paragraph-length definition that describes the criteria for identifying the stage, and citations that attribute the term to a source database, journal article, or existing staging system. Many terms also have a synonym list; these are described in more detail below.
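The accession rules above can be enforced mechanically. The following is a minimal sketch with a hypothetical `TermRegistry` class: it checks the PO:XXXXXXX format, refuses to reissue any accession (including retired ones), and retires a term by re-parenting it under the obsolete node instead of deleting it. The record layout is an assumption for illustration, not the POC database schema.

```python
import re

# "PO:" followed by exactly seven digits, e.g. PO:0007057
ACCESSION_PATTERN = re.compile(r"PO:\d{7}")
OBSOLETE_PARENT = "obsolete_growth_and_developmental_stage"


class TermRegistry:
    """Hypothetical term store enforcing the accession rules described above."""

    def __init__(self):
        self.terms = {}  # accession -> term record; records are never deleted

    def add(self, accession, name, definition, parent, citations=()):
        if not ACCESSION_PATTERN.fullmatch(accession):
            raise ValueError(f"malformed accession: {accession}")
        if accession in self.terms:
            # Retired terms stay in self.terms, so their accessions can never be reissued.
            raise ValueError(f"{accession} was already issued and cannot be reused")
        self.terms[accession] = {
            "name": name,
            "definition": definition,
            "parent": parent,
            "citations": list(citations),
            "obsolete": False,
        }

    def retire(self, accession):
        """Keep the record but move it under the obsolete node."""
        term = self.terms[accession]
        term["parent"] = OBSOLETE_PARENT
        term["obsolete"] = True
```

Because retirement only changes the parent and a flag, any old annotation that cites an accession still resolves to exactly one term.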
Our choice of the GO data model was driven by numerous practical considerations, foremost of which was the fact that the data model is supported by a rich set of database schemas, editing tools, annotation systems, and visualization tools.
Naming of Plant Growth Stages
The next issue we dealt with was how to name plant growth stages. Although development in any organism is a continuous process, it is important to have landmarks that identify discrete milestones of the process in a way that is easily reproducible. Extant systems either name growth stages according to a landmark (e.g. three-leaf stage) or assign a number or other arbitrary label to each stage. We chose to define growth stages using morphological landmarks that are visible to the naked eye (Counce et al., 2000), because such descriptive terms are more intuitive, self-explanatory to users, and easy to record in an experiment. To minimize differences among species, we described many growth stages using measurements or landmarks expressed in proportion to the fully mature state. For example, the inflorescence stages progress from "inflorescence just visible" through "1/4 inflorescence length reached" and "1/2 inflorescence length reached" to "full inflorescence length reached." This provides an objective measurement of the degree of maturation of the inflorescence that does not depend on the absolute inflorescence length.
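The proportional landmarks can be expressed as a simple threshold lookup on the ratio of current to mature length. This is a minimal sketch assuming the mature inflorescence length is known or can be estimated for the cultivar; the function name is hypothetical, and only the four landmarks named above are used.

```python
# The four landmarks named above, as fractions of the mature inflorescence length,
# checked from the most advanced stage downward.
LANDMARKS = [
    (1.0, "full inflorescence length reached"),
    (0.5, "1/2 inflorescence length reached"),
    (0.25, "1/4 inflorescence length reached"),
]


def inflorescence_stage(length, mature_length):
    """Return the landmark reached, or None if the inflorescence is not yet visible.

    Both lengths must be in the same unit; only their ratio matters.
    """
    if mature_length <= 0:
        raise ValueError("mature_length must be positive")
    if length <= 0:
        return None
    ratio = length / mature_length
    for threshold, label in LANDMARKS:
        if ratio >= threshold:
            return label
    return "inflorescence just visible"
```

For example, a 5 cm inflorescence that will reach 20 cm and a 10 cm inflorescence that will reach 40 cm both score as "1/4 inflorescence length reached," which is the species-independence the proportional naming is meant to provide.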
Because the GSO crosses species and community boundaries, we needed to acknowledge that each community has its own distinct vocabulary for describing plant structures and growth stages. To accommodate this, we made liberal use of the GO data model's synonym lists, which allow any GSO term to have one or more synonyms from species-specific vocabularies that are considered equivalent to the official term name. The GSO currently contains 997 synonyms taken from several plant species, an average of about nine synonyms per GSO term (Table I). As with terms, we attribute synonyms to the database, literature reference, or textbook from which they were derived.
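As a rough check, 997 synonyms over 112 terms is about 8.9 per term. A synonym list effectively acts as a lookup index from community vocabulary to official term names, with attribution kept per synonym. The sketch below uses a hypothetical `SynonymIndex` class, and the mapping of "three-leaf stage" to "LP.03 three leaves visible" is an illustrative assumption, not a quoted GSO entry. A synonym that would point to two different terms is rejected, which is the situation the sensu qualifier addresses.

```python
from collections import defaultdict


class SynonymIndex:
    """Resolve official GSO term names and community synonyms to the official name."""

    def __init__(self):
        self._term_for = {}               # normalized name or synonym -> official term
        self._sources = defaultdict(set)  # normalized synonym -> attributed sources

    @staticmethod
    def _normalize(text):
        # Case- and whitespace-insensitive matching.
        return " ".join(text.lower().split())

    def _claim(self, key, term):
        current = self._term_for.get(key)
        if current is not None and current != term:
            # Same string, different stage: a sensu-qualified term is needed instead.
            raise ValueError(f"{key!r} already refers to {current!r}")
        self._term_for[key] = term

    def add(self, term, synonym, source):
        self._claim(self._normalize(term), term)
        key = self._normalize(synonym)
        self._claim(key, term)
        self._sources[key].add(source)

    def resolve(self, text):
        """Official term name for a term name or synonym, or None if unknown."""
        return self._term_for.get(self._normalize(text))

    def sources(self, synonym):
        return sorted(self._sources.get(self._normalize(synonym), ()))


index = SynonymIndex()
index.add("LP.03 three leaves visible", "three-leaf stage", "hypothetical source A")
```

With this entry, `index.resolve("Three-leaf stage")` returns "LP.03 three leaves visible", and `index.sources("three-leaf stage")` reports where the synonym came from.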
By using synonyms, we were able to merge 98% of the terms from the various species-specific source ontologies into unambiguous generic terms. In a few cases, we encountered identical term names used by different communities to refer to biologically distinct stages. We resolved such cases by using the sensu qualifier to indicate that the term has a species-specific (not generic) meaning. One example of this, described in more detail later, is "inflorescence visible" (to the naked eye) versus "inflorescence visible (sensu Poaceae)." In most plants, the inflorescence becomes visible to the naked eye soon after it forms, whereas in Poaceae (grasses) the inflorescence becomes visible only much later in its development, after emergence from the flag leaf sheath.
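In practice, a curator recording "inflorescence visible" for a grass must pick the sensu-qualified term rather than the generic one. A minimal sketch of that decision, keyed by plant family, is shown below; the lookup tables and function name are hypothetical, while the two accessions are the ones given for these terms in this section.

```python
# Generic terms and their accessions.
GENERIC_TERMS = {"inflorescence visible": "PO:0007047"}

# (generic name, plant family) -> sensu-qualified term that replaces it for that family.
SENSU_TERMS = {
    ("inflorescence visible", "Poaceae"): (
        "inflorescence visible (sensu Poaceae)",
        "PO:0007012",
    ),
}


def resolve_stage(name, family):
    """Return (term name, accession), preferring a sensu-qualified term for the family."""
    qualified = SENSU_TERMS.get((name, family))
    if qualified is not None:
        return qualified
    if name in GENERIC_TERMS:
        return name, GENERIC_TERMS[name]
    raise KeyError(f"no GSO term named {name!r}")
```

For rice (Poaceae) this returns the sensu Poaceae term, while for Arabidopsis (Brassicaceae) it falls back to the generic term.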
Less satisfactory is the design compromise that we reached to represent the spatial and temporal ordering of terms. The existing plant growth scales are organized by the temporal progression of developmental events. However, the GO data model presents unique challenges in designing an ontology that represents the temporal ordering of terms across multiple species that display small but key variations in that ordering.
In particular, the GO data model does not have a standard mechanism for representing an organism's developmental timeline. This has forced each organism database that has sought to represent developmental events using the GO model to grapple with the issue of representing a dynamic process in a static representation. Some animal model organism databases, such as WormBase for Caenorhabditis elegans, FlyBase for Drosophila, and ZFIN for zebrafish, have developed developmental stage ontologies (OBO, 2005) in which temporal ordering is represented using the DERIVED_FROM, DEVELOPS_FROM, or OCCURS_AT_OR_AFTER relationship to indicate that one structure is derived from another or that one stage follows another. However, we found these solutions to be unworkable for the GSO because the ontology must represent growth stages across multiple species. For example, consider the process of main shoot growth. In wheat, main shoot growth may be completed at the nine-leaf stage, while in rice and maize it may be completed at the 11- and 20-leaf stages, respectively, and these numbers vary among cultivars/germplasms (Fig. 3). The transition to the subsequent stage of reproductive growth is thus staggered for each species and cannot be accurately described by an ontology in which each stage rigidly follows another.
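The staggering argument can be checked with a few lines of arithmetic. The sketch below uses the leaf numbers from the wheat, rice, and maize example as illustrative per-species thresholds (real values vary by cultivar); the table and function names are hypothetical. At the same leaf count the three species are in different phases, so no single "stage X follows stage Y" edge can encode the transition for all of them.

```python
# Leaf number at which main shoot growth *may* be completed, per the example above.
# Illustrative only: actual values vary among cultivars/germplasms.
MAIN_SHOOT_COMPLETE_AT_LEAF = {"wheat": 9, "rice": 11, "maize": 20}


def main_shoot_growth_complete(species, visible_leaves):
    """True if the species-specific leaf threshold has been reached."""
    return visible_leaves >= MAIN_SHOOT_COMPLETE_AT_LEAF[species]


# The same leaf count (ten leaves) falls in different phases for each species.
status_at_ten_leaves = {
    species: main_shoot_growth_complete(species, 10)
    for species in MAIN_SHOOT_COMPLETE_AT_LEAF
}
```

Here `status_at_ten_leaves` is `{"wheat": True, "rice": False, "maize": False}`: the per-species threshold lives in data, not in the ontology's relationships.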
As an example of how this works, we describe the stages of leaf production using terms named "LP.01 one leaf visible," "LP.02 two leaves visible," "LP.03 three leaves visible," and so forth. When displayed in the ontology web browser, the terms appear in their natural order (Fig. 2B). However, nothing is hard-wired into the ontology to indicate that LP.01 one leaf visible precedes LP.02 two leaves visible.
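Under the assumption that the browser lists sibling terms by name (consistent with the prefix scheme used to order the top-level divisions), the natural order is a property of the zero-padded names rather than of any relationship in the ontology. The sketch below illustrates this with term codes only; `leaf_count` is a hypothetical helper that a client would need to recover the sequence, since the ontology itself stores no "precedes" relationship.

```python
import re


def display_order(term_names):
    """Order terms by plain lexical sort, as a browser listing siblings by name would."""
    return sorted(term_names)


def leaf_count(term_name):
    """Recover the leaf count from an 'LP.NN' prefix; None if there is no such prefix."""
    match = re.match(r"LP\.(\d+)", term_name)
    return int(match.group(1)) if match else None


# Zero-padded codes sort into developmental order...
padded = display_order(["LP.10", "LP.02", "LP.01"])
# ...whereas unpadded codes would not.
unpadded = display_order(["LP.10", "LP.2", "LP.1"])
```

`padded` comes out as LP.01, LP.02, LP.10, but `unpadded` comes out as LP.1, LP.10, LP.2, which is why the two-digit prefixes matter for display.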
A related issue is the observation that during plant maturation, multiple developmental programs can proceed in parallel. For instance, the processes of leaf production and stem elongation, although coupled, are temporally overlapping and can proceed at different relative rates among species and among cultivars within a species. We represent such processes as independent children of a more generic term. In the case of the previous example, both leaf production and stem elongation are represented as types of main shoot growth using the IS_A relationship (Figs. 1 and 2A).
Description of the Ontology
The four main divisions of the GSO are "A_Vegetative growth," "B_Reproductive growth," "C_Senescence," and "D_Dormancy" (Fig. 2A). As described earlier, the alphabetic prefixes force these four divisions to be displayed in the order in which they generally occur during the plant's life cycle. The substages of "vegetative growth" are "0_Germination," "1_Main Shoot Growth," and "2_Formation of Axillary Shoot," while the substages of "reproductive growth" are "3_Inflorescence Visible," "4_Flowering," "5_Fruit Formation," and "6_Ripening." Neither "C_Senescence" nor "D_Dormancy" currently has any substages. Again, the numeric prefixes serve only to make the substages appear in a logical order. Each of the substages has multiple, more specific stages beneath it.
Although the BBCH scale (Meier, 1997) was the starting point for the GSO, we have diverged from it in many important respects. A major difference is the number of top-level terms (Figs. 1 and 2A). The BBCH scale has 10 principal stages as its top-level terms, whereas the GSO has only four. We collapsed four BBCH top-level stages (germination, leaf development, stem elongation, and tillering) into our top-level vegetative growth term, and another five BBCH top-level stages (booting, inflorescence emergence, flowering, fruit development, and ripening) into reproductive growth. We felt justified in introducing the binning terms vegetative growth and reproductive growth for several reasons: (1) they help annotate genes that act throughout these phases; (2) they are used persistently in the current scientific literature, especially when the specific stage of gene action or expression remains unclear; and (3) they were requested by our scientific reviewers to enhance the immediate utility of the ontology.
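The collapse of BBCH principal stages into GSO divisions can be written as a lookup table. This sketch assumes the standard BBCH convention that principal stages are numbered 0 to 9 and form the first digit of a two-digit code (that numbering comes from the BBCH monograph, not from the text above). Stage names follow the cereal wording used here, and mapping BBCH stage 9 to "C_Senescence" is an assumption.

```python
# BBCH principal growth stages (first digit of a two-digit BBCH code), using the
# cereal stage names listed in the text, mapped to GSO top-level divisions.
BBCH_TO_GSO = {
    0: ("germination", "A_Vegetative growth"),
    1: ("leaf development", "A_Vegetative growth"),
    2: ("tillering", "A_Vegetative growth"),
    3: ("stem elongation", "A_Vegetative growth"),
    4: ("booting", "B_Reproductive growth"),
    5: ("inflorescence emergence", "B_Reproductive growth"),
    6: ("flowering", "B_Reproductive growth"),
    7: ("fruit development", "B_Reproductive growth"),
    8: ("ripening", "B_Reproductive growth"),
    9: ("senescence", "C_Senescence"),  # assumption: not stated in the text
}


def gso_top_level(bbch_code):
    """Return the GSO top-level division for a two-digit BBCH code such as 13 or 65."""
    if not 0 <= bbch_code <= 99:
        raise ValueError("BBCH codes run from 00 to 99")
    principal_stage = bbch_code // 10
    return BBCH_TO_GSO[principal_stage][1]
```

Counting the table entries reproduces the collapse described above: four principal stages fold into vegetative growth and five into reproductive growth.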
We now look in more detail at some of the more important parts of the ontology.
Germination (PO:0007057)
This node in the GSO has eight children that are broadly applicable to seed germination. The stages under "seedling growth" and "shoot emergence" are not given numerical prefixes, as it is not clear which event precedes the other among the various species. Only events of seed germination were considered in this ontology, whereas the BBCH scale treats seed germination as equivalent to the sprouting of vegetatively propagated annual plants and perennials (e.g. bud sprouting). The two processes are in fact quite distinct in terms of the organs developing at this stage, their physiology, and the metabolic processes involved, so we felt that combining them was inappropriate.
Main Shoot Growth (PO:0007112)
This refers to the stage at which the shoot is undergoing rapid growth. It can be assessed in different ways depending on the species and the interests of the biologist. Plants may be described equally well in terms of the number of leaves visible on the main shoot or the number of nodes detectable (Zadoks et al., 1974), and biologists studying Arabidopsis commonly assess the size of the rosette. To accommodate existing data associated with these measures, we created three instances of main shoot growth, namely "leaf production," "rosette growth," and "stem elongation," with a strong recommendation to use "leaf production" wherever possible.
Leaf Production (PO:0007133)
Leaves are produced successively, so progression through this stage can be measured by counting the number of visible leaves on the plant (Figs. 2B and 3). In any species, leaves are always counted in the same way (Meier, 1997; described in detail later). In plants other than monocotyledons, leaves are counted when they are visibly separated from the terminal bud. The associated internode (the one below the leaf) is recognized by the same rule (Fig. 3). Leaves are counted singly unless they are in pairs or whorls visibly separated by an internode, in which case they are counted as pairs or whorls. In taxa with hypogeal germination, the first leaf on the epicotyl is considered to be leaf one, and in grasses the coleoptile is leaf one.
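The counting rule can be sketched as a small function over the nodes of the main shoot. The `LeafNode` record and `count_leaf_units` function are hypothetical; the caller is assumed to list nodes from the base upward and to include the coleoptile (grasses) or the first epicotyl leaf (hypogeal taxa) as the first node, as the rule above specifies.

```python
from dataclasses import dataclass


@dataclass
class LeafNode:
    """One node on the main shoot, listed from the base upward (hypothetical record)."""
    leaves: int                     # 1 = single leaf, 2 = pair, 3+ = whorl
    visible: bool                   # visibly separated from the terminal bud
    internode_visible: bool = True  # pair/whorl visibly separated by an internode


def count_leaf_units(nodes):
    """Count visible leaves, scoring a pair or whorl as one unit when an internode separates it."""
    total = 0
    for node in nodes:
        if not node.visible:
            continue  # still enclosed in the terminal bud
        if node.leaves > 1 and node.internode_visible:
            total += 1            # counted as one pair or one whorl
        else:
            total += node.leaves  # otherwise leaves are counted singly
    return total
```

A shoot with three visible opposite pairs and a fourth pair still in the bud scores 3, while four visible alternate leaves score 4.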
In the GSO, the stages of leaf production continue up to 20 visible leaves, leaf pairs, or whorls (Fig. 2A), but this range can be extended to accommodate higher numbers as new species are included. This is unlike the BBCH scale (Meier, 1997), where leaf counts stop at nine and any plant with more leaves is scored as having nine or more leaves. The extended range was adopted to accommodate the leaf development stages of maize, which, depending on the cultivar, can have as few as five leaves or as many as 20 or more. The maize community and the MaizeGDB database (Lawrence et al., 2005) use a modified version of Ritchie's scale (Ritchie et al., 1993), in which the stages of the maize plant from the seedling through the vegetative stages are measured solely by counting leaves; nodes are not counted.
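The difference in resolution can be shown side by side. This sketch assumes the standard BBCH leaf-development codes (11 for one leaf through 19 for nine or more leaves; those codes come from the BBCH scale itself, not from the text above) and ignores differences in exactly when a leaf is scored. Both function names are hypothetical.

```python
MAX_GSO_LEAF_STAGE = 20  # current GSO ceiling; may be extended as new species are added


def gso_leaf_stage(units):
    """GSO leaf-production stage code for a count of visible leaves/pairs/whorls."""
    if not 1 <= units <= MAX_GSO_LEAF_STAGE:
        raise ValueError(f"no LP stage for {units} units; the ontology would need extending")
    return f"LP.{units:02d}"


def bbch_leaf_code(units):
    """BBCH leaf-development code: 11 to 18 for one to eight leaves, 19 for nine or more."""
    if units < 1:
        raise ValueError("at least one leaf is required")
    return 10 + min(units, 9)
```

A maize plant with 12 visible leaves keeps its resolution in the GSO (LP.12) but collapses into the same BBCH code (19) as a plant with 9 or 15 leaves.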
Stem elongation can be assessed by the number of visible nodes; this metric is commonly applied to the Triticeae, for which the Zadoks (Zadoks et al., 1974) and BBCH (Meier, 1997) scales were originally developed. Stem elongation begins when the first node becomes detectable. This is usually node number seven (the number varies among cultivars), since in the grasses earlier nodes are not detectable before elongation commences. Boyes et al. (2001) consider Arabidopsis rosette growth analogous to stem elongation in the grasses and use leaf expansion as the common factor linking the rosette growth and stem elongation stages. In our model, "rosette growth" (PO:0007113) and "stem elongation" (PO:0007089) are treated as separate sibling stages (Figs. 1 and 2A), mainly to provide language continuity for users rather than for biological reasons.
Reproductive Growth (PO:0007130)
Reproductive growth and its child terms are organized a little differently from vegetative growth. Reproductive growth has four instances: "inflorescence visible" (PO:0007047), "flowering" (PO:0007026), "fruit formation" (PO:0007042), and "ripening" (PO:0007010; Fig. 2A). "Inflorescence visible (sensu Poaceae)" (PO:0007012), which is specific to the grass family, is an instance of the generic term "inflorescence visible" (PO:0007047). It in turn has two instances: "booting" (PO:0007014) and "inflorescence emergence from flag leaf sheath" (PO:0007041). As described earlier, the generic "inflorescence visible" stage is kept separate from "inflorescence visible (sensu Poaceae)," because the former covers all plants in which inflorescence formation and visibility coincide, whereas in members of the Poaceae many developmental events of the reproductive phase begin during the vegetative phase but become visible as morphological markers only much later.
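One practical consequence of this nesting is that a query on a generic stage should also pick up data annotated to the more specific stages beneath it, for example rice data annotated to booting. The sketch below encodes only the part of the hierarchy described in this paragraph, using the accessions given there; the dictionaries and function names are hypothetical.

```python
# Parent -> children for the part of the hierarchy described above.
CHILDREN = {
    "PO:0007130": ["PO:0007047", "PO:0007026", "PO:0007042", "PO:0007010"],
    "PO:0007047": ["PO:0007012"],
    "PO:0007012": ["PO:0007014", "PO:0007041"],
}

NAMES = {
    "PO:0007130": "reproductive growth",
    "PO:0007047": "inflorescence visible",
    "PO:0007026": "flowering",
    "PO:0007042": "fruit formation",
    "PO:0007010": "ripening",
    "PO:0007012": "inflorescence visible (sensu Poaceae)",
    "PO:0007014": "booting",
    "PO:0007041": "inflorescence emergence from flag leaf sheath",
}


def descendants(accession):
    """All terms below `accession`, found by depth-first traversal."""
    found = set()
    stack = list(CHILDREN.get(accession, []))
    while stack:
        term = stack.pop()
        if term not in found:
            found.add(term)
            stack.extend(CHILDREN.get(term, []))
    return found


def expand_query(accession):
    """Terms to match when querying a stage: the stage itself plus everything below it."""
    return {accession} | descendants(accession)
```

Expanding "inflorescence visible" (PO:0007047) yields the sensu Poaceae term together with booting and inflorescence emergence from the flag leaf sheath.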
Other stages are organized similarly to the existing scales, but as we continue to include species from the families Solanaceae and Fabaceae, we anticipate that changes in the organization may be required to accommodate them in the GSO.
The GSO terms are arranged in a simple hierarchy that is intuitive to use. The GSO is a relatively small ontology, with a total of 112 terms excluding the obsolete node. It has four top nodes, 15 interior nodes (terms with child terms), and 88 leaf nodes (terms without child terms; Fig. 2A). New terms are added in response to user requests after thorough discussion. Researchers can browse the GSO using the ontology browser available at http://www.plantontology.org/amigo/go.cgi. This Web-based tool for searching and browsing ontologies and their associations to data was developed by the GO consortium (http://www.geneontology.org/GO.tools.shtml#in_house) and modified to suit our needs. To browse, click the [+] sign in front of a term to expand the tree and show its child terms (Fig. 2A). This view shows the PO identifier (ID) of each GSO term and the term name, followed by the number of associated data objects, such as genes. For every green-colored parent term, a summary of the data associated with its child terms is presented as a pie chart. Users can filter the associated data displayed by species, data source, and evidence code. The [i] and [p] icons indicate the relationship type between the parent and child term, as described in the legend. While browsing, a user can click on a term name at any time to see its details (Fig. 4B). Users will also see the [d] icon for the develops_from relationship type. This relationship type is used only in the PSO, not in the GSO; it indicates that a plant structure develops from another structure (Jaiswal et al., 2005).
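The pie-chart summary can be approximated as a roll-up: for each child of a parent term, count the distinct genes annotated to that child or to anything below it. This is a sketch under the assumption that slices count distinct genes per child subtree (the text does not specify how the browser computes them); the function name and gene identifiers are hypothetical, and any species, source, or evidence-code filter would be applied to the annotations first.

```python
from collections import Counter


def rollup(parent, children, annotations):
    """Count distinct genes annotated to each child of `parent` or to anything below it.

    children: dict mapping a term to the list of its child terms
    annotations: iterable of (gene, term) pairs, already filtered if desired
    """
    annotations = list(annotations)  # iterated once per child

    def subtree(term):
        terms = {term}
        for child in children.get(term, []):
            terms |= subtree(child)
        return terms

    summary = Counter()
    for child in children.get(parent, []):
        covered = subtree(child)
        summary[child] = len({gene for gene, term in annotations if term in covered})
    return summary


hierarchy = {
    "reproductive growth": ["inflorescence visible", "flowering"],
    "inflorescence visible": ["booting"],
}
hypothetical_annotations = [
    ("gene_a", "booting"),
    ("gene_b", "inflorescence visible"),
    ("gene_b", "booting"),
    ("gene_a", "flowering"),
    ("gene_c", "flowering"),
]
slices = rollup("reproductive growth", hierarchy, hypothetical_annotations)
```

Here both slices have two genes: gene_b is counted once under inflorescence visible even though it is annotated twice within that subtree.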
Annotation is the process by which skilled biologists attach snippets of information to a genomic element in order to capture its biological significance and deepen our understanding of biological processes (Stein, 2001). The curator attributes each added piece of information to its source using evidence codes (http://www.plantontology.org/docs/otherdocs/evidence_codes.html) that indicate the kind of experiment carried out to infer the association with a GSO term, such as inferred from expression pattern (IEP), covering northern, western, and/or microarray experiments, or inferred from direct assay (IDA), such as isolated enzyme and/or in situ assays (Table II). The user interface has query filter options to search for genes annotated with a given type of evidence code. Explicit spatiotemporal information related to the whole plant is extracted from the literature by a curator and described using terms from the GSO. The current build of the GSO has over 600 genes associated with it from the TAIR and Gramene databases (Fig. 5A). Analysis of the data at this point may not fully reflect current research in Arabidopsis and rice, as manual curation is a dynamic and evolving process that necessarily lags behind the actual state of research. In TAIR, about 130 gene associations to whole-plant growth stages carry the evidence code IEP, while in the Gramene database the majority of GSO annotations (about 480) carry the evidence code inferred from mutant phenotype (IMP), with a smaller number of IEP and inferred from genetic interaction (IGI) associations. A closer look at the number of genes associated with various terms and their immediate parents (Fig. 5, A and B) reveals that many of the genes with GSO annotations in TAIR are associated with germination stages, which are vegetative stages.
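The evidence-code filter described above amounts to selecting annotation records by code and, optionally, by source database. The sketch below models an annotation as a small record; the field names, function name, and gene identifiers are hypothetical, and only the evidence codes mentioned in this section are included.

```python
from dataclasses import dataclass

# Evidence codes mentioned in this section.
EVIDENCE_CODES = {
    "IEP": "inferred from expression pattern",
    "IDA": "inferred from direct assay",
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "TAS": "traceable author statement",
}


@dataclass(frozen=True)
class Annotation:
    gene: str
    stage: str     # GSO term name or accession
    evidence: str  # one of EVIDENCE_CODES
    source: str    # contributing database, e.g. "TAIR" or "Gramene"


def filter_annotations(annotations, evidence=None, source=None):
    """Keep annotations whose evidence code is in `evidence` and/or whose source matches."""
    wanted = set(evidence) if evidence else None
    result = []
    for ann in annotations:
        if ann.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {ann.evidence}")
        if wanted is not None and ann.evidence not in wanted:
            continue
        if source is not None and ann.source != source:
            continue
        result.append(ann)
    return result


records = [
    Annotation("gene_1", "germination", "IEP", "TAIR"),
    Annotation("gene_2", "germination", "IMP", "Gramene"),
    Annotation("gene_3", "ripening", "IGI", "Gramene"),
]
```

For instance, `filter_annotations(records, evidence=["IMP"])` keeps only the mutant-phenotype record, mirroring an IMP query in the interface.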
Similarly, in the rice plant, the vegetative stages, particularly the five- to six-leaf stages (children of leaf production), and the reproductive stages, namely inflorescence visible (sensu Poaceae) (a child of inflorescence visible), fruit formation, and ripening, are of particular importance (Fig. 5B). The Solanaceae Genome Network (SGN) has adopted the GSO and created a mapping file of Solanaceae (tomato [Lycopersicon esculentum]) synonyms that is used to associate its data. Tomato mutants are initially being curated to these terms, and, predictably, a large number of mutants will be associated with the ripening stages (data not shown). As we continue to solicit data from collaborating databases and annotate using the GSO, we obtain a global view of how data are associated with different stages of plant growth (Fig. 5B).
This section provides examples of genetic analyses that typically use whole-plant ontogeny as a feature of experimental design and data analysis. It illustrates some of the difficulties of extracting spatiotemporal information from the literature and shows the advantages of curating genomic information using the plant ontologies (GSO and PSO), which allow users to query when and where a gene is assayed or expressed, or when its effects become visible during the life cycle of a plant. In addition, the PO database supports queries such as "Which genes are expressed during the germination stage in Arabidopsis and rice?" or "Show me all of the mutant phenotypes in the reproductive stages of a rice plant."
Annotation Examples of Mutant Phenotypes
The primary description of phenotypic data is usually at the whole-plant level, and associating it with GSO terms is rarely a straightforward term-to-term exercise for the curator. For example, dwarf mutants are characterized in different ways, most often by the leaf or node number that is affected, counted either from the top down or from the bottom up; in this system, the leaf and the internode below it can be used to define the same stage. This is distinct from the node-visible stages, which are less reliable because the first visible node has a variable number in grasses (Fig. 3).
Internode elongation provides an example. It is the main morphological feature affected in dwarf plants and is attributed, among other factors, to the effects of gibberellins and brassinosteroids (Chory, 1993; Ashikari et al., 1999). Yamamuro et al. (2000) showed that brassinosteroids play important roles in internode elongation in rice and characterized dwarf mutants according to the specific internode affected. In the dn-type mutant, all the internodes of the mature rice plant are uniformly affected. In the nl-type mutant, however, only the fourth internode is affected, while in the sh-type mutant only the first internode is affected. In this study, the authors number the internodes from the top down (the uppermost internode, below the panicle, is the first internode). To be consistent with the GSO, these numbers have to be converted to the corresponding leaf/node count from the base of the shoot (Fig. 3). This requires the curator's personal knowledge of the plant, legacy information available for the species and germplasm accession, or contact with the authors. More commonly, unlike in this example, leaves are counted from the base, and the curator extracts information from statements such as "when the plant is at the three-leaf stage." This lets the researcher, the curator, and the user immediately visualize the morphological appearance of the plant (Fig. 3). Currently, more than 500 genes annotated to different growth stages can be retrieved from the PO database using the IMP filter.
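Once the curator knows how many internodes the base-upward count covers for that cultivar, the renumbering itself is simple arithmetic: with N internodes, the i-th from the top is the (N - i + 1)-th from the base. The sketch below assumes both schemes number the same set of internodes consecutively with no gaps; the function name and the total of five internodes in the example are hypothetical.

```python
def top_down_to_bottom_up(index_from_top, total_internodes):
    """Convert a top-down internode number (1 = uppermost, just below the panicle)
    into base-upward numbering.

    total_internodes must be known for the cultivar, e.g. from the curator's
    knowledge, legacy data for the accession, or the authors.
    """
    if not 1 <= index_from_top <= total_internodes:
        raise ValueError("index must lie between 1 and total_internodes")
    return total_internodes - index_from_top + 1
```

For a hypothetical plant with five counted internodes, the first internode from the top (sh-type) becomes the fifth from the base, and the fourth from the top (nl-type) becomes the second. The conversion is its own inverse, so the same function maps base-upward numbers back to top-down ones.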
Cross-Database Comparison of Gene Annotations
Most organism databases are developed independently and have little or no overlap in their schemas with other databases; as a result, they cater to separate user communities. To illustrate how the use of ontologies can overcome database interoperability problems, we compare the related processes of flowering time in Arabidopsis and heading date in rice (Fig. 6). The gene network underlying the photoperiodic flowering response involves photoreceptors, circadian clock systems, and floral regulator genes (Yanovsky and Kay, 2002; Izawa et al., 2003; Putterill et al., 2004; Searle and Coupland, 2004). Interestingly, the molecular components that underlie the transition from vegetative to reproductive growth are conserved in Arabidopsis and rice (Hayama and Coupland, 2004; Putterill et al., 2004).
At present, all of the above genes are available in the PO database, annotated to GSO terms, PSO terms, or both (Table III). The Arabidopsis databases, the Nottingham Arabidopsis Stock Centre and TAIR, have used the IMP, IEP, IDA, and traceable author statement (TAS) evidence codes to annotate the GI, CO, and FT genes to the exact plant structures where they are expressed. The Gramene database has used the IMP and IGI evidence codes to annotate OsGi, Se1 (Hd1), and Hd3a; for rice, the IGI code was used to describe the epistatic interaction between Se1 (Hd1) and Hd3a. Table III also includes the GO annotations of the same genes. Although this information is not provided by the POC database, it can be retrieved from the gene detail pages of the respective source databases, TAIR and Gramene. The GO annotations further suggest the biochemical roles of these genes and their functional similarity or dissimilarity.
Cross-database querying is often difficult because species-specific databases differ in how they describe the stage of plant growth and in how a trait or phenotype is assayed and curated. In Arabidopsis, flowering time is indicated by the number of rosette leaves on a plant (Samach et al., 2000), whereas in rice it is indicated by the number of days between planting (or transplanting) and heading of the primary panicle (Yano et al., 2000). The phenological or growth stage studied in both plants is the same (appearance of the reproductive structure), but the measurement typically used to identify that growth stage is very different. Once generic terminology describing plant phenology and growth stages is agreed upon and used consistently in database curation, such results will become accessible with fewer queries.
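The benefit of a shared stage vocabulary can be sketched as an index keyed on the GSO term instead of on each community's measurement. In the example below the records are hypothetical: the gene names come from this section, but tagging them with the reproductive growth term is purely illustrative and does not reproduce the actual annotations in Table III. A query on the measurement strings would need two different searches, whereas the stage key answers it in one lookup.

```python
from collections import defaultdict

SHARED_STAGE = "reproductive growth (PO:0007130)"

# Hypothetical curated records: each keeps its community-specific measurement but is
# also tagged with a GSO stage. The stage tags are illustrative, not real annotations.
RECORDS = [
    {"gene": "FT", "species": "Arabidopsis", "source": "TAIR",
     "measured_as": "number of rosette leaves at flowering", "gso_stage": SHARED_STAGE},
    {"gene": "CO", "species": "Arabidopsis", "source": "TAIR",
     "measured_as": "number of rosette leaves at flowering", "gso_stage": SHARED_STAGE},
    {"gene": "Hd3a", "species": "rice", "source": "Gramene",
     "measured_as": "days from planting to heading", "gso_stage": SHARED_STAGE},
    {"gene": "Se1 (Hd1)", "species": "rice", "source": "Gramene",
     "measured_as": "days from planting to heading", "gso_stage": SHARED_STAGE},
]


def index_by_stage(records):
    """Build stage -> sorted list of (species, gene) so one lookup spans all sources."""
    index = defaultdict(set)
    for record in records:
        index[record["gso_stage"]].add((record["species"], record["gene"]))
    return {stage: sorted(pairs) for stage, pairs in index.items()}
```

`index_by_stage(RECORDS)[SHARED_STAGE]` returns the Arabidopsis and rice genes together, even though the two databases record the phenotype in different units.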
Standard Growth Stage Vocabulary in Experimental Description and Design
Related to the problem of database curation is the problem of data collection in laboratories and research groups, where plant growth stages are typically recorded by chronological age alone, for example 5 d after germination, 10 d after flowering, 1-month-old plant, or leaf tissue harvested in the spring of 2005. Because developmental timelines differ so widely, such descriptions do not allow meaningful comparisons, even among members of the same species, particularly when environmental conditions vary. However, critical studies performed on a few model genotypes of a species across various environments can serve as a reference. Such data have been described for 24 rice cultivars, including Nipponbare, Azucena, IR36, IR64, and Koshihikari (Yin and Kropff, 1996); for 19 maize genotypes, including B73, Mo17, the hybrid B73xMo17, and 16 additional hybrids (Padilla and Otegui, 2005); and in a comparative study of wheat, barley, and maize (McMaster et al., 2005). Taken together, these studies suggested that although genotypes may differ in their growth profiles, such as growth rate or flowering time, as a result of environmental variables (i.e. light, temperature, or water-deficit conditions), the targeted vegetative growth stages recorded by counting the number of leaves almost always followed a predictable pattern for a given genotype. Changes in growth rate or stem elongation in response to these variables were independent of leaf number. This further indicated that researchers can use such experiments to estimate the growth stage profile by counting leaves, and that this estimate of growth stage is independent of the environment as long as the genotype is known.
Thus, data collected with reference to a commonly defined series of whole-plant growth stages such as the ones described in the GSO will provide greater coherence and facilitate comparisons between and within species (Boyes et al., 2001).