A common goal of current plant genomics research is to establish an expandable platform for global classification and analysis of plant gene family space. A large fraction of genes in plant genomes are the product of duplication and novel gene creation processes that have occurred within plants over their 500-million-year history. Gene classifications that attempt to capture all of eukaryote diversity typically provide a poor representation of plant gene sets. With more than a dozen plant genomes scheduled for completion over the next two years, and many additional genome and transcriptome projects being initiated, there is a need for flexible, gene family-focused databases that provide rich toolsets for comparative analyses of plant genomes. Comparative analyses of the modeled proteomes for sequenced genomes can help verify gene content and elucidate the process of gene duplication and functional diversification. Cross-validation of gene models for available plant genomes and nucleotide sequence translations of EST sets for other plant species can be achieved through clustering and similarity analyses involving whole-genome sequences and large EST sets [e.g. (3–5); TIGR Plant Transcript Assemblies, (6)].
The PlantTribes database is a global classification of genes from all five sequenced plant genomes: Arabidopsis thaliana, Carica papaya (papaya), Medicago truncatula (barrel medic, 60% sequenced), Populus trichocarpa (poplar) and Oryza sativa (rice). The database also contains unigene sets from the TIGR Plant Transcript Assemblies (6), which include 4 million sequences from more than 200 species and facilitate a wide range of comparative studies of plant genes and gene families. PlantTribes offers a unique view of objectively defined gene families that supports comparative analyses of plant genomes. For example, our database allows one to identify all gene families of a given size in a species and quickly assess the range of copy numbers for closely related genes in other plant genomes. Families that have remained stable in size, or that have proliferated greatly in one genome compared to another, can easily be identified. In our own research, this type of analysis has aided interpretation of gene family stability and diversification in the face of gene and whole-genome duplications (e.g. 24, 25, 30, 31). Integration of expression data, linked seamlessly to the tribe gene classification, will facilitate studies of expression divergence following gene duplication (e.g. 17). PlantTribes can also aid comparative analyses by serving as a scaffold of gene families into which users can sort their genes of interest. We have devised search and query tools that give users access to this information, making it possible to investigate the evolution of plant genomes through analysis of both the scaffold itself and the sequences sorted into it.
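The copy-number comparison described above can be illustrated with a minimal sketch. It is not the PlantTribes implementation or query interface. The gene identifiers, tribe labels, record layout and the two-fold threshold used to call a family "expanded" are all hypothetical, chosen only to show how a gene-to-tribe classification supports queries by family size and comparisons between genomes.

```python
from collections import Counter, defaultdict

# Hypothetical (gene_id, species, tribe_id) records; all identifiers are invented.
RECORDS = [
    ("At_g1", "Arabidopsis", "tribe_1"), ("At_g2", "Arabidopsis", "tribe_1"),
    ("Os_g1", "Oryza", "tribe_1"), ("Os_g2", "Oryza", "tribe_1"),
    ("At_g3", "Arabidopsis", "tribe_2"), ("At_g4", "Arabidopsis", "tribe_2"),
    ("At_g5", "Arabidopsis", "tribe_2"), ("At_g6", "Arabidopsis", "tribe_2"),
    ("Os_g3", "Oryza", "tribe_2"),
    ("At_g7", "Arabidopsis", "tribe_3"),
    ("Os_g4", "Oryza", "tribe_3"), ("Os_g5", "Oryza", "tribe_3"),
    ("Os_g6", "Oryza", "tribe_3"),
]


def copy_number_table(records):
    """Return {tribe_id: Counter({species: gene_count})}."""
    table = defaultdict(Counter)
    for _gene, species, tribe in records:
        table[tribe][species] += 1
    return dict(table)


def families_of_size(table, species, size):
    """List tribes that have exactly `size` members in `species`."""
    return sorted(t for t, counts in table.items() if counts[species] == size)


def classify(table, species_a, species_b, fold=2.0):
    """Label each tribe as stable or expanded in one of two genomes.

    `fold` is an assumed, arbitrary threshold: a tribe is called expanded
    in a genome when its copy number there is at least `fold` times the
    copy number in the other genome. Tribes absent from both are skipped.
    """
    labels = {}
    for tribe, counts in table.items():
        a, b = counts[species_a], counts[species_b]
        if a == 0 and b == 0:
            continue
        if a >= fold * b:
            labels[tribe] = f"expanded in {species_a}"
        elif b >= fold * a:
            labels[tribe] = f"expanded in {species_b}"
        else:
            labels[tribe] = "stable"
    return labels


if __name__ == "__main__":
    table = copy_number_table(RECORDS)
    print(families_of_size(table, "Arabidopsis", 2))
    for tribe, label in sorted(classify(table, "Arabidopsis", "Oryza").items()):
        print(tribe, dict(table[tribe]), label)
```

In practice the records would come from the database's gene-to-tribe assignments rather than a hard-coded list, and any threshold for calling an expansion would need to be justified for the genomes being compared.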