Genome sequencing is ongoing at several institutions for additional strains and isolates of pathogenic enterobacteria, and ERIC will continue to incorporate new sequence data as it becomes available. We plan to continue careful manual annotation of these organisms to provide high-quality information that is supported by direct experimentation. The number of genome sequences already available for these pathogens, together with the large number of new sequences anticipated, makes manual inspection of every annotation impractical. For this reason, we will focus annotation efforts on a few reference genomes for each group of pathogens and continue to carefully annotate protein families from the EnteroFams. The annotation propagation tool will then be applied judiciously to distribute these curated annotations to other genomes.
Use of a consistent vocabulary to describe biological entities and functions is critical for comparing annotations within and between genomes. The Gene Ontology (GO) Consortium is a group dedicated to creating and applying a structured, controlled vocabulary for describing gene products, their functions and their locations (12,13). Current genome annotations in ERIC make only limited use of GO, and we plan to expand its use in the future.
The use of high-throughput experiments to characterize bacterial genes, proteins and metabolites is increasing, and ERIC will continue to integrate these types of data and provide tools for their analysis and visualization. The ERIC portal is under continual development to improve data content and usability. The goal of integrating information within ERIC is to provide researchers with a simple-to-use, richly populated database of accurate genome annotation and associated data that will aid in the creation of novel diagnostics, therapeutics and vaccines to mitigate the threats posed by pathogenic enterobacteria.