ERIC–ASAP GENOME ANNOTATIONS
ERIC provides access to continuously updated genome annotations for all ERIC pathogens, as well as for a variety of other enterobacteria useful for reference and comparison, including E. coli K-12 (Table 1). ERIC uses the ASAP genome annotation database system (2), built on an Oracle 10g database, for genome annotation and curation. ERIC–ASAP permits continuous database updates, obviating the periodic database releases that are a common feature of many genome databases. Three general types of user accounts are available for genome annotation. Administrator accounts give users the full range of capabilities, including the ability to create new genome projects in the system. Curator accounts allow users to update ERIC annotations through sophisticated web-based interfaces for manual annotation and curation, as well as through tools for uploading large sets of annotation data. Annotator accounts provide interfaces for manual annotation of individual annotation records. All annotation interfaces are web-based and can be accessed by any member of the research community who requests an account. The three account types are designed to meet the needs of different kinds of annotators and to encourage training in the tools that update large numbers of annotation records at a time. Genomes in ERIC can be set up as either ‘public’ or ‘private’ projects, and users on either kind of project can be assigned any of the three account types. All ‘public’ genome sequence data and annotations, including any newly added information, are accessible without an account.
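The account types and project visibility rules described above amount to a small role-based access model. The Python sketch below illustrates that model under stated assumptions: the action names, the role strings and the `is_allowed` function are hypothetical and do not reflect ERIC–ASAP's actual implementation. We assume only what the text states, namely that anyone may view public projects, that annotators edit individual records, that curators can additionally upload bulk data, and that only administrators create projects.

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    """Hypothetical actions a user might attempt on a genome project."""
    VIEW = auto()             # read sequences and annotations
    ANNOTATE_RECORD = auto()  # manually edit a single annotation record
    BULK_UPLOAD = auto()      # upload a large set of annotation data
    CREATE_PROJECT = auto()   # create a new genome project


# Hypothetical mapping of the three account types to the actions the
# text attributes to them; each role includes the capabilities of the
# role below it.
ROLE_PERMISSIONS = {
    "annotator": {Action.VIEW, Action.ANNOTATE_RECORD},
    "curator": {Action.VIEW, Action.ANNOTATE_RECORD, Action.BULK_UPLOAD},
    "administrator": set(Action),  # full range of capabilities
}


def is_allowed(role: Optional[str], action: Action, project_is_public: bool) -> bool:
    """Return True if a user holding `role` may perform `action`.

    `role` is None for a visitor without an account (an assumption of
    this sketch, not an ERIC–ASAP convention).
    """
    # Public sequence data and annotations are viewable without an account.
    if action is Action.VIEW and project_is_public:
        return True
    # Every other action, and any access to a private project, needs an account.
    if role is None:
        return False
    return action in ROLE_PERMISSIONS[role]
```

For example, `is_allowed(None, Action.VIEW, True)` is true for an anonymous visitor to a public genome, whereas `is_allowed("annotator", Action.BULK_UPLOAD, True)` is false, reflecting that bulk uploads are reserved for curator and administrator accounts.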
Our goal is to provide genome annotations that are accurate, detailed, up-to-date and consistent across genomes. Descriptions of the standard operating procedures (SOPs) used by the ERIC curators are available for download from the portal (http://www.ericbrc.org/portal/eric/aboutasap). Every annotation record includes a description of the evidence supporting the data, and this evidence is the primary means by which we assess the quality of the annotation information and measure improvements over time. Explanations of the evidence codes and how they are used can be found in the SOP describing gene annotation (http://www.ericbrc.org/portal/eric/sopCdsAnnotation). ERIC–ASAP is open to contributions from the research community, to encourage annotation by domain experts. An additional layer of quality control is provided by a ‘curation status’ tag on each annotation, which indicates whether the information has been independently approved by a member of a select group of trusted users and dedicated curators.
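The combination of per-record evidence and a ‘curation status’ tag can be pictured as a simple data structure. The sketch below is illustrative only: the field names, the status strings ("unreviewed", "approved"), the evidence code label and the `approve` and `evidence_summary` helpers are all hypothetical, and the actual evidence codes are those defined in the ERIC gene annotation SOP. We interpret "independently approved" as requiring that the approver be a trusted user other than the original contributor, which is an assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Evidence:
    code: str         # placeholder label; real codes are defined in the ERIC SOP
    description: str  # free-text description of the supporting evidence


@dataclass
class AnnotationRecord:
    feature_id: str
    field_name: str    # hypothetical, e.g. "product" or "gene_name"
    value: str
    contributor: str   # user who entered the annotation
    evidence: List[Evidence] = field(default_factory=list)
    curation_status: str = "unreviewed"  # hypothetical status values


def approve(record: AnnotationRecord, reviewer: str, trusted: Set[str]) -> bool:
    """Mark `record` approved if `reviewer` is trusted and independent.

    Returns True when the status was changed, False otherwise.
    """
    if reviewer not in trusted:
        return False  # only members of the trusted group may approve
    if reviewer == record.contributor:
        return False  # approval must come from someone other than the author
    record.curation_status = "approved"
    return True


def evidence_summary(records: Iterable[AnnotationRecord]) -> Counter:
    """Count records per evidence code, e.g. to track quality over time."""
    counts: Counter = Counter()
    for record in records:
        for ev in record.evidence:
            counts[ev.code] += 1
    return counts
```

Comparing `evidence_summary` output between two points in time is one way a quality metric based on evidence could be computed; how ERIC actually aggregates its evidence codes is not described here.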
Sequences and annotations in ERIC can be downloaded in a variety of formats, including GenBank flatfile format and GFF3. Files downloaded directly from ERIC reflect continuous updates by the dedicated curatorial staff as well as community-contributed annotations. Snapshots of sequences annotated de novo by ERIC are also deposited in GenBank. Examples include the genome of Y. pestis strain CA88-4125 (GenBank accession number ABCD00000000) and plasmid pMAR7 from enteropathogenic Escherichia coli (3). ERIC is working toward an efficient mechanism for updating existing GenBank and/or RefSeq records regardless of historical constraints. Users should be aware, however, that although ERIC supports documenting the evidence for each individual line of annotation, NCBI does not currently support this level of evidence documentation.
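As an illustration of working with GFF3 downloads, the Python sketch below reads a GFF3 file into simple records. It follows the general GFF3 layout (nine tab-separated columns, `#` comment and directive lines, a `##FASTA` directive after which only sequence follows, and semicolon-separated `tag=value` attributes with comma-separated multiple values and percent-encoding). It is a minimal reader, not a validator, and it makes no assumptions about which attribute tags ERIC files contain; the names `GFFFeature` and `parse_gff3` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional
from urllib.parse import unquote


class GFFFeature(NamedTuple):
    seqid: str
    source: str
    feature_type: str
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    score: Optional[float]
    strand: str           # "+", "-", "." or "?"
    phase: Optional[int]  # 0, 1 or 2 for CDS features, else None
    attributes: Dict[str, List[str]]


def parse_attributes(column: str) -> Dict[str, List[str]]:
    """Split column 9 into tag -> list of (percent-decoded) values."""
    attrs: Dict[str, List[str]] = {}
    if column == ".":
        return attrs
    for pair in column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Multiple values are comma-separated; reserved characters are %-encoded.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_gff3(lines: Iterable[str]) -> Iterator[GFFFeature]:
    """Yield one GFFFeature per feature line, skipping comments and directives."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the remainder of the file is FASTA sequence
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        yield GFFFeature(
            seqid=unquote(seqid),
            source=source,
            feature_type=ftype,
            start=int(start),
            end=int(end),
            score=None if score == "." else float(score),
            strand=strand,
            phase=None if phase == "." else int(phase),
            attributes=parse_attributes(attrs),
        )
```

Usage is a matter of passing an open file, for example `for feat in parse_gff3(open("genome.gff3")): ...`, where the file name is a placeholder. A dedicated library may be preferable for production use, since this reader does not check parent/child relationships or other directives.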