For each gene in the PBRC database, we provide an automated,
computer-driven annotation that comprises as much basic descriptive
information as can be gathered about the gene, a basic analysis of its
nucleotide or amino acid sequence, and the results of sequence
similarity searches that look for common patterns or features
that might be characteristic of its function. The annotation
process starts with the GenBank record and includes the descriptive
information, literature references and any other information
provided in that record. This information populates the initial
descriptive fields of our database. Following this automated
annotation process, a manual, human-directed curation of each
gene record is undertaken. During this curation process, a researcher
reviews the annotation record, all available literature references
and any available unpublished information. This collection
of empirically derived properties for the protein in question
provides what might be considered a mini-review of the biology
of the gene being studied. The broad types of information that
are provided during the curation process include protein properties
such as molecular weight and pI; post-translational processing;
the availability of custom reagents such as clones, antibodies
and mutants; functional descriptions [including Gene Ontology
designations (16)]; and literature summaries. Evidence codes
are provided that explicitly state the nature and source of
each piece of information along with the appropriate literature
references. A series of web forms assists in this process by
providing a distinct set of informational fields to be filled
in and by enforcing use of a controlled vocabulary to fully describe
each gene. The results of the curation process are stored in
our SQL Server database and form a Poxvirus Knowledge Database
that is available and searchable from the PBRC website.
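
The kind of check that a curation form might apply before a record is stored can be sketched as follows. This is a minimal illustration only, not the PBRC implementation: the field names, the `Annotation` class, the `validate` function and the choice of evidence codes (a few Gene Ontology style codes used as placeholders) are all assumptions, since the actual vocabulary and database schema are not described here.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary. The real PBRC terms are not listed
# in the text, so these sets are placeholders for illustration.
CURATION_FIELDS = {
    "molecular_weight",
    "pI",
    "post_translational_processing",
    "reagent",
    "function",
    "literature_summary",
}
EVIDENCE_CODES = {"IDA", "IMP", "ISS", "IEA", "TAS"}  # GO-style codes (assumed)


@dataclass(frozen=True)
class Annotation:
    """One curated piece of information about a gene (hypothetical layout)."""
    field: str          # which informational field is being filled in
    value: str          # the curated value
    evidence_code: str  # nature and source of the information
    reference: str      # literature reference identifier


def validate(annotation: Annotation) -> list[str]:
    """Return a list of problems; an empty list means the entry is accepted."""
    problems = []
    if annotation.field not in CURATION_FIELDS:
        problems.append(f"unknown field: {annotation.field}")
    if annotation.evidence_code not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {annotation.evidence_code}")
    if not annotation.reference.strip():
        problems.append("missing literature reference")
    return problems
```

In this sketch an entry is rejected unless its field and evidence code both come from the fixed sets and a reference is supplied, which mirrors the requirement that every curated value carry an evidence code and a literature reference.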