Poxvirus Bioinformatics Resource Center: a comprehensive Poxviridae informational and analytical resource
Elliot J. Lefkowitz*,
Chris Upton1,
Shankar S. Changayil,
Charles Buck2,
Paula Traktman3 and
R. Mark L. Buller4
Department of Microbiology, University of Alabama at Birmingham, BBRB 276/11; 1530 3rd Avenue S., Birmingham, AL 35294-2170, USA,
1 Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada V8W 2Y2,
2 Virology Collection, ATCC, Manassas, VA 20108, USA,
3 Department of Microbiology and Molecular Genetics, Medical College of Wisconsin, Room 273–BSB, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA and
4 Department of Molecular Microbiology and Immunology, St Louis University Health Sciences Center, 1402 South Grand Boulevard, St Louis, MO 63104, USA
Abstract
The Poxvirus Bioinformatics Resource Center (PBRC) has been established to provide informational and analytical resources to the scientific community to aid research directed at providing a better understanding of the Poxviridae family of viruses. The PBRC was specifically established as the result of the concern that variola virus, the causative agent of smallpox, as well as related viruses, might be utilized as biological weapons. In addition, the PBRC supports research on poxviruses that might be considered new and emerging infectious agents such as monkeypox virus. The PBRC consists of a relational database and web application that supports the data storage, annotation, analysis and information exchange goals of the project. The current release consists of over 35 complete genomic sequences of various genera, species and strains of viruses from the Poxviridae family. Sequence and annotation information for these viruses has been obtained from sequences publicly available from GenBank as well as sequences not yet deposited in GenBank that have been obtained from ongoing sequencing projects. In addition to sequence data, the PBRC provides comprehensive annotation and curation of virus genes; analytical tools to aid in the understanding of the available sequence data, including tools for the comparative analysis of different virus isolates; and visualization tools to help better display the results of various analyses. The PBRC represents the initial development of what will become a more comprehensive Viral Bioinformatics Resource Center for Biodefense that will be one of the National Institute of Allergy and Infectious Diseases' ‘Bioinformatics Resource Centers for Biodefense and Emerging or Re-Emerging Infectious Diseases’. The PBRC website is available at http://www.poxvirus.org.
Introduction
An effective response to the use of biological organisms as
agents of terrorism or warfare, or to the emergence of new infectious
diseases requires a multi-disciplinary effort involving various
agencies at the local, state and federal levels including public
health officials, hospital personnel, epidemiologists and the
military. In addition to the public health response, a concerted
research effort is necessary to better detect, understand and
respond to these threats. Such research requires development
of environmental detectors and clinical diagnostic aids to provide
us with rapid warning in the event of an outbreak as well as
development of vaccines to prevent infection and antiviral or
antibacterial drugs to cure infection. These efforts require
a comprehensive biological understanding of potential threat
agents, including their molecular biology, genetics, pathogenicity,
epidemiology and evolution. The National Institute of Allergy
and Infectious Diseases as well as the US Centers for Disease
Control and Prevention maintain a list of priority pathogens
that are considered potential bio-threat agents and/or are microbes
that appear to be new or reemerging pathogens (http://www.bt.cdc.gov/agent/agentlist-category.asp
and http://www.niaid.nih.gov/biodefense/bandc_priority.htm).
Variola virus, the causative agent of smallpox and a member
of the Poxviridae family of viruses, has perhaps the greatest
potential for use as a bio-weapon and is one of the Category
A pathogens on these priority pathogen lists (1). In addition,
monkeypox virus, a member of the orthopoxvirus genus that includes
variola virus, has caused a number of disease outbreaks in recent
years, including outbreaks in North America resulting from the
importation of rodents from Africa intended to be sold as pets
(2, 3).
The use of high-throughput DNA sequencing techniques as well
as other large-scale ‘Systems Biology’ technologies
has led to an unprecedented increase in the amount of available
data. One overarching necessity in research efforts directed
at providing a better understanding of priority pathogens is
therefore the need to collect, manage, describe, analyze and publicize
the vast amounts of information generated by modern, high-throughput
biological research. The goal of the Poxvirus Bioinformatics
Resource Center (PBRC) is to organize all available information
on virus genetics, thus aiding research efforts towards increasing
our knowledge of virus replication and virus–host interaction
on a gene-by-gene and whole genome basis. In addition, the PBRC
is expanding on available knowledge by developing and utilizing
analysis tools that can further probe the information contained
in the genome and gene sequences of these organisms. Since our
goal is to establish an information resource to support research
efforts by the scientific community, we are also soliciting
input from that community to ensure the completeness and, above
all, the accuracy of the information being provided and to ensure
that the software tools provided and in development reflect
the needs of the different research groups using these resources.
What is the Poxvirus Bioinformatics Resource Center?
The PBRC represents a cooperative endeavor with collaborations
between the University of Alabama at Birmingham, the University
of Victoria, the Medical College of Wisconsin, St Louis University
and the American Type Culture Collection (ATCC). The PBRC consortium
has established a database of all available completely sequenced
poxvirus genomes. This database includes information on every
predicted gene that may be coded for by these genomes, as well
as descriptive annotations of the physical and functional properties
of each gene based on computer predictions. Currently, only
complete genomes are included in the database. In future releases
we plan to include, as they become available, incomplete genomes as
well as coding sequences from virus strains not represented by complete
genomic sequences. We are also compiling a comprehensive gene-by-gene
curation that is linked to each individual gene record. In addition,
the PBRC provides a variety of analytical information and analysis
tools that can be used to mine the genomic data. These tools
include sequence homology searches, a database of functional
domains, a database of poxvirus gene orthologs and web-based
visualization tools to allow for customized displays of much
of this information. A major goal has been to develop new software
packages for the analysis of viral genomes. This aspect of the
project involves designing new software tools that permit researchers
to interact with and manipulate complete poxvirus genomes and
families of poxvirus protein orthologs. This vastly speeds up
the analysis process and provides information about these viruses
from comparative analyses that otherwise would be almost impossible
to obtain. Some of these tools have been described previously
in the literature and therefore more substantial descriptions
are already available [Poxvirus Orthologous Clusters (4, 5),
Viral Genome Organizer (6), Viral Genome Database (7), JDotter
(8) and Base-By-Base (9)].
Database description
The basic PBRC web portal accesses genomic, annotative and analytical
information from a Microsoft SQL Server database. A companion
MySQL database provides data to a number of Java-based analytical
tools. The database schema used for the PBRC was originally
developed to accommodate bacterial genomes and their genes (protein
and RNA) (10, 11) and needed only slight modification to support
the storage of information on poxvirus genomes and genes (12).
The PBRC data schema provides tables to store sequences and
basic sequence annotations, human-annotated curation records
and tables for the storage of basic analytical information such
as the results of BLAST searches, functional motifs and biophysical
properties. The database supports all of the web-based query
tools and also serves as the data source for all of our web-based
analytical tools.
Genome data acquisition
Genomic data are obtained from GenBank (13), other publicly
available databases and websites, as well as from ongoing sequencing
projects. The current PBRC release consists of over 35 complete
genomes from the Poxviridae family of viruses. If available
from the GenBank record or from the sequencing laboratory, the
predicted gene set of any one genome is stored in the PBRC database
and used as the initial starting point for annotation and curation
of each genome. If a predicted gene set is not available, we
perform our own gene prediction using a combination of gene
prediction tools that start with open reading frames and then
add, among other evidence, promoter prediction, the presence of
functional motifs and assignment to orthologous protein clusters.
Our experience is that the protein-coding gene predictions for the
poxvirus genomes present in GenBank have in many cases been
derived using different parameters. Therefore, we
are in the process of developing a more consistent method for
the prediction of poxvirus genes and will utilize this gene
prediction pipeline to reassess the gene set for each available
genome.
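As an illustration of the first step named above (starting from open reading frames), the following is a minimal sketch of a six-frame ORF scan. It is not the PBRC pipeline: the function names, the ATG-only start codon rule and the `min_codons` threshold are assumptions chosen for the example, and the later evidence (promoter prediction, functional motifs, ortholog assignment) is not modeled.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _orfs_in_strand(seq: str, min_codons: int) -> List[Tuple[int, int]]:
    """Scan the three frames of one strand for ATG...stop ORFs.

    Returns 0-based, half-open (start, end) spans that include the stop codon.
    """
    spans = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    spans.append((start, i + 3))
                start = None                   # look for the next ORF
    return spans


def find_orfs(genome: str, min_codons: int = 50) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for ORFs on both strands.

    Coordinates are always given on the forward strand. The min_codons
    default is an illustrative assumption, not a PBRC parameter.
    """
    genome = genome.upper()
    n = len(genome)
    orfs = [(s, e, "+") for s, e in _orfs_in_strand(genome, min_codons)]
    for s, e in _orfs_in_strand(reverse_complement(genome), min_codons):
        orfs.append((n - e, n - s, "-"))       # map back to forward coordinates
    return sorted(orfs)
```

Candidate ORFs produced this way would still need to be filtered with the additional evidence described in the text before being accepted as predicted genes.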
Each virus genome is categorized according to its genus, species
and strain designation as determined by the International Committee
on Taxonomy of Viruses (ICTV) (http://www.ncbi.nlm.nih.gov/ICTVdb/)
(14, 15). In addition to providing the nomenclature available
from the GenBank record for each gene in any one genome, we
also provide our own gene designation which is a combination
of the ICTV-approved species name, the strain or isolate name
and a numerical designation for each gene that starts at number
1 for the left-most gene, and is incremented by one for each
subsequent gene as determined by its genome position.
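The numbering rule described above can be sketched as follows. The `Gene` class, the function name, the hyphen separator and the three-digit zero padding are assumptions made for illustration; the text specifies only that the designation combines species name, strain or isolate name and a position-based number starting at 1 for the left-most gene.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    genbank_name: str  # name taken from the GenBank record
    start: int         # left-most genome coordinate of the gene
    end: int           # right-most genome coordinate of the gene


def assign_designations(species: str, strain: str,
                        genes: List[Gene]) -> List[Tuple[str, Gene]]:
    """Number genes from 1 in order of genome position.

    The "species-strain-NNN" layout is a hypothetical format, not the
    actual PBRC naming string.
    """
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    return [(f"{species}-{strain}-{number:03d}", gene)
            for number, gene in enumerate(ordered, start=1)]
```

Keeping the GenBank name alongside the new designation, as the returned pairs do here, mirrors the text's point that both nomenclatures are provided for each gene.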
Genome annotation and curation
For each gene in the PBRC database, we provide an automated,
computer-driven annotation that gathers as much basic descriptive
information as possible about a gene, basic analysis of its
nucleotide or amino acid sequence, and the results of sequence
similarity searches to look for common patterns or features
that might be characteristic of its function. The annotation
process starts with the GenBank record and includes the descriptive
information, literature references and any other information
provided in that record. This information populates the initial
descriptive fields of our database. Following this automated
annotation process, a manual, human-directed curation of each
gene record is undertaken. During this curation process, a researcher
reviews the annotation record, all available literature references
and any unpublished information as available. This collection
of empirically derived properties for the protein in question
provides what might be considered a mini-review of the biology
of the gene being studied. The broad types of information that
are provided during the curation process include protein properties
such as molecular weight and pI; post-translational processing;
the availability of custom reagents such as clones, antibodies
and mutants; functional descriptions [including Gene Ontology
designations (16)]; and literature summaries. Evidence codes
are provided that explicitly state the nature and source of
each piece of information along with the appropriate literature
references. A series of web forms assists in this process by
providing a distinct set of informational fields to be filled
in and by enforcing use of a controlled vocabulary to fully describe
each gene. The results of the curation process are stored in
our SQL Server database and form a Poxvirus Knowledge Database
that is available and searchable from the PBRC website.
Website
The PBRC website is provided by a Microsoft Windows 2003 server running Microsoft Internet Information Services (IIS). The user interface is provided through a combination of Active Server Pages running server-side Visual Basic script, client-side JavaScript and HTML. SQL Server database access is provided through Microsoft ActiveX Data Objects.
The starting point for access to all databases and tools available from the PBRC is the PBRC home page at http://www.poxvirus.org/. A menu is displayed along the top of the page, and clicking on a menu link brings up a submenu that provides access to individual web query tools and/or applications. The Data menu provides access to search forms that support user-specified queries for genome, gene and sequence data (Figures 1 and 2). Another series of submenus provides access to analytical and visualization tools (Figure 3). In addition to the main menu and context-sensitive submenu, a menu near the bottom of the page provides access to PBRC organizational information such as descriptions of the people and teams involved in the work, acknowledgements and a form to provide user feedback. At the very bottom of the form we provide a Google-supported search form.
In general, available web pages can be categorized as follows:
- Informational: Static text such as help files, biographical information or meeting announcements.
- Forms: Pages that provide user input for data queries.
- Tabular results: Pages displaying data in tables.
- Graphical results: Pages displaying data results as figures.
- Links: Pages providing access to other pages.
- Applets: Small Java-based applications for data analysis and visualization that are available from within a user's web browser.
- Applications: More comprehensive Java-based analytical applications that can be run from a web page using Java Web Start, or downloaded to a user's workstation and run as a stand-alone application.
Analytical tools
BLAST similarity searches (17) are available that provide standard
web-based BLAST output as well as a web-based version of our
BLAST parsing and visualization application, XS-BLAST (XML-SQL
BLAST). XS-BLAST utilizes the XML-output option of the NCBI
BLAST executable and parses the results, storing them in our
SQL Server database. The result set is then provided to the
user as an HTML table, as well as through a Java-based graphical
visualization tool (Figure 3). Both BLAST searches are run locally
on a Sun Solaris server and provide several customized databases
for searching. These include all complete genomic Poxviridae
nucleic acid sequences, all predicted Poxviridae protein sequences
and all available Poxviridae nucleotide sequences extracted
from GenBank.
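The XML-to-SQL idea behind XS-BLAST can be sketched in a few lines. This is not the XS-BLAST code: SQLite stands in for SQL Server so the example is self-contained, the table name `blast_hsp` and its columns are hypothetical, and `SAMPLE_XML` is a hand-written fragment that uses element names from the NCBI BLAST XML output format rather than real search output.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; not real BLAST results.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_protein_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>hit_A</Hit_id>
          <Hit_def>hypothetical poxvirus protein</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>250.5</Hsp_bit-score>
              <Hsp_evalue>1e-70</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>120</Hsp_query-to>
              <Hsp_identity>110</Hsp_identity>
              <Hsp_align-len>120</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def parse_blast_xml(xml_text):
    """Yield one tuple per HSP found in BLAST XML output."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                yield (
                    query,
                    hit.findtext("Hit_id"),
                    hit.findtext("Hit_def"),
                    float(hsp.findtext("Hsp_bit-score")),
                    float(hsp.findtext("Hsp_evalue")),
                    int(hsp.findtext("Hsp_query-from")),
                    int(hsp.findtext("Hsp_query-to")),
                    int(hsp.findtext("Hsp_identity")),
                    int(hsp.findtext("Hsp_align-len")),
                )


def store_hits(conn, rows):
    """Insert parsed HSP rows into a (hypothetical) blast_hsp table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blast_hsp (
               query TEXT, hit_id TEXT, hit_def TEXT,
               bit_score REAL, evalue REAL,
               query_from INTEGER, query_to INTEGER,
               identity INTEGER, align_len INTEGER)"""
    )
    conn.executemany("INSERT INTO blast_hsp VALUES (?,?,?,?,?,?,?,?,?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_hits(conn, parse_blast_xml(SAMPLE_XML))
    for row in conn.execute("SELECT query, hit_id, bit_score FROM blast_hsp"):
        print(row)
```

Once the hits are in relational form, both a tabular view and a graphical view can be generated from the same query, which is consistent with the two presentations described in the text.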
Comparative analysis of the coding potential of all Poxviridae
genomes is available as both pairwise and multi-genome comparisons.
Tabular listings of orthologous protein sets are provided,
as well as a graphical gene synteny plot of shared orthologs
between any two genomes (Figure 3). Ortholog determination is
based on an all versus all BLAST search of all poxvirus proteins
in the PBRC database.
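The text states only that orthologs are derived from an all versus all BLAST search; the exact rule is not given. One common way to turn such results into ortholog pairs is the reciprocal best hit criterion, sketched below as an assumption rather than as the PBRC method. The function names and the `(query, subject, bit_score)` input layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query protein, subject protein, bit score)


def best_hits(hits: Iterable[Hit],
              genome_of: Dict[str, str]) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """For each (query, target genome), keep the highest-scoring subject."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for query, subject, score in hits:
        query_genome, subject_genome = genome_of[query], genome_of[subject]
        if query_genome == subject_genome:
            continue  # ignore hits within the same genome
        key = (query, subject_genome)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def reciprocal_best_hits(hits: Iterable[Hit],
                         genome_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return protein pairs that are each other's best hit across genomes."""
    best = best_hits(hits, genome_of)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, genome_of[query]))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)
```

Plotting the genome positions of each resulting pair against one another for two genomes gives the kind of synteny view described in the text.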
Availability
The PBRC is developed and maintained at the Department of Microbiology,
University of Alabama at Birmingham and the Department of Biochemistry
and Microbiology, University of Victoria. The PBRC is available
at http://www.poxvirus.org.
The PBRC framework has also been used to support development
of similar bioinformatics resources for other virus families.
The Virus Bioinformatics Resource at http://www.virology.ca
contains databases and analytical tools to support research
on coronaviruses, herpesviruses and baculoviruses.
Future plans
In the near future, the PBRC will become part of a more comprehensive
Viral Bioinformatics Resource Center for Biodefense (VBRC) that
will include additional viruses listed as priority pathogens
by the NIAID. In addition to expansion of the database and available
analytical tools, the VBRC will make available more comprehensive
help screens and tutorials that will provide users with step-by-step
guides on how to use the site to answer typical questions that
might be of interest to the laboratory researcher. The VBRC
is being established as one of the NIAID's ‘Bioinformatics
Resource Centers for Biodefense and Emerging or Re-Emerging
Infectious Diseases’. For more details on these centers,
see http://www.niaid.nih.gov/dmid/genomes/brc/default.htm. The
VBRC is available at http://www.biovirus.org.
Acknowledgements
This work was funded through two grants from NIAID/DARPA: U01 AI48706 (E.J.L.) and U01 AI48653-02 (R.M.L.B. and C.U.) and Canadian NSERC grant OPG0155125-01 to C.U.
Notes
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact [email protected]
References
1. Henderson,D.A., Inglesby,T.V., Bartlett,J.G., Ascher,M.S., Eitzen,E., Jahrling,P.B., Hauer,J., Layton,M., McDade,J., Osterholm,M.T. et al. (1999) Smallpox as a biological weapon: medical and public health management. Working Group on Civilian Biodefense. J. Am. Med. Assoc., 281, 2127–2137.
2. Guarner,J., Johnson,B.J., Paddock,C.D., Shieh,W.J., Goldsmith,C.S., Reynolds,M.G., Damon,I.K., Regnery,R.L. and Zaki,S.R. (2004) Monkeypox transmission and pathogenesis in prairie dogs. Emerg. Infect. Dis., 10, 426–431.
3. Enserink,M. (2003) Infectious diseases. U.S. monkeypox outbreak traced to Wisconsin pet dealer. Science, 300, 1639.
4. Upton,C., Slack,S., Hunter,A.W., Ehlers,A. and Roper,R.L. (2003) Poxvirus Orthologous Clusters (POCs): toward defining the minimum essential poxvirus genome. J. Virol., 77, 7590–7600.
5. Ehlers,A., Osborne,J., Slack,S., Roper,R.L. and Upton,C. (2002) Poxvirus Orthologous Clusters (POCs). Bioinformatics, 18, 1544–1545.
6. Upton,C., Hogg,D., Perrin,D., Boone,M. and Harris,N.L. (2000) Viral genome organizer: a system for analyzing complete viral genomes. Virus Res., 70, 55–64.
7. Hiscock,D. and Upton,C. (2000) Viral Genome Database: storing and analyzing genes and proteins from complete viral genomes. Bioinformatics, 16, 484–485.
8. Brodie,R., Roper,R.L. and Upton,C. (2004) JDotter: a Java interface to multiple dotplots generated by Dotter. Bioinformatics, 20, 279–281.
9. Brodie,R., Smith,A.J., Roper,R.L., Tcherepanov,V. and Upton,C. (2004) Base-By-Base: single nucleotide-level analysis of whole viral genome alignments. BMC Bioinformatics, 5, 96.
10. Hoskins,J., Alborn,W.E.,Jr, Arnold,J., Blaszczak,L.C., Burgett,S., DeHoff,B.S., Estrem,S.T., Fritz,L., Fu,D.J., Fuller,W. et al. (2001) Genome of the bacterium Streptococcus pneumoniae strain R6. J. Bacteriol., 183, 5709–5717.
11. Glass,J.I., Lefkowitz,E.J., Glass,J.S., Heiner,C.R., Chen,E.Y. and Cassell,G.H. (2000) The complete sequence of the mucosal pathogen Ureaplasma urealyticum. Nature, 407, 757–762.
12. Chen,N., Danila,M.I., Feng,Z., Buller,R.M.L., Wang,C., Han,X., Lefkowitz,E. and Upton,C. (2003) The genomic sequence of Ectromelia virus, the causative agent of mousepox. Virology, 317, 165–186.
13. Wheeler,D.L., Church,D.M., Federhen,S., Lash,A.E., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E., Tatusova,T.A. et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.
14. Ball,L.A. and Mayo,M.A. (2004) Virology Division News: report from the 33rd Meeting of the ICTV Executive Committee. Arch. Virol., 149, 1259–1263.
15. Fauquet,C.M. and Mayo,M.A. (2001) The 7th ICTV report. Arch. Virol., 146, 189–194.
16. Harris,M.A., Clark,J., Ireland,A., Lomax,J., Ashburner,M., Foulger,R., Eilbeck,K., Lewis,S., Marshall,B., Mungall,C. et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 32, D258–D261.
17. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Figures
Figure 1
PBRC genomic sequence web pages. Screen shots of the PBRC genome list, and the genome map for Variola major virus strain Bangladesh.
Figure 2
PBRC gene search and gene record web pages. Screen shots of the PBRC gene search form, a listing of gene search results and an individual gene record.
Figure 3
PBRC BLAST search and gene synteny analytical tool web pages. Screen shots of XS-BLAST tabular and graphical BLAST search results and a gene synteny plot.