Poxvirus Bioinformatics Resource Center: a comprehensive Poxviridae informational and analytical resource


Elliot J. Lefkowitz*, Chris Upton1, Shankar S. Changayil, Charles Buck2, Paula Traktman3 and R. Mark L. Buller4

Department of Microbiology, University of Alabama at Birmingham, BBRB 276/11; 1530 3rd Avenue S., Birmingham, AL 35294-2170, USA, 1 Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada V8W 2Y2, 2 Virology Collection, ATCC, Manassas, VA 20108, USA, 3 Department of Microbiology and Molecular Genetics, Medical College of Wisconsin, Room 273–BSB, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA and 4 Department of Molecular Microbiology and Immunology, St Louis University Health Sciences Center, 1402 South Grand Boulevard, St Louis, MO 63104, USA

Abstract

The Poxvirus Bioinformatics Resource Center (PBRC) has been established to provide informational and analytical resources to the scientific community to aid research directed at providing a better understanding of the Poxviridae family of viruses. The PBRC was specifically established as the result of the concern that variola virus, the causative agent of smallpox, as well as related viruses, might be utilized as biological weapons. In addition, the PBRC supports research on poxviruses that might be considered new and emerging infectious agents such as monkeypox virus. The PBRC consists of a relational database and web application that supports the data storage, annotation, analysis and information exchange goals of the project. The current release consists of over 35 complete genomic sequences of various genera, species and strains of viruses from the Poxviridae family. Sequence and annotation information for these viruses has been obtained from sequences publicly available from GenBank as well as sequences not yet deposited in GenBank that have been obtained from ongoing sequencing projects. In addition to sequence data, the PBRC provides comprehensive annotation and curation of virus genes; analytical tools to aid in the understanding of the available sequence data, including tools for the comparative analysis of different virus isolates; and visualization tools to help better display the results of various analyses. The PBRC represents the initial development of what will become a more comprehensive Viral Bioinformatics Resource Center for Biodefense that will be one of the National Institute of Allergy and Infectious Diseases' ‘Bioinformatics Resource Centers for Biodefense and Emerging or Re-Emerging Infectious Diseases’. The PBRC website is available at http://www.poxvirus.org.


Introduction

An effective response to the use of biological organisms as agents of terrorism or warfare, or to the emergence of new infectious diseases, requires a multi-disciplinary effort involving various agencies at the local, state and federal levels including public health officials, hospital personnel, epidemiologists and the military. In addition to the public health response, a concerted research effort is necessary to better detect, understand and respond to these threats. Such research requires development of environmental detectors and clinical diagnostic aids to provide us with rapid warning in the event of an outbreak as well as development of vaccines to prevent infection and antiviral or antibacterial drugs to cure infection. These efforts require a comprehensive biological understanding of potential threat agents, including their molecular biology, genetics, pathogenicity, epidemiology and evolution. The National Institute of Allergy and Infectious Diseases as well as the US Centers for Disease Control and Prevention maintain a list of priority pathogens that are considered potential bio-threat agents and/or are microbes that appear to be new or reemerging pathogens (http://www.bt.cdc.gov/agent/agentlist-category.asp and http://www.niaid.nih.gov/biodefense/bandc_priority.htm). Variola virus, the causative agent of smallpox and a member of the Poxviridae family of viruses, has perhaps the greatest potential for use as a bio-weapon and is one of the Category A pathogens on these priority pathogen lists (1). In addition, monkeypox virus, a member of the orthopoxvirus genus that includes variola virus, has caused a number of disease outbreaks in recent years, including outbreaks in North America resulting from the importation of rodents from Africa intended to be sold as pets (2,3).

The use of high-throughput DNA sequencing techniques as well as other large-scale ‘Systems Biology’ technologies has led to an unprecedented increase in the amount of available data. Therefore, one overarching necessity in research efforts directed at providing a better understanding of priority pathogens is the need to collect, manage, describe, analyze and publicize the vast amounts of information generated by modern, high-throughput biological research. The goal of the Poxvirus Bioinformatics Resource Center (PBRC) is therefore to organize all available information on virus genetics, thus aiding research efforts towards increasing our knowledge of virus replication and virus–host interaction on a gene-by-gene and whole genome basis. In addition, the PBRC is expanding on available knowledge by developing and utilizing analysis tools that can further probe the information contained in the genome and gene sequences of these organisms. Since our goal is to establish an information resource to support research efforts by the scientific community, we are also soliciting input from that community to ensure the completeness and, above all, the accuracy of the information being provided and to ensure that the software tools provided and in development reflect the needs of the different research groups using these resources.

What is the Poxvirus Bioinformatics Resource Center?

The PBRC represents a cooperative endeavor with collaborations between the University of Alabama at Birmingham, the University of Victoria, the Medical College of Wisconsin, St Louis University and the American Type Culture Collection (ATCC). The PBRC consortium has established a database of all available completely sequenced poxvirus genomes. This database includes information on every predicted gene that may be coded for by these genomes, as well as descriptive annotations of the physical and functional properties of each gene based on computer predictions. Currently, only complete genomes are included in the database. In future releases, we plan to include, as available, incomplete genomes as well as coding sequences from virus strains not represented by complete genomic sequences. We are also compiling a comprehensive gene-by-gene curation that is linked to each individual gene record. In addition, the PBRC provides a variety of analytical information and analysis tools that can be used to mine the genomic data. These tools include sequence homology searches, a database of functional domains, a database of poxvirus gene orthologs and web-based visualization tools to allow for customized displays of much of this information. A major goal has been to develop new software packages for the analysis of viral genomes. This aspect of the project involves designing new software tools that permit researchers to interact with and manipulate complete poxvirus genomes and families of poxvirus protein orthologs. Such tools greatly speed up the analysis process and provide information about these viruses from comparative analyses that otherwise would be almost impossible to obtain. Some of these tools have been described previously in the literature and therefore more substantial descriptions are already available [Poxvirus Orthologous Clusters (4,5), Viral Genome Organizer (6), Viral Genome Database (7), JDotter (8) and Base-By-Base (9)].

Database description

The basic PBRC web portal accesses genomic, annotative and analytical information from a Microsoft SQL Server database. A companion MySQL database provides data to a number of Java-based analytical tools. The database schema used for the PBRC was originally developed to accommodate bacterial genomes and their genes (protein and RNA) (10,11) and needed only slight modification to support the storage of information on poxvirus genomes and genes (12). The PBRC data schema provides tables to store sequences and basic sequence annotations, human-annotated curation records and tables for the storage of basic analytical information such as the results of BLAST searches, functional motifs and biophysical properties. The database supports all of the web-based query tools and also serves as the data source for all of our web-based analytical tools.
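To make the table groupings described above more concrete, the following is a minimal sketch of how sequence, gene and curation records might be related. It is not the PBRC schema: every table and column name is an assumption made for illustration, the sample rows are placeholders, and SQLite (through Python's standard library) stands in for SQL Server so the example is self-contained.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Table and column names are
# illustrative assumptions, not the actual PBRC design.
SCHEMA = """
CREATE TABLE genome (
    genome_id INTEGER PRIMARY KEY,
    genus     TEXT NOT NULL,
    species   TEXT NOT NULL,
    strain    TEXT NOT NULL,
    sequence  TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id   INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
    name      TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT NOT NULL CHECK (strand IN ('+', '-'))
);
CREATE TABLE curation (
    curation_id   INTEGER PRIMARY KEY,
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),
    field         TEXT NOT NULL,
    value         TEXT NOT NULL,
    evidence_code TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder genome and gene."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO genome VALUES (1, 'Orthopoxvirus', 'Example virus', 'Demo', 'ATGAAATAA')"
    )
    conn.execute("INSERT INTO gene VALUES (1, 1, 'demo-001', 1, 9, '+')")
    conn.execute(
        "INSERT INTO curation VALUES (1, 1, 'function', 'placeholder description', 'example')"
    )
    conn.commit()
    return conn


def genes_with_curation(conn, genome_id):
    """Return (gene name, curation field, value) rows ordered by genome position."""
    return conn.execute(
        "SELECT g.name, c.field, c.value "
        "FROM gene g JOIN curation c ON c.gene_id = g.gene_id "
        "WHERE g.genome_id = ? ORDER BY g.start_pos",
        (genome_id,),
    ).fetchall()
```

Keeping curation entries in their own table, keyed to the gene record, is one plausible way to let human-entered information accumulate without altering the automatically loaded sequence and annotation rows.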

Genome data acquisition

Genomic data are obtained from GenBank (13), other publicly available databases and websites, as well as from ongoing sequencing projects. The current PBRC release consists of over 35 complete genomes from the Poxviridae family of viruses. If available from the GenBank record or from the sequencing laboratory, the predicted gene set of any one genome is stored in the PBRC database and used as the starting point for annotation and curation of each genome. If a predicted gene set is not available, we perform our own gene prediction using a combination of gene predictive tools that start with open reading frames and then include, among others, promoter prediction, presence of functional motifs and assignment to orthologous protein clusters. Our experience is that the protein-coding gene predictions for the poxvirus genomes present in GenBank have in many cases been derived using different methods and parameters. Therefore, we are in the process of developing a more consistent method for the prediction of poxvirus genes and will utilize this gene prediction pipeline to reassess the gene set for each available genome.

Each virus genome is categorized according to its genus, species and strain designation as determined by the International Committee on Taxonomy of Viruses (ICTV) (http://www.ncbi.nlm.nih.gov/ICTVdb/) (14,15). In addition to providing the nomenclature available from the GenBank record for each gene in any one genome, we also provide our own gene designation, which is a combination of the ICTV-approved species name, the strain or isolate name and a numerical designation for each gene that starts at number 1 for the left-most gene and is incremented by one for each subsequent gene as determined by its genome position.
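The position-based numbering described above can be sketched in a few lines. The text specifies only the three components of the designation (species name, strain or isolate name, and a number counted from the left-most gene); the separator, the zero-padding width, the use of abbreviations and the tie-breaking rule in this sketch are assumptions, as are the names `Gene` and `designate`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    genbank_name: str  # name taken from the source record
    start: int         # coordinates as given; start > end is allowed
    end: int           # for a gene on the reverse strand


def designate(species, strain, genes, width=3):
    """Map each GenBank name to a position-based designation.

    Genes are ordered by their left-most coordinate regardless of strand,
    then numbered from 1. The 'SPECIES-STRAIN-NNN' layout is hypothetical.
    """
    ordered = sorted(
        genes, key=lambda g: (min(g.start, g.end), max(g.start, g.end))
    )
    return {
        g.genbank_name: f"{species}-{strain}-{i:0{width}d}"
        for i, g in enumerate(ordered, start=1)
    }


# Usage with placeholder names: the reverse-strand gene "c" is numbered
# by its left-most coordinate (1000), so it comes last.
example = designate(
    "XXXV", "Demo",
    [Gene("b", 500, 900), Gene("a", 100, 400), Gene("c", 1500, 1000)],
)
```

Numbering by genome position rather than by the order of entries in the source record gives every genome the same convention, whatever naming scheme the original submitters used.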

Genome annotation and curation

For each gene in the PBRC database, we provide an automated, computer-driven annotation that gathers as much basic descriptive information as possible about a gene, basic analysis of its nucleotide or amino acid sequence, and the results of sequence similarity searches to look for common patterns or features that might be characteristic of its function. The annotation process starts with the GenBank record and includes the descriptive information, literature references and any other information provided in that record. This information populates the initial descriptive fields of our database. Following this automated annotation process, a manual, human-directed curation of each gene record is undertaken. During this curation process, a researcher reviews the annotation record, all available literature references and any unpublished information as available. This collection of empirically derived properties for the protein in question provides what might be considered a mini-review of the biology of the gene being studied. The broad types of information that are provided during the curation process include protein properties such as molecular weight and pI; post-translational processing; the availability of custom reagents such as clones, antibodies and mutants; functional descriptions [including Gene Ontology designations (16)]; and literature summaries. Evidence codes are provided that explicitly state the nature and source of each piece of information along with the appropriate literature references. A series of web forms assists in this process by providing a distinct set of informational fields to be filled in and by enforcing use of a controlled vocabulary to fully describe each gene. The results of the curation process are stored in our SQL Server database and form a Poxvirus Knowledge Database that is available and searchable from the PBRC website.

Website

The PBRC website is provided by a Microsoft Windows 2003 server running Microsoft Internet Information Services (IIS). The user interface is provided through a combination of Active Server Pages running server-side Visual Basic script, client-side JavaScript and HTML. SQL Server database access is provided through Microsoft ActiveX Data Objects.

The starting point for access to all databases and tools available from the PBRC is the PBRC home page at http://www.poxvirus.org/. A menu is displayed along the top of the page, and clicking on a menu link brings up a submenu that provides access to individual web query tools and/or applications. The Data menu provides access to search forms that support user-specified queries for genome, gene and sequence data (Figures 1 and 2). Analytical tools are provided through another series of submenus that give access to a series of analytical and visualization tools (Figure 3). In addition to the main menu and context-sensitive submenu, a menu near the bottom of the page provides access to PBRC organizational information such as descriptions of the people and teams involved in the work, acknowledgements and a form to provide user feedback. At the very bottom of the form we provide a Google-supported search form.

In general, available web pages can be categorized as follows:

  1. Informational: Static text such as help files, biographical information or meeting announcements.
  2. Forms: Pages that provide user input for data queries.
  3. Tabular results: Pages displaying data in tables.
  4. Graphical results: Pages displaying data results as figures.
  5. Links: Pages providing access to other pages.
  6. Applets: Small Java-based applications for data analysis and visualization that are available from within a user's web browser.
  7. Applications: More comprehensive Java-based analytical applications that can be run from a web page using Java Web Start, or downloaded to a user's workstation and run as a stand-alone application.

 


Analytical tools

BLAST similarity searches (17) are available that provide standard web-based BLAST output as well as a web-based version of our BLAST parsing and visualization application, XS-BLAST (XML-SQL BLAST). XS-BLAST utilizes the XML-output option of the NCBI BLAST executable and parses the results, storing them in our SQL Server database. The result set is then provided to the user as an HTML table, as well as through a Java-based graphical visualization tool (Figure 3). Both BLAST searches are run locally on a Sun Solaris server and provide several customized databases for searching. These include all complete genomic Poxviridae nucleic acid sequences, all predicted Poxviridae protein sequences and all available Poxviridae nucleotide sequences extracted from GenBank.

Comparative analysis of the coding potential of all Poxviridae genomes is available as both pairwise and multi-genome comparisons. Tabular listings of orthologous protein sets are provided, as well as a graphical gene synteny plot of shared orthologs between any two genomes (Figure 3). Ortholog determination is based on an all versus all BLAST search of all poxvirus proteins in the PBRC database.
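The parse-and-store step that XS-BLAST performs can be illustrated with a short sketch. This is not the XS-BLAST code: it is a minimal stand-in that assumes the element names of the NCBI BLAST XML report (`Iteration`, `Hit`, `Hsp` and their children), uses a tiny hand-written sample in place of real BLAST output, and stores rows in an in-memory SQLite table rather than SQL Server. The table name, column names and E-value cutoff are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample that follows the NCBI BLAST XML element names.
# Identifiers and scores are placeholders, not real search results.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_protein_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>demo|0001</Hit_id>
          <Hit_def>hypothetical subject protein A</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>250.5</Hsp_bit-score>
              <Hsp_evalue>1e-70</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>demo|0002</Hit_id>
          <Hit_def>hypothetical subject protein B</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>28.1</Hsp_bit-score>
              <Hsp_evalue>0.5</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def parse_blast_xml(xml_text):
    """Flatten a BLAST XML report into one tuple per HSP:
    (query, hit_id, hit_def, bit_score, evalue)."""
    rows = []
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            hit_id = hit.findtext("Hit_id", default="")
            hit_def = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                rows.append((
                    query, hit_id, hit_def,
                    float(hsp.findtext("Hsp_bit-score")),
                    float(hsp.findtext("Hsp_evalue")),
                ))
    return rows


def store_hits(conn, rows):
    """Insert parsed rows into a (hypothetical) blast_hit table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blast_hit "
        "(query TEXT, hit_id TEXT, hit_def TEXT, bit_score REAL, evalue REAL)"
    )
    conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def significant_hits(conn, max_evalue=1e-5):
    """Return hits at or below the E-value cutoff, best bit score first."""
    return conn.execute(
        "SELECT query, hit_id, bit_score FROM blast_hit "
        "WHERE evalue <= ? ORDER BY bit_score DESC",
        (max_evalue,),
    ).fetchall()
```

Once the hits are in a relational table, the HTML table and the graphical view can both be driven by ordinary queries such as `significant_hits`, which is presumably part of the appeal of parsing the XML output instead of the plain-text report.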

Availability

The PBRC is developed and maintained at the Department of Microbiology, University of Alabama at Birmingham and the Department of Biochemistry and Microbiology, University of Victoria. The PBRC is available at http://www.poxvirus.org.

The PBRC framework has also been used to support development of similar bioinformatics resources for other virus families. The Virus Bioinformatics Resource at http://www.virology.ca contains databases and analytical tools to support research on coronaviruses, herpesviruses and baculoviruses.

Future plans

In the near future, the PBRC will become part of a more comprehensive Viral Bioinformatics Resource Center for Biodefense (VBRC) that will include additional viruses listed as priority pathogens by the NIAID. In addition to expansion of the database and available analytical tools, the VBRC will make available more comprehensive help screens and tutorials that will provide users with step-by-step guides on how to use the site to answer typical questions that might be of interest to the laboratory researcher. The VBRC is being established as one of the NIAID's ‘Bioinformatics Resource Centers for Biodefense and Emerging or Re-Emerging Infectious Diseases’. For more details on these centers, see http://www.niaid.nih.gov/dmid/genomes/brc/default.htm. The VBRC is available at http://www.biovirus.org.


ACKNOWLEDGEMENTS

This work was funded through two grants from NIAID/DARPA: U01 AI48706 (E.J.L.) and U01 AI48653-02 (R.M.L.B. and C.U.) and Canadian NSERC grant OPG0155125-01 to C.U.


Notes

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions@oupjournals.org.


References

  1. Henderson,D.A., Inglesby,T.V., Bartlett,J.G., Ascher,M.S., Eitzen,E., Jahrling,P.B., Hauer,J., Layton,M., McDade,J., Osterholm,M.T. et al. (1999) Smallpox as a biological weapon: medical and public health management. Working Group on Civilian Biodefense. J. Am. Med. Assoc., 281, 2127–2137.

  2. Guarner,J., Johnson,B.J., Paddock,C.D., Shieh,W.J., Goldsmith,C.S., Reynolds,M.G., Damon,I.K., Regnery,R.L. and Zaki,S.R. (2004) Monkeypox transmission and pathogenesis in prairie dogs. Emerg. Infect. Dis., 10, 426–431.

  3. Enserink,M. (2003) Infectious diseases. U.S. monkeypox outbreak traced to Wisconsin pet dealer. Science, 300, 1639.

  4. Upton,C., Slack,S., Hunter,A.W., Ehlers,A. and Roper,R.L. (2003) Poxvirus Orthologous Clusters (POCs): toward defining the minimum essential poxvirus genome. J. Virol., 77, 7590–7600.

  5. Ehlers,A., Osborne,J., Slack,S., Roper,R.L. and Upton,C. (2002) Poxvirus Orthologous Clusters (POCs). Bioinformatics, 18, 1544–1545.

  6. Upton,C., Hogg,D., Perrin,D., Boone,M. and Harris,N.L. (2000) Viral genome organizer: a system for analyzing complete viral genomes. Virus Res., 70, 55–64.

  7. Hiscock,D. and Upton,C. (2000) Viral Genome Database: storing and analyzing genes and proteins from complete viral genomes. Bioinformatics, 16, 484–485.

  8. Brodie,R., Roper,R.L. and Upton,C. (2004) JDotter: a Java interface to multiple dotplots generated by Dotter. Bioinformatics, 20, 279–281.

  9. Brodie,R., Smith,A.J., Roper,R.L., Tcherepanov,V. and Upton,C. (2004) Base-By-Base: single nucleotide-level analysis of whole viral genome alignments. BMC Bioinformatics, 5, 96.

  10. Hoskins,J., Alborn,W.E.,Jr, Arnold,J., Blaszczak,L.C., Burgett,S., DeHoff,B.S., Estrem,S.T., Fritz,L., Fu,D.J., Fuller,W. et al. (2001) Genome of the bacterium Streptococcus pneumoniae strain R6. J. Bacteriol., 183, 5709–5717.

  11. Glass,J.I., Lefkowitz,E.J., Glass,J.S., Heiner,C.R., Chen,E.Y. and Cassell,G.H. (2000) The complete sequence of the mucosal pathogen Ureaplasma urealyticum. Nature, 407, 757–762.

  12. Chen,N., Danila,M.I., Feng,Z., Buller,R.M.L., Wang,C., Han,X., Lefkowitz,E. and Upton,C. (2003) The genomic sequence of Ectromelia virus, the causative agent of mousepox. Virology, 317, 165–186.

  13. Wheeler,D.L., Church,D.M., Federhen,S., Lash,A.E., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E., Tatusova,T.A. et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.

  14. Ball,L.A. and Mayo,M.A. (2004) Virology Division News: report from the 33rd Meeting of the ICTV Executive Committee. Arch. Virol., 149, 1259–1263.

  15. Fauquet,C.M. and Mayo,M.A. (2001) The 7th ICTV report. Arch. Virol., 146, 189–194.

  16. Harris,M.A., Clark,J., Ireland,A., Lomax,J., Ashburner,M., Foulger,R., Eilbeck,K., Lewis,S., Marshall,B., Mungall,C. et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 32, D258–D261.

  17. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Figures

Figure 1 PBRC genomic sequence web pages. Screen shots of the PBRC genome list, and the genome map for Variola major virus strain Bangladesh.

Figure 2 PBRC gene search and gene record web pages. Screen shots of the PBRC gene search form, a listing of gene search results and an individual gene record.

Figure 3 PBRC BLAST search and gene synteny analytical tool web pages. Screen shots of XS-BLAST tabular and graphical BLAST search results and a gene synteny plot.

 


http://www.biology-online.org/articles/poxvirus-bioinformatics-resource-center-comprehensive.html