
Complete genome sequences of important bacterial pathogens and industrial organisms hold significant …


Making sense of the genome piles
- A Flood of Microbial Genomes–Do We Need More?

Fully exploiting the wealth of genomic information requires a computational infrastructure that supports data analysis and the development of tools and resources. Novel data integration capabilities in a community genomics environment are likely to give rise to cutting-edge platforms. However, the availability of processed data to feed into such platforms will depend on the speed and accuracy with which raw genomic data and assemblies are processed. Subsystems approaches have been notably successful here: annotation servers have been developed that are capable of processing 20–50 prokaryotic genomes daily. Tools such as the RAST server [26] can annotate up to 200–300 genomes per month. The server identifies RNA-encoding and protein-encoding genes, assigns functions to the genes, and attempts to place the genes within genomic subsystems, producing an initial estimate of which subsystems (i.e., pathways, complexes, and non-metabolic components of the cell) are present in the genome. The accuracy of the annotations derives from manual curation of a library of over 800 subsystems that include over 1.5 million genes with functions assigned from a controlled vocabulary.
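The idea of placing annotated genes into subsystems and estimating which subsystems are present can be illustrated with a small sketch. This is not the RAST algorithm; the subsystem names, functional roles, gene identifiers and the `min_fraction` threshold below are all hypothetical and chosen only to show the principle of matching assigned functions against a curated library.

```python
# Toy illustration only: the library, roles and threshold are assumptions,
# not data or logic taken from the RAST server.
SUBSYSTEMS = {
    "Glycolysis": {"hexokinase", "phosphofructokinase", "pyruvate kinase"},
    "Urease": {"urease alpha subunit", "urease beta subunit", "urease gamma subunit"},
}


def estimate_subsystems(gene_roles, subsystems, min_fraction=0.5):
    """Return {subsystem: fraction of its roles found} for every subsystem
    whose covered fraction is at least min_fraction."""
    found_roles = set(gene_roles.values())
    present = {}
    for name, roles in subsystems.items():
        fraction = len(roles & found_roles) / len(roles)
        if fraction >= min_fraction:
            present[name] = fraction
    return present


# Hypothetical annotation output: gene identifier -> assigned functional role.
annotated_genes = {
    "gene_0001": "hexokinase",
    "gene_0002": "pyruvate kinase",
    "gene_0003": "urease alpha subunit",
}

result = estimate_subsystems(annotated_genes, SUBSYSTEMS)
print(result)
```

In this toy case two of the three glycolysis roles are found, so that subsystem is reported, while the single urease role falls below the assumed threshold. A controlled vocabulary matters here because the set intersection only works when the same function is always written the same way.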

Processed genomic information of this kind is likely to be an excellent input for systems that exploit the power of collaborative grid computing to integrate information linking organisms through their genes and gene products via a semantic web approach [27]. Bacterial genome experts, microbiologists, evolutionists and clinical research specialists are likely to benefit from tools that can quickly identify and explore genome-encoded features that help decipher particular lifestyles, survival advantages, core metabolic pathways, plastic zones, diagnostic markers and drug targets. This, of course, requires processing and comparing multiple datasets in an in silico or ‘virtual’ laboratory [28]. The complexity of such projects, however, requires an e-Science approach in which a computational environment enables transparent and seamless access to distributed datasets through scientific workflows that automate in silico experimentation across grids of international networks [27]. One such resource, which integrates different forms of federated information comprising genomic sequences and associated metadata from various marine microbial sequencing projects, is CAMERA (Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis) [29]. It is a robust community approach that supports a fundamental shift in the way microbial genomic datasets are analyzed and interpreted. A future challenge for such platforms, which are focused on genomic datasets, is how they can be integrated with information from functional analyses of the transcriptomes, regulomes, proteomes, interactomes and metabolomes of organisms in a dynamic and interactive fashion.

Academic institutions and publicly funded research consortia are generating large sets of ‘omic’ data that can serve collaborative groups across different disciplines. With the cutting-edge approaches discussed above, it will be possible for these groups to bring in their own datasets and compare them with other experimental results and pathways available in the public domain. Web tools based on these concepts (for example, nextBio) routinely extract, integrate and compare information and observations contained in publications while juxtaposing colossal amounts of disparate biological, clinical and ‘omic’ data from public and proprietary sources, regardless of data type and origin. Other tools such as Ondex [30] display biological data as a set of linked graphs, with each node representing a data object and each edge representing a relationship between two nodes.
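A linked-graph representation of this kind, in which nodes are data objects and edges are typed relationships, can be sketched in a few lines. The sketch is generic and does not reproduce the Ondex data model or API; the class names, node kinds, relation labels and example entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # assumed categories, e.g. "gene", "protein", "pathway"


@dataclass
class LinkedGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, source_id, relation, target_id):
        # An edge always relates two data objects already in the graph.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source_id, relation, target_id))

    def neighbours(self, node_id):
        """Return (relation, target_id) pairs for edges leaving node_id."""
        return [(rel, tgt) for src, rel, tgt in self.edges if src == node_id]


# Hypothetical example entries linking a gene to its product and a pathway.
graph = LinkedGraph()
graph.add_node("ureA", "gene")
graph.add_node("UreA", "protein")
graph.add_node("urea degradation", "pathway")
graph.add_edge("ureA", "encodes", "UreA")
graph.add_edge("UreA", "participates_in", "urea degradation")

print(graph.neighbours("ureA"))
```

Keeping the relation label on each edge is what allows heterogeneous data (genes, proteins, pathways, publications) to share one structure, since a query can follow only the relationship types it cares about.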
