A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data
Lennart J. G. Post 1,2, Marco Roos 1, M. Scott Marshall 1, Roel van Driel 2 and Timo M. Breit 1,*
1Integrative Bioinformatics Unit and 2Nuclear Organization Group, Swammerdam Institute for Life Sciences, University of Amsterdam, 1098 SM, Amsterdam, The Netherlands
*To whom correspondence should be addressed.
Motivation: The numerous public data resources make integrative bioinformatics experimentation increasingly important in life sciences research. However, such experimentation is severely hampered by the way the data and information are made available. The semantic web approach enhances data exchange and integration by providing standardized formats such as RDF, RDF Schema (RDFS) and OWL, to achieve a formalized computational environment. Our semantic web-enabled data integration (SWEDI) approach aims to formalize biological domains by capturing the knowledge in semantic models using ontologies as controlled vocabularies. The strategy is to build a collection of relatively small but specific knowledge and data models, which together form a ‘personal semantic framework’. This framework can be linked to large, general external knowledge and data models. In this way, the scientists involved are familiar with the concepts and associated relationships in their models and can create semantic queries using their own terms. We studied the applicability of our SWEDI approach in the context of a biological use case by integrating genomics data sets for histone modification and transcription factor binding sites.
Results: We constructed four OWL knowledge models and two RDFS data models, transformed and mapped the relevant data to the data models, linked the data models to the knowledge models using linkage statements, and ran semantic queries. Our biological use case demonstrates the relevance of these kinds of integrative bioinformatics experiments. Our findings show that the SWEDI approach has high startup costs, but that extending it with similar data is straightforward.
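The steps named above (knowledge models, data models, linkage statements, semantic queries) can be illustrated with a minimal sketch. It is not the SWEDI software: it uses plain Python tuples as triples instead of an RDF/OWL toolkit, and every identifier in it (the `ex:` and `data:` names, the coordinate values, the `overlapping` query) is a hypothetical example chosen for illustration.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
# All "ex:" and "data:" names and all coordinates are hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Knowledge model: domain concepts in the scientist's own terms.
knowledge_model = {
    ("ex:HistoneModification", SUBCLASS_OF, "ex:ChromatinFeature"),
    ("ex:TranscriptionFactorBindingSite", SUBCLASS_OF, "ex:ChromatinFeature"),
}

# Data mapped to a data model: one record per genomic interval.
data = {
    ("data:region1", RDF_TYPE, "data:HistoneModRecord"),
    ("data:region1", "data:chromosome", "chr1"),
    ("data:region1", "data:start", 1000),
    ("data:region1", "data:end", 2000),
    ("data:site1", RDF_TYPE, "data:TfbsRecord"),
    ("data:site1", "data:chromosome", "chr1"),
    ("data:site1", "data:start", 1500),
    ("data:site1", "data:end", 1520),
    ("data:site2", RDF_TYPE, "data:TfbsRecord"),
    ("data:site2", "data:chromosome", "chr2"),
    ("data:site2", "data:start", 1500),
    ("data:site2", "data:end", 1520),
}

# Linkage statements: connect data model classes to knowledge model concepts.
linkage = {
    ("data:HistoneModRecord", SUBCLASS_OF, "ex:HistoneModification"),
    ("data:TfbsRecord", SUBCLASS_OF, "ex:TranscriptionFactorBindingSite"),
}

graph = knowledge_model | data | linkage


def subclasses(graph, cls):
    """Return cls plus every class below it via rdfs:subClassOf (transitive)."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(graph, cls):
    """Return sorted subjects typed as cls or as any of its subclasses."""
    classes = subclasses(graph, cls)
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o in classes)


def value(graph, subject, predicate):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def overlapping(graph, class_a, class_b):
    """Pair instances of two concepts whose intervals overlap on one chromosome."""
    pairs = []
    for a in instances_of(graph, class_a):
        for b in instances_of(graph, class_b):
            same_chromosome = (
                value(graph, a, "data:chromosome") == value(graph, b, "data:chromosome")
            )
            intervals_overlap = (
                value(graph, a, "data:start") <= value(graph, b, "data:end")
                and value(graph, b, "data:start") <= value(graph, a, "data:end")
            )
            if same_chromosome and intervals_overlap:
                pairs.append((a, b))
    return pairs


# The query is phrased in knowledge model terms, not in data model terms.
hits = overlapping(graph, "ex:TranscriptionFactorBindingSite", "ex:HistoneModification")
print(hits)
```

The query names only knowledge model concepts; the linkage statements are what let it reach the underlying records. In this sketch, adding a further data set of a similar kind would amount to adding its triples and one more linkage statement, while the query itself stays unchanged.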
Availability: Software, models and data sets are available at http://www.integrativebioinformatics.nl/swedi/index.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics 2007 23(22):3080-3087. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/).