Systematic whole-genome DNA methylation studies are just beginning. A comprehensive survey of allelic CGI methylation across human chromosome 21q in peripheral blood indicated that up to 20% of the CGIs were methylated, and also detected two novel imprinted loci as well as a non-imprinted gene with monoallelic methylation of its CGI (27). The large-scale structure of genomic methylation profiles in the brain is currently being mapped using similar enzyme-based methods for the fractionation of methylated and unmethylated domains of the genome (T. Bestor, personal communication).
Apart from the already mentioned chromatin fibre map (36) and the DNA replication maps, histone modification maps are beginning to emerge, including a high-resolution genomic ChIP-chip analysis of H3 Lys4 methylation and H3 Lys9/14 acetylation for human chromosomes 21 and 22 in a human hepatoma cell line (59). The same work included comparative human and mouse primary fibroblast maps for Lys4 methylation at selected loci (59). Smaller-scale epigenome projects, such as the integrated profiling of gene expression and chromatin modifications (histone modifications and DNA methylation) in Drosophila (43) and Arabidopsis (60), have been undertaken using the previously described technology platforms. The latter authors show that heterochromatin in Arabidopsis is determined by transposable elements and related tandem repeats, under the control of the chromatin remodelling protein DDM1 (60,61).
Human epigenome project (HEP)
The HEP aims to systematically analyse DNA methylation in the regulatory regions of all known genes in most major cell types and their diseased variants along with high-density snapshots of non-genic regions spread evenly across the human genome. Methylation variable positions (MVPs) are thought to reflect gene activity, tissue type and disease state and are useful epigenetic markers revealing the dynamic state of the genome. MVPs are defined as CpG sites with statistical power to discriminate between different biological samples and/or states. Akin to single nucleotide polymorphisms (SNPs), MVPs will greatly advance our ability to elucidate and diagnose the molecular basis of human diseases.
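To make the MVP definition concrete, the following minimal sketch tests whether a single CpG site discriminates between two groups of samples. It is an illustration only and not the HEP pipeline: the choice of a permutation test, the helper name `permutation_p_value`, the example methylation values and the 0.05 threshold are all assumptions introduced here.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean methylation.

    group_a, group_b: methylation levels (0.0 to 1.0) at ONE CpG site,
    measured in two sets of samples (e.g. two tissues).
    Hypothetical helper, not part of any published HEP software.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Invented example values for one CpG site in two tissues.
liver = [0.10, 0.12, 0.09, 0.11, 0.10]
brain = [0.80, 0.82, 0.79, 0.81, 0.80]
p = permutation_p_value(liver, brain)
is_candidate_mvp = p < 0.05  # assumed significance threshold
```

When many CpG sites are screened in this way, a correction for multiple testing would be required; it is omitted here for brevity.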
As a pilot study, DNA methylation profiling was carried out on the human major histocompatibility complex (MHC), which is located on chromosome 6 and is one of the most gene-dense regions in the human genome, containing genes with a high diversity of function (61). For the pilot study, an integrated pipeline for high-throughput methylation analysis using bisulphite DNA sequencing, MVP discovery, epigenotyping by MALDI-MS and a public database was developed. DNA methylation levels within regulatory, exonic and intronic regions associated with 90 genes (i.e. >70% of all expressed genes within the MHC) were analysed in seven human tissues (adipose, brain, breast, liver, lung, muscle and prostate), with multiple samples from different individuals. For the DNA methylation profiling of the human MHC, regions with potential regulatory functions and CpG-dense regions of a gene were selected for sequencing in addition to CGIs. This selection was dependent on annotated sequence data and, whereas CGIs and CpG-dense regions within genes were easily identifiable, the precise locations of all the promoters within the human MHC were unknown at the time this study was initiated. Therefore, sequences 2 kb upstream of all annotated start codons were also selected for study to ensure that promoters and upstream regulatory regions were included. Although the MHC is one of the most thoroughly annotated regions of the genome, the annotation is still ongoing, and we foresee that more sequences will be added to the HEP map as antisense genes and regulatory RNAs are identified.
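The selection of 2 kb upstream of each annotated start codon can be sketched as a strand-aware coordinate calculation. This is a minimal sketch under stated assumptions, not the procedure used in the pilot: the 0-based half-open coordinate convention, the `Region` type and the `upstream_window` helper are hypothetical and introduced here for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def upstream_window(start_codon_pos, strand, chrom_length, size=2000):
    """Return the `size` bp immediately upstream of a start codon.

    start_codon_pos: 0-based reference coordinate of the first base of
    the start codon in transcript orientation (assumed convention).
    strand: "+" or "-".
    """
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the chromosome start.
        return Region(max(0, start_codon_pos - size), start_codon_pos)
    if strand == "-":
        # On the reverse strand, upstream lies at higher coordinates.
        begin = start_codon_pos + 1
        return Region(begin, min(chrom_length, begin + size))
    raise ValueError("strand must be '+' or '-'")


# Invented coordinates on a hypothetical 10 kb sequence.
forward = upstream_window(5000, "+", 10000)
reverse = upstream_window(5000, "-", 10000)
```

Windows are clipped at the sequence ends, so a gene close to either end yields a region shorter than 2 kb.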
The overall methylation profile of the MHC was shown to be bimodal, with >90% of the regions tested being either hyper- or hypomethylated; however, heterogeneity at individual CpG sites was frequently observed. These results are similar to the bimodal genomic methylation profiles observed previously by several authors (reviewed in 62) and confirm the results of others who have shown heterogeneous methylation profiles of individual genes in vivo (63,64). Of the CpGs analysed in the HEP pilot, 80% displayed methylation levels that varied by >20% between individuals and/or tissues. Upstream regions (5'-UTR and promoter regions) of the genes analysed were more likely to be hypomethylated than intragenic regions, and introns were less likely to be methylated than exons. Comparisons of DNA methylation with expression levels for MHC genes in several tissues indicated that hypermethylation of upstream regions was associated with gene silencing. A web-based ENSEMBL-like genome browser has been created for displaying HEP data, which are publicly available (Fig. 2). Following further scale-up, methylation profiling of all known genes (around 3000) on chromosomes 6, 13, 20 and 22 is now under way.
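The two summary statistics above (the share of regions that are hyper- or hypomethylated, and the share of CpGs varying by >20%) can be illustrated with a short sketch. The cut-offs `HYPO_MAX` and `HYPER_MIN` are assumptions chosen for illustration, since no thresholds are given here, and ">20%" is interpreted as an absolute difference in methylation level between samples; all data values are invented.

```python
from statistics import mean

HYPO_MAX = 0.30        # assumed cut-off for hypomethylation
HYPER_MIN = 0.70       # assumed cut-off for hypermethylation
VARIABLE_RANGE = 0.20  # ">20%" read as an absolute difference in level


def classify_region(cpg_levels):
    """Classify a region from the methylation levels of its CpG sites."""
    average = mean(cpg_levels)
    if average <= HYPO_MAX:
        return "hypomethylated"
    if average >= HYPER_MIN:
        return "hypermethylated"
    return "intermediate"


def is_variable(levels_across_samples):
    """True if one CpG differs by more than VARIABLE_RANGE between samples."""
    return max(levels_across_samples) - min(levels_across_samples) > VARIABLE_RANGE


def fraction_variable(cpg_to_samples):
    """Fraction of CpG sites whose level varies between samples."""
    flags = [is_variable(levels) for levels in cpg_to_samples.values()]
    return sum(flags) / len(flags)


# Invented data: four CpG sites, each measured in two samples.
example = {
    "cpg_a": [0.10, 0.50],
    "cpg_b": [0.90, 0.95],
    "cpg_c": [0.20, 0.80],
    "cpg_d": [0.00, 0.10],
}
variable_share = fraction_variable(example)
```

A region can be classified as hypo- or hypermethylated on average while still containing individual CpG sites that vary between samples, which is the pattern of regional bimodality with site-level heterogeneity described above.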
Applications of the epigenome
The future potential of the epigenome is wide-ranging. In addition to advancing basic research, the epigenome has immediate applications for diagnostics, and as epigenetic alterations are potentially reversible, it has potential applications for therapeutics as well. As a resource, the HEP will provide the normal baseline level of DNA methylation as a reference for subsequent profiling in the context of cancer and complex disease. Methylation profiling technologies promise to enable the characterization of distinct methylation signatures for complex diseases and various cancers with diagnostic implications. DNA methylation is now considered a potential biomarker in cancer (65,66). Cancers can be classified according to their degree of methylation, and those cancers with high degrees of methylation (the CpG island methylator phenotype) represent a clinically and aetiologically distinct group that is characterized by ‘epigenetic instability’ (reviewed in 66). Epigenetic therapies propose using global DNA methylation inhibitors to reverse gene silencing caused by altered methylation (67).
The HEP is essentially embarking on practical epigenotyping by identifying and classifying epigenetic marks that are transmitted vertically, for example, inter-individual variants and tissue-specific variants. In the first instance, we can ask whether epigenetic variation is smaller between monozygotic twins than between siblings. Variation in gene expression between alleles is not restricted to regulation by genomic imprinting or X-inactivation. Indeed, allelic expression variation is relatively common in humans and differentially expressed genes are distributed throughout the genome (68,69). Moreover, some alleles, known as epialleles, have variable expressivity in the absence of genetic heterogeneity because of their epigenetic states (reviewed in 70). The mechanisms responsible for variable gene expression can now be unravelled by relating epigenotype variation to genotype variation and haplotypes in normal individuals. In complex diseases, the frequency and disease onset time may be influenced by epigenetic variants and age-dependent epigenetic changes (71). It is conceivable that variation in the methylation status of a gene could be affected by genotype either directly, where genetic variation introduces or removes CpG sites that are susceptible to methylation, or indirectly, by introducing sequences (e.g. repeat elements) that affect methylation in cis. Loss of imprinting of the IGF2 gene is present in 10% of individuals who show no sign of the imprinted growth disorder Beckwith–Wiedemann syndrome (BWS). In BWS patients, specific haplotypes within the IGF2 gene have been associated with loss of methylation at the locus (72), suggesting that epigenetic and genetic variation may act synergistically to influence a phenotype. For this, we suggest the introduction of the ‘hepitype’, which combines haplotype and epitype information and allows subtle epigenetic contributions to a given phenotype to be dissected out. The basic concept of hepitypes is illustrated in Figure 3.
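As a data structure, a hepitype can be thought of as a haplotype paired with an epitype. The sketch below is one possible representation and is not a published format: the `Hepitype` class, the "M"/"U" methylation calls, the helper `mean_phenotype_by_hepitype` and all values are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Tuple


@dataclass(frozen=True)
class Hepitype:
    """Hypothetical pairing of genetic and epigenetic state at one locus."""
    haplotype: Tuple[str, ...]  # alleles at linked SNPs, e.g. ("A", "G")
    epitype: Tuple[str, ...]    # call per MVP: "M" methylated, "U" unmethylated


def mean_phenotype_by_hepitype(observations):
    """Average a quantitative phenotype within each hepitype.

    observations: iterable of (Hepitype, phenotype_value) pairs.
    """
    groups = defaultdict(list)
    for hepitype, value in observations:
        groups[hepitype].append(value)
    return {hepitype: mean(values) for hepitype, values in groups.items()}


# Invented data: the same haplotype observed with two different epitypes.
methylated = Hepitype(("A", "G"), ("M", "M"))
unmethylated = Hepitype(("A", "G"), ("U", "U"))
summary = mean_phenotype_by_hepitype(
    [(methylated, 1.0), (methylated, 3.0), (unmethylated, 5.0)]
)
```

Grouping by the combined key separates individuals who share a haplotype but differ in epitype, which grouping by haplotype alone would pool together.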