The discovery of the Ten‐Eleven‐Translocation (TET) oxygenases that catalyze the hydroxylation of 5‐methylcytosine (5mC) to 5‐hydroxymethylcytosine (5hmC) has triggered an avalanche of studies aiming to resolve the role, if any, of 5hmC in gene regulation. Hitherto, TET1 has been reported to bind to CpG‐island (CGI) and bivalent promoters in mouse embryonic stem cells, whereas binding at DNaseI hypersensitive sites (HS) had escaped previous analysis. Significant enrichment/accumulation of 5hmC but not 5mC can indeed be detected at bivalent promoters and at DNaseI‐HS. Surprisingly, however, 5hmC is not detected or is present at very low levels at CGI promoters, notwithstanding the presence of TET1. Our meta‐analysis of DNA methylation profiling points to potential issues with regard to the various methodologies that are part of the toolbox used to detect 5mC and 5hmC. Discrepancies between published studies and technical limitations prevent an unambiguous assignment of 5hmC as a ‘true’ epigenetic mark, that is, one that is read and interpreted by other factors, and/or as a transiently accumulating intermediate in the conversion of 5mC to unmodified cytosine.
In nearly half a century of research and in tens of thousands of publications, the role of the 5‐methylcytosine (5mC) DNA modification has been studied in development, differentiation, imprinting, X‐inactivation, gene regulation and disease, and still it poses many riddles. It is often called the fifth base to illustrate its importance and heritability.
The discovery that 5‐hydroxymethylcytosine (5hmC) constitutes a small but significant fraction of cytosines in the DNA derived from Purkinje neurons (Kriaucionis and Heintz, 2009) and the identification of the enzyme converting 5mC to 5hmC, the Ten‐Eleven‐Translocation (TET) oxygenase (Tahiliani et al, 2009), raised the question whether methylation of cytosines is the only, or at least the most important, DNA modification in mammalian genomes and sparked studies to unveil the location and biological function of the mark.
A link between 5hmC and endogenous oxidative stress in DNA of mammalian tissues was first described in the late 1980s (Cannon et al, 1988; Boorstein et al, 1989; Cannon‐Carlson et al, 1989). Owing to technical limitations in analyzing 5hmC, attention faded and the awareness of this modification in mammalian genomes was ‘lost.’ In 1993, base J was identified in the nuclear DNA of Trypanosoma brucei (Gommers‐Ampt et al, 1993a, 1993b). The newly identified TET enzymes are related to the Trypanosoma proteins JBP1 and JBP2, which belong to the family of 2‐oxoglutarate‐ and Fe(II)‐dependent hydroxylases (Yu et al, 2007; Cliffe et al, 2009). Overexpression of wild‐type and mutant TET1 and RNA interference‐mediated depletion of endogenous TET convincingly showed that TET catalyzes the conversion of 5mC to 5hmC in cultured cells (Tahiliani et al, 2009; Ito et al, 2010; Koh et al, 2011).
In this Perspective, we will focus on recent genome‐wide profiling studies that provide the basis for future functional analysis.
Biological function of TET proteins
The observation that in acute myeloid leukemia (AML), TET1 is an oncofusion partner of the histone H3 Lys4 (H3K4) methyltransferase MLL provided a first link between TET proteins and the epigenome (Ono et al, 2002; Lorsbach et al, 2003). The molecular underpinning of how the MLL–TET1 fusion protein contributes to leukemogenesis remained, however, largely unexplored. The identification of TET proteins as oxygenases by the Rao and Heintz laboratories and the discovery that TET2 is frequently mutated in a range of human myeloid malignancies, including myelodysplastic syndromes (Delhommeau et al, 2009; Langemeijer et al, 2009), placed this small family of oxygenases into the limelight. TET2 mutations appear to associate with low 5hmC levels and global hypomethylation (Ko et al, 2010), suggesting that an altered 5hmC status leads to deregulation of important hematopoietic regulators and contributes to malignancy. Mutations of TET2 and of the isocitrate dehydrogenase genes IDH1/IDH2, whose products catalyze the conversion of isocitrate to α‐ketoglutarate (αKG), appear to be mutually exclusive in AML (Figueroa et al, 2010), consistent with the requirement of the TET enzymes for αKG as co‐substrate.
Two studies using conditional knockout of Tet2 provided important insights into the role of TET2 in normal hematopoiesis and malignancies (Moran‐Crusio et al, 2011; Quivoron et al, 2011). TET2 loss resulted in expansion of hematopoietic stem and progenitor cell populations, directly contributing to myeloproliferation. This is consistent with a role of TET2 disruption (by deletion or sequence mutation) in the pathogenesis of lymphoid as well as myeloid disorders. Moreover, the mutations in both lineages of malignancy are commonly acquired in early hematopoietic progenitors of multi‐lineage potential, indicating that the enhanced self‐renewal upon TET2 inactivation is an important contributor to transformation. Knockout of Tet1 in embryonic stem cells (ESCs) caused a subtle reduction of 5hmC levels but did not affect pluripotency, possibly because of the compensatory action of TET2. Surprisingly, Tet1−/− mice turned out to be viable and fertile (Dawlaty et al, 2011).
In an elegant study, Walter and co‐workers revealed the role of 5hmC in genome‐wide DNA demethylation in zygotic development. This laboratory had previously shown that the paternal genome in the pronucleus rapidly undergoes active DNA demethylation of 5mC and remains demethylated following several rounds of cell division, while the maternal genome remains methylated even though it is exposed to the same cytoplasmic factors (Oswald et al, 2000). In a recent study, they showed that 5mC is converted into 5hmC in the paternal pronucleus by the TET3 dioxygenase (Wossidlo et al, 2011). Furthermore, they confirmed the role played by PGC7/Stella in blocking the TET3‐mediated oxidation in the maternal pronucleus (Nakamura et al, 2007; Wossidlo et al, 2011). Inoue and Zhang (2011) further showed that 5hmC of the paternal genome is lost following replication. The stability of the 5hmC mark suggests that it may itself be a functional modification linked to chromatin (re)organization events in early cleavage embryos (Wossidlo et al, 2011). In fact, blocking oxidation of 5mC by TET3 deletion reduced developmental fitness and fetal survival, and affected the epigenetic reprogramming of the donor nuclear DNA in somatic cell nuclear transfer (Gu et al, 2011).
Recently, it was found that TET proteins have the capacity to oxidize 5mC not only to 5hmC, but also to 5‐formylcytosine (5fC) and 5‐carboxylcytosine (5caC) in vitro (He et al, 2011; Ito et al, 2011) and could play a role in DNA demethylation, implying that 5hmC, 5fC and 5caC may only be intermediates in the removal of methylated cytosines. Xu and his group showed that thymine DNA glycosylase (TDG) can specifically recognize and excise 5caC (He et al, 2011); this excision could trigger TDG‐initiated base excision repair (BER) that would lead to DNA demethylation. Moreover, studies of nuclear reprogramming showed that activation‐induced cytidine deaminase (AID), a member of the APOBEC protein family, mediates deamination of cytosine residues to uracils, which are then repaired by either BER or mismatch repair (Bhutani et al, 2010). Indeed, DNA demethylation was reported in combination with the recruitment of BER enzymes as part of the transcription cycle at target genes such as PS2 (TFF1) in response to ligand activation of the estrogen receptor (Metivier et al, 2008).
TET genome‐wide profiles
To shed light on the role of TET1 in ES cells, chromatin immunoprecipitation followed by deep sequencing (ChIP‐seq) has been used to establish genomic profiles of TET1 (Williams et al, 2011; Xu et al, 2011; Wu et al, 2011b). To rule out possible differences due to mapping algorithms and settings, we remapped the published data from mouse ESC (Williams et al, 2011; Xu et al, 2011; Wu et al, 2011a). TET1 appears to be highly enriched at nearly all CpG‐island (CGI) promoters that are hypomethylated (Figure 1 top panel). It was suggested that TET1 may play a role in maintenance of the hypomethylated state (Williams et al, 2011; Xu et al, 2011). However, knockdown of TET1 only led to a small increase in 5mC levels at TET1‐binding sites. In the absence of direct and compelling evidence, it remains an open question what role TET1 is playing in this promoter context. Different groups reported that TET1 also binds actively transcribed CpG‐poor gene promoters, such as Nanog, Esrrb and Tcl1, whose gene products are pluripotency‐related factors (Ito et al, 2010; Ficz et al, 2011; Wu et al, 2011b). However, these findings have not been confirmed following depletion of TET1 and/or TET2 (Koh et al, 2011; Williams et al, 2011). Importantly, in TET1 knockout cells, pluripotency was not affected and expression of the pluripotency markers Oct4, Nanog and Sox2 was not altered (Dawlaty et al, 2011).
TET1 is also present at the so‐called ‘bivalent’ promoters (Figure 1 middle panel), a chromatin state characterized by the presence of H3K4me3, H3K27me3 and PRC2 (Azuara et al, 2006; Bernstein et al, 2006; Mikkelsen et al, 2007; Ku et al, 2008). A direct interaction between TET1 and PRC2 could not be detected (Williams et al, 2011; Wu et al, 2011b). Interestingly, our analysis of the published datasets shows that TET1‐binding sites nearly perfectly overlap with DNaseI hypersensitive sites (HS) as determined in the ZhBTc4 ESCs (Levasseur et al, 2008) (Figure 1 bottom panel). These results provide further evidence for a role of TET1 in transcriptional regulation.
5hmC state maps
The discovery of 5hmC spurred efforts to detect and profile 5hmC. Methods based on affinity enrichment through methylcytosine‐specific protein domains (Cross et al, 1994) or through antibody‐mediated immunoprecipitation as in MeDIP (methylated‐DNA immunoprecipitation; Weber et al, 2005), which efficiently enrich for 5mC‐containing DNA, do not pull down 5hmC (Huang et al, 2010; Jin et al, 2010; Nestor et al, 2010). Furthermore, sodium bisulfite treatment of DNA (Clark et al, 1994) does not distinguish between 5mC and 5hmC (Huang et al, 2010; Jin et al, 2010; Nestor et al, 2010). In a relatively short time, an entire toolbox has been developed to distinguish 5mC from 5hmC. Many of these methods only detect and quantitate 5hmC levels, while others have been coupled to massively parallel sequencing to study 5hmC localization on a genomic scale (Table I).
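Why bisulfite sequencing alone cannot separate the two marks can be illustrated with a minimal sketch. It assumes the commonly described read‐out in which unmodified cytosine is deaminated and sequenced as T, whereas both 5mC and the bisulfite adduct of 5hmC (CMS) are sequenced as C; the table and function names are illustrative only and are not taken from any of the cited studies.

```python
# Base that is sequenced at a cytosine position after bisulfite treatment
# (assumed read-out): unmodified C is deaminated and read as T, 5mC resists
# conversion and is read as C, 5hmC forms CMS and is also read as C.
BISULFITE_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}


def bisulfite_calls(states):
    """Return the sequenced base for each cytosine state in `states`."""
    return [BISULFITE_READOUT[state] for state in states]


def distinguishable(state_a, state_b):
    """True if bisulfite sequencing yields different bases for the two states."""
    return BISULFITE_READOUT[state_a] != BISULFITE_READOUT[state_b]


if __name__ == "__main__":
    positions = ["C", "5mC", "5hmC", "C"]  # hypothetical cytosine positions
    print(bisulfite_calls(positions))
    print("5mC vs 5hmC distinguishable:", distinguishable("5mC", "5hmC"))
```

In this toy model, any position read as C is ambiguous between 5mC and 5hmC, which is why the dedicated enrichment or labeling methods listed in Table I are needed.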
The first genome‐wide 5hmC profile was generated for mouse cerebellum DNA using selective chemical labeling and affinity enrichment involving an azide‐modified glucose transfer to 5hmC by β‐glucosyltransferase (βGT) and click chemistry to couple a biotin derivative used for subsequent enrichment of modified 5hmC (Song et al, 2010). 5hmC appeared to be enriched in gene bodies, regions proximal to the transcription start sites (TSSs) as well as transcription end sites of highly expressed genes suggestive of a role in activation and/or maintenance of gene expression. Several genome‐wide studies used hMeDIP‐seq (hydroxymethylated‐DNA immunoprecipitation‐sequencing), a method adapted from MeDIP‐seq (Weber et al, 2005; Down et al, 2008; Butcher and Beck, 2010) using antibodies raised against 5hmC. Notwithstanding the use of the same antibody source, the conclusions of these studies are not entirely concordant (see below).
In a first comparison, we computed intensity plots of genomic locations that are occupied by TET1 (Figure 2). At bivalent promoters, a very clear enrichment of 5hmC is detected in GLIB (glucosylation, periodate oxidation, biotinylation), an approach involving βGT conversion (Pastor et al, 2011). A good enrichment is also attained by CMS (cytosine 5‐methylenesulfonate) that elegantly makes use of sodium bisulfite conversion of 5hmC to CMS (Hayatsu and Shiragami, 1979) and immunoprecipitation with an antibody raised against CMS (Ko et al, 2010; Pastor et al, 2011). The signal over background is much less pronounced in the various hMeDIP‐seq assays. In all studies, MeDIP‐seq displays a clear depletion of 5mC at bivalent loci (Figure 2).
The intensity plots of CGI promoters reveal discordance between the published datasets. Whereas a very slight enrichment is seen in the Ficz et al data, depletion is observed in the other hMeDIP‐seq profiles. Also in GLIB and CMS, 5hmC is depleted over the TSS (Figure 2). The origin of the ‘background’ signal in the CMS control (essentially a bisulfite sequencing profile) is not clear at present.
To gain insight into the differences, we performed a genome‐wide sliding window approach and computed pairwise R² correlations. We included three of our own datasets: MeDIP‐, hMeDIP‐ and MethylCap‐seq of E14 ESC (for more details on methods, data processing and analysis see Supplementary information). It should be noted that the same antibody source was used in these MeDIP and hMeDIP studies, respectively. The GLIB and CMS profiles are excluded from the pairwise correlations because of the intrinsic difference in the data structure. The heatmap reveals that the MeDIP‐seq data group together in a distinct cluster, whereas the hMeDIP‐seq data split up into two clusters (Figure 3a), suggesting that the hMeDIP approach is not yet technically mature and/or sufficiently standardized. This becomes even more apparent in the clustering of CGI promoters (−500 to +500 bp around the TSS). The heatmap is rather disorganized for the hMeDIP profiles and they do not cluster together (Figure 3b); some of the hMeDIP profiles correlate with ChIPs performed with IgG. The clustering is likely driven by background as foreground signals are largely absent (see also Figure 1). In the absence of clear proof that 5hmC is present/elevated at CGI promoters, the model that TET1 clears CGI promoters of 5mC by converting it into 5hmC (Ficz et al, 2011) needs to be treated with great caution.
In the genome‐wide comparison, the profile generated by MethylCap‐seq (Brinkman et al, 2010), though having a different data structure, shows a correlation with the MeDIP profiles (boxed in Figure 3). It should be noted that binding of the MBD in MethylCap requires symmetrically methylated CpG (Nan et al, 1993). In immunoprecipitation with 5mC or 5hmC antibodies, asymmetrically methylated DNA can also be pulled down as the immunoprecipitation is performed on denatured, single‐stranded DNA. Indeed, visual inspection of the profiling data in a genome browser showed high enrichment of CA‐repeats (Figure 4a) and CT‐repeats (data not shown) in MeDIP and hMeDIP but not in MethylCap. Genome‐wide analysis revealed a very prominent enrichment for CA‐ and CT‐repeats (Figure 4b). The DNA strands containing the CA‐ or CT‐sequence, but not their complementary strands, are highly preferentially enriched and sequenced. The question arises whether this prominent enrichment is proof of 5(h)mC at CA‐ and CT‐repeats or whether these regions are precipitated because of cross‐reactivity of the antibodies with unmethylated cytosines. Deep sequencing of bisulfite‐converted DNA uncovered the presence of asymmetric non‐CpG methylation, at least in human ES cells (Ramsahoye et al, 2000; Lister et al, 2009). Because the bisulfite approach does not discriminate between 5mC and 5hmC, the intriguing question is whether asymmetric modifications of the 5mC as well as the 5hmC type indeed occur in the CA‐ and CT‐repeats in the genome, and what the role of such an asymmetric distribution is.
The extent to which the CA‐ and CT‐repeats are enriched in MeDIP and hMeDIP is, however, puzzling; the average tag density of peaks over CA‐repeats is higher than over bivalent loci (Figure 4c), suggesting that the repeats may be extensively (hydroxy)methylated in mouse ESC. In striking contrast with the high enrichment in (h)MeDIP, genome‐wide deep sequence analysis of bisulfite‐converted DNA from human ESCs shows that symmetric CpG methylation is much more abundant than non‐symmetric CA or CT modification (Lister et al, 2009; Laurent et al, 2010; Chen et al, 2011). Our re‐analysis of bisulfite deep sequencing data (see also Supplementary information) shows that CA‐ and CT‐repeats are predominantly unmodified in HFS1 ESCs (Chen et al, 2011), whereas in WA09 hESCs (Laurent et al, 2010), modified and unmodified cytosines are present at roughly equal levels in the repeats (Figure 5). While the basis of the discrepancy between the two lines is unclear, these BS‐seq data reveal that cytosine modifications in repeats are present at least in some human ESCs. Given these large differences, extrapolation of the data obtained with human to mouse ESC is premature and awaits the availability of BS‐seq profiles in mouse ESC lines.
The reasons underlying the surprisingly prominent enrichment in (h)MeDIP remain unclear. It could very well be due to cross‐reactivity of the antibodies with unmethylated cytosine. Dot blot analysis revealed selectivity for the respective modified cytosine; however, these experiments were not performed with synthetic oligonucleotides with high cytosine content or with CA‐ and CT‐repeats. If cross‐reactivity indeed plays a role, one would predict that genomic regions with high local cytosine density such as CA‐ and CT‐repeats might be efficiently immunoprecipitated. The abundance of 5mC and in particular 5hmC is very low as compared with unmethylated cytosine; therefore, immunoprecipitation of the much more abundant unmodified cytosines in regions of high cytosine density could be substantial. The lack of signal in GLIB and CMS at CA‐ and CT‐repeats indicates, but does not prove, that the signals in the CA‐ and CT‐repeats are at least in part due to cross‐reactivity.
To resolve this ambiguity, parallel BS‐seq and (h)MeDIP analyses need to be performed on the same cell lines/strains, but it is beyond the scope of this Perspective to include such extensive analysis. Definitive proof awaits systematic comparison and benchmarking of the different methods.
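The general idea of a sliding‐window comparison can be sketched as follows. This is a toy illustration, not the pipeline used for Figure 3 (which is described in the Supplementary information): the window size, step size, read positions and dataset labels are all hypothetical, and R² is taken here as the squared Pearson correlation between per‐window read counts.

```python
from itertools import combinations


def window_counts(read_starts, chrom_length, window=1000, step=500):
    """Count reads whose start position falls in each sliding window."""
    counts = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        counts.append(sum(1 for pos in read_starts if start <= pos < end))
    return counts


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def pairwise_r2(profiles):
    """Return {(name_a, name_b): R^2} for every pair of named profiles."""
    return {
        (a, b): r_squared(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical read start positions on a 3 kb toy chromosome.
    reads = {
        "MeDIP_A": [100, 150, 180, 2100, 2200],
        "MeDIP_B": [120, 160, 2150, 2250, 2300],
        "hMeDIP_A": [900, 1200, 1300, 1400, 2900],
    }
    profiles = {name: window_counts(pos, 3000) for name, pos in reads.items()}
    for pair, value in pairwise_r2(profiles).items():
        print(pair, round(value, 3))
```

The resulting matrix of pairwise values is what a hierarchical clustering and heatmap would then be built on; restricting the windows to −500 to +500 bp around each TSS would give the promoter‐centred variant.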
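The strand asymmetry described above can be checked read by read with a simple sketch. It assumes reads are available as plain sequence strings; the minimum repeat length of five units and the function names are arbitrary illustrative choices, not parameters from the analysis behind Figure 4.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_dinucleotide_repeats(seq, unit="CA", min_units=5):
    """Return (start, end) of every run of `unit` repeated >= min_units times."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_units))
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def repeat_strand(read, unit="CA", min_units=5):
    """Classify a read by which strand of the repeat it represents.

    'repeat'     : the read itself carries the (unit)n run, e.g. (CA)n
    'complement' : only its reverse complement does, i.e. the read is (TG)n
    'none'       : no run of the requested length on either strand
    """
    if find_dinucleotide_repeats(read, unit, min_units):
        return "repeat"
    if find_dinucleotide_repeats(reverse_complement(read), unit, min_units):
        return "complement"
    return "none"


if __name__ == "__main__":
    ca_read = "GG" + "CA" * 6 + "TT"       # hypothetical cytosine-rich strand
    tg_read = reverse_complement(ca_read)  # its complementary strand
    print(repeat_strand(ca_read), repeat_strand(tg_read))
```

Counting the 'repeat' versus 'complement' classes across all reads overlapping annotated CA‐ or CT‐repeats would quantify how strongly the cytosine‐containing strand is favoured.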
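The cross‐reactivity argument is essentially one of proportions, which the following back‐of‐the‐envelope sketch makes explicit. It assumes a deliberately simple linear model in which each cytosine contributes to pull‐down in proportion to the antibody's relative affinity for it; the cytosine counts and the 1% relative affinity are hypothetical values chosen for illustration, not measurements.

```python
def cross_reactive_fraction(n_unmodified, n_modified, relative_affinity):
    """Fraction of antibody binding attributable to unmodified cytosines.

    Assumes binding scales linearly with the number of cytosines and that
    `relative_affinity` is the affinity for unmodified cytosine relative to
    the modified target (target affinity is set to 1.0).
    """
    off_target = n_unmodified * relative_affinity
    on_target = n_modified * 1.0
    total = off_target + on_target
    return off_target / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical fragment pool: 1000 unmodified cytosines per 10 modified
    # ones, with an antibody that binds unmodified cytosine 100-fold less well.
    fraction = cross_reactive_fraction(1000, 10, 0.01)
    print("share of signal from unmodified cytosine:", fraction)
```

Under these assumed numbers, half of the signal would derive from unmodified cytosine even though the antibody is 100‐fold selective, which illustrates why dot blots on low‐complexity templates may not exclude cross‐reactivity at cytosine‐dense repeats.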
Exciting and unexpected new observations appear at a dazzling speed and change our perspectives on the function of cytosine modifications. It is still too early, though, to propose unifying models and mechanisms. One likely scenario is that the TET enzymes are involved in removal of 5mC and maintenance of the unmethylated state. The conversion of 5mC to 5hmC by TET enzymes would be the first step, possibly followed by further conversion into 5fC and 5caC. Several reports suggest that TET enzymes indeed act along with BER pathways to promote active DNA demethylation. The knockout of TDG revealed its active role in the demethylation process (Cortellino et al, 2011) and a very recent study reports that 5caC is specifically recognized and excised by TDG (He et al, 2011). The presence of 5hmC at bivalent loci and DNaseI HS might imply that simultaneous removal as well as renewed deposition of 5mC by DNMTs is taking place and that the kinetics of the removal of 5hmC is slow, resulting in accumulation of 5hmC. The presence of TET1 at CGI promoters and the contrasting absence or very low signal of 5hmC (Ficz et al, 2011) might simply be explained by faster kinetics of 5hmC conversion/removal by BER or TDG than of the deposition of 5mC and its conversion to 5hmC.
However, TET1 could also play a regulatory function in addition to its enzymatic role. Given the association of TET1 with the Sin3‐complex, TET1 may contribute to repression of transcription. Alternatively, the co‐recruitment of (co)factors to CGI promoters may or may not lead to conversion into and/or accumulation of 5fC and/or 5caC instead of 5hmC. Proof of the accumulation of 5fC and/or 5caC at CGI promoters awaits development of methods for their detection.
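The kinetic reasoning in this scenario can be made concrete with a minimal steady‐state sketch. It assumes a closed three‐state cycle (5mC to 5hmC by TET, 5hmC to unmodified C by removal, C to 5mC by DNMTs) with first‐order rates; this simplification ignores 5fC and 5caC, and the rate constants below are hypothetical, chosen only to contrast slow and fast removal.

```python
def steady_state(k_tet, k_removal, k_dnmt):
    """Steady-state fractions of 5mC, 5hmC and C in a first-order cycle.

    At steady state the flux through each step is equal, so each state's
    occupancy is proportional to the inverse of the rate constant leaving it.
    """
    weights = {
        "5mC": 1.0 / k_tet,       # left via TET oxidation
        "5hmC": 1.0 / k_removal,  # left via removal (e.g. BER/TDG)
        "C": 1.0 / k_dnmt,        # left via DNMT re-methylation
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical rate constants (arbitrary units).
    slow_removal = steady_state(k_tet=1.0, k_removal=0.1, k_dnmt=1.0)
    fast_removal = steady_state(k_tet=1.0, k_removal=10.0, k_dnmt=1.0)
    print("slow removal:", slow_removal)
    print("fast removal:", fast_removal)
```

With identical TET activity, slow removal lets 5hmC accumulate while fast removal keeps it near the detection limit, matching the qualitative contrast drawn between bivalent loci and CGI promoters.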
An important question is whether readers of 5hmC exist that could shield the 5hmC mark from enzymatic removal and that could translate/interpret its presence into a biological action as can be expected from a ‘true’ epigenetic mark. Affinity capture experiments as performed for the histone methyl mark readers (Vermeulen et al, 2010) will likely uncover whether such 5hmC readers exist. Until then, the role of 5hmC as a new epigenetic mark—the ‘sixth base’—that is read and interpreted by other factors or merely a removal intermediate remains an open question.
We thank Hendrik Marks for the MethylCap data and our colleagues for discussion and critical reading of the manuscript. This work was funded by ZonMW, partners of the ERASysBio+ initiative supported under the EU ERA‐NET Plus scheme in FP7. The hMeDIP‐, MeDIP‐ and MethylCap‐seq data reported in this study have been deposited in the NCBI GEO SuperSeries GSE31343.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Tables and Figures [msb201195-sup-0001.pdf]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
Copyright © 2011 EMBO and Macmillan Publishers Limited