Development is driven by tightly coordinated spatio‐temporal patterns of gene expression, which are initiated through the action of transcription factors (TFs) binding to cis‐regulatory modules (CRMs). Although many studies have investigated how spatial patterns arise, precise temporal control of gene expression is less well understood. Here, we show that dynamic changes in the timing of CRM occupancy is a prevalent feature common to all TFs examined in a developmental ChIP time course to date. CRMs exhibit complex binding patterns that cannot be explained by the sequence motifs or expression of the TFs themselves. The temporal changes in TF binding are highly correlated with dynamic patterns of target gene expression, which in turn reflect transitions in cellular function during different stages of development. Thus, it is not only the timing of a TF's expression, but also its temporal occupancy in refined time windows, which determines temporal gene expression. Systematic measurement of dynamic CRM occupancy may therefore serve as a powerful method to decode dynamic changes in gene expression driving developmental progression.
Transcriptional networks, which serve as the first point of control for gene expression, encompass large numbers of transcription factors (TFs) that bind to specific DNA motifs within cis‐regulatory modules (CRMs) (Lee et al, 2002). Genome‐wide ChIP studies are beginning to reveal extensive patterns of TF occupancy in a number of developmental contexts, including Drosophila (Sandmann et al, 2006, 2007; Zeitlinger et al, 2007; Li et al, 2008; Liu et al, 2009; MacArthur et al, 2009; Zinzen et al, 2009), mouse (Lagha et al, 2008; Vokes et al, 2008) and fish (Morley et al, 2009). Although this information is an essential first step to decipher developmental networks, a static map of TF binding does not reflect the dynamic nature of gene expression nor predict network behavior. Elucidating the dynamic properties of cis‐regulatory networks is therefore central to understanding how temporal expression states arise.
Regulating the time span of a TF's expression is one obvious means to restrict its activity and therefore regulate dynamic gene expression. This can be achieved through the transient expression of the TF itself at precise stages of development (Azpiazu and Frasch, 1993; Lin et al, 1997; Bruneau et al, 1999; Duan et al, 2001; Goldman and Arbeitman, 2007), facilitating the regulation of sequential cascades of expression or by modulating the half‐life of the TFs mRNA, as suggested in yeast (Jothi et al, 2009). Studies during metazoan development indicate an additional more complex regulatory mechanism where an individual TF can regulate distinct temporal groups of CRMs in more refined temporal windows than their expression suggests (Gaudet and Mango, 2002; Gaudet et al, 2004; Sandmann et al, 2006, 2007; Jakobsen et al, 2007). However, the extent and functional consequence of dynamic CRM occupancy within global developmental networks remains unknown.
A number of computational methods have been used to infer temporal regulation of gene expression, which are mainly based on the specific enrichment of TF motifs in differentially expressed groups of genes (Gao et al, 2004; Das et al, 2006; Wilczyński et al, 2006). Although these approaches can successfully identify some key TFs (Suzuki et al, 2009), global inference of motif activities does not indicate if or when an individual site is occupied, making it difficult to link the timing of a TF's input to the temporal aspect of the target gene's expression. Even with information on motif occupancy, it is difficult to infer regulation as, for example, only ∼50% of TF binding is estimated to be functional in yeast (Gao et al, 2004; Ucar et al, 2009). In higher eukaryotes, the overlap between TF‐binding data and regulated gene expression is even lower, typically in the range of 10–50% (Sandmann et al, 2006, 2007; Vokes et al, 2008; Krejci et al, 2009). Therefore, although TFs occupy a vast array of motifs in vivo, a large fraction of these are either redundant or not functional under the conditions tested. Here, we used a time course of TF binding to more directly assess the prevalence of dynamic TF occupancy on developmental CRMs and asked to what extent temporal CRM occupancy correlates globally with temporal patterns of gene expression and developmental progression.
Results and discussion
In a recent study, we described the integration of ChIP‐on‐chip data for 15 developmental conditions to construct a global atlas of mesodermal CRMs, which was used to predict their spatio‐temporal activity (Zinzen et al, 2009). The five TFs examined (Twist, Tinman, Mef2, Biniou, and Bagpipe) were selected based on their central role within the genetic network controlling Drosophila mesoderm development and their tissue‐specific expression, avoiding confounding signals from other tissues (Thisse et al, 1987; Leptin, 1991; Azpiazu and Frasch, 1993; Lilly et al, 1994; Nguyen et al, 1994; Bour et al, 1995; Zaffran et al, 2001). The ChIP experiments for four TFs (excluding Bagpipe) were conducted in time courses spanning at least three consecutive stages of development. The time points examined cover the first stages of mesoderm development when the cells are pluripotent (2–4 h), through their specification into different muscle primordia (6–8 h) and the initiation of tissue differentiation (8–12 h) (Supplementary Figure 1). These data represent the only genome‐wide time course of TF‐CRM occupancy during embryonic development to date. Here, we used this unique temporal aspect to perform a systematic analysis of the dynamic properties of CRM occupancy during metazoan development.
We first assessed the temporal occupancy of each TF independently, focusing on regions bound by a TF in at least two consecutive time points (see Materials and methods). As these criteria will eliminate CRMs bound at only a single time point, it provides a very stringent set of combinatorially bound modules (Supplementary Table 1). Even within this conservative definition, unsupervised clustering of binding profiles revealed extensive temporal dynamics (Figure 1A), confirming our earlier findings (Sandmann et al, 2006, 2007; Jakobsen et al, 2007), while extending the analysis genome wide. All TFs examined target three broad classes of CRMs with different temporal occupancy (Figure 1A; Supplementary Figure 2): early bound modules, having TF binding during the first two time points for a given TF, but not later, continuous, occupied at all time points (representing about 50% of CRMs bound by a TF) and late, having binding only at the last two time points and not at earlier stages of development. Therefore, each TF occupies ∼50% of its targeted CRMs in a transient manner; being bound either at early or late stages of development (Supplementary Figure 2). Defining transient occupancy is inherently difficult due to potential false negatives, which can lead to a misclassification of continuous binding. However, measuring the quantitative signal for all CRMs revealed very low levels of occupancy on ‘early’ CRMs at late developmental stages and conversely a low‐binding signal for ‘late’ CRMs at early stages of development (Figure 2A; Supplementary Figure 3), demonstrating that transient binding is not an effect of thresholding of the ChIP signal (see Materials and methods).
As these four TFs form part of an interconnected regulatory network driving mesoderm specification, the temporally bound regions for each TF merge into 2813 non‐redundant CRMs (Supplementary Table 1). Investigating the degree of dynamic TF binding on these combinatorially bound CRMs revealed even more dramatic temporal occupancy (Figure 1B). In all, 83% (2353) of developmental regulatory modules show a transient‐binding pattern for at least one of the TFs involved. In some cases, all TFs follow the same temporal pattern, suggesting that the CRM is only accessible at specific stages of development (Figure 1C, CRM #659, #153, #895, #38). For other CRMs, one or two TFs can provide distinct transient inputs, whereas another factor remains constitutively bound, highlighting their inherent complexity of regulation (Figure 1C, CRM #147, #81). This thereby uncovers pervasive temporal regulation within a more refined temporal window than the TFs expression pattern predicts.
To determine how the temporal occupancy patterns arise, we assessed if variation in the binding motifs for the TFs themselves can explain their temporal binding, as shown in Caenorhabditis elegans (Gaudet and Mango, 2002) and yeast (Buck and Lieb, 2006). As TF consensus binding sites are typically based on a small number of known sites from in vitro approaches, they are often poor predictors of TF occupancy. To obtain a more biological representation of the sequence preferences for each TF, we used the large number of in vivo bound regions to optimize the TF‐binding site (TFBS) for each TF using all ChIP‐bound regions (Zinzen et al, 2009) and separately in the early and late‐bound CRMs (Supplementary Figure 4B). As expected from earlier studies (Li et al, 2008), the ChIP‐bound regions are highly enriched in the optimized TFBS for the bound TFs (Figure 2B; Supplementary Figure 4). However, there is no significant difference in the enrichment (Figure 2B; Supplementary Figure 4), or apparent quality (Figure 2C; Supplementary Figures 4B and 5), of the TF‐binding motifs in different temporal classes of CRMs. Assessing the contribution of multiple low quality, presumed weak affinity, sites to the strength of the ChIP signal using a quasi‐biophysical model (TRAP; Roider et al, 2007) revealed a similar trend (Supplementary Figure 6). Therefore, temporal dynamics in TF binding cannot be readily explained by the affinity of the binding sites for the TFs themselves. Conversely, many occupied CRMs contain a high‐quality binding site for a given TF and yet are not bound by that factor (Figure 2C, blue dots; Supplementary Figures 5 and 6). Collectively, these results show that although motif presence is generally necessary, it is not sufficient to predict TF binding. To further confirm this, we examined the enrichment of conserved motifs within different temporal classes of CRMs, as these motifs are more likely to be the functionally relevant sites. The TF motifs are more highly conserved in bound CRMs compared with non‐bound regions (Figure 2d), as reported earlier (Li et al, 2007; Zinzen et al, 2009). However, there is no significant difference in the enrichment of conserved motifs between the temporal classes (Figure 2D). One potential mechanism of temporal CRM occupancy is the cooperative recruitment of TF binding by other TFs, as observed in a number of well‐characterized enhancers (Ip et al, 1992; Woodard et al, 1994; Broadus et al, 1999; Halfon et al, 2000; Li et al, 2000; Lee and Frasch, 2005). Indeed, additional TF motifs are enriched in specific temporal classes of CRMs (Supplementary Table 2). For example, the Twist motif is enriched in Tinman early bound CRMs (P<0.001) but not in late‐bound modules (P=0.77). This differential motif enrichment is in concordance with the high frequency of temporal cobinding by these TFs to the same CRMs (Supplementary Figure 7). In summary, the presence of a high‐affinity TF motif within a CRM is a poor predictor of TF occupancy at any given stage of development, even if the motif is conserved, which has important implications for sequence‐based global CRM predictions and quantitative modeling of CRM activity.
The high degree of temporal TF binding suggests tight regulation, which implies functional importance. To assess this, we examined the relationship between temporal CRM occupancy and temporal patterns of gene expression and developmental progression. As all four factors are transcriptional activators, we compared the timing of TF binding (both on and off) to the time of activation of the associated target genes, using two independent data sets for gene expression. First, using in situ hybridization data from the Berkeley Drosophila Genome Project database (BDGP; Tomancak et al, 2002), which provide an accurate measure of expression timing in the tissue of interest (Figure 3A and B). Second, using a microarray‐based developmental time course with finer temporal resolution, restricting our analysis to tissue‐specific genes (Supplementary Figure 8B). Both approaches show a significant correlation between the timing of TF binding and gene expression: For early bound CRMs, for example, the largest proportion of target genes are activated during early stages of development, reflecting the transient occupancy of their CRMs at these stages (Figure 3A). Similarly, for CRMs that are occupied during mid‐embryogenesis (6–8 h) or later (10–12 h), the majority of their target genes are activated during the respective developmental stages (Figure 3A). This trend holds true for each TF taken separately (Supplementary Figure 8A) and for CRMs occupied very transiently at a single time point, despite potentially containing more noise due to false positives (Supplementary Figure 9). The correlation between dynamic TF binding and the timing of gene expression during development is much higher than what has been observed in yeast (Ni et al, 2009), which is remarkable given that developmental genes typically have multiple CRMs, adding another level of complexity.
We next investigated whether genes with temporally regulated CRMs are linked to developmental progression. If this is the case, they should have different biological functions that reflect cellular processes specific to defined developmental stages. To assess this, we analyzed the enrichment of different functional categories (using GO terms; Ashburner et al, 2000) of genes associated with temporally bound CRMs. In all, 137 GO terms are differentially enriched between genes with early and late‐bound CRMs for one or more TF (see Supplementary Table 3). Many of these (41 classes, Figure 3C) are relevant for mesoderm or muscle development and provide a temporal gene activity map for the development of these tissues. For example, Twist and Tinman bind to CRMs of genes involved in mitosis, apoptosis and chromatin remodeling at very early stages, whereas at later stages of development, these TFs target CRMs of genes essential for gastrulation, transcription and mesoderm morphogenesis (Figure 3C). Similarly, Mef2 and Biniou occupy CRMs of genes involved in mesodermal cell fate commitment and muscle development during mid‐embryogenesis, whereas at later stages, they regulate a battery of genes involved in muscle differentiation, including muscle contractile proteins (Figure 3C). Genes coding for contractile fiber proteins are a striking example of the tight correlation between temporal enhancer occupancy and the timing of target gene function. Of the 21 genes in this functional class, 19 are associated with 69 CRMs from our ChIP atlas. In all, 18 contractile genes (95%) have CRMs bound by Mef2 and/or Biniou exclusively at later stages of development (Figure 3D, upper panel). The expression of these genes is tightly synchronized, initiating at ∼8 h in development, which mirrors the temporal occupancy of their corresponding CRMs (Figure 3D).
As the temporal aspects of TF occupancy have not been explored globally in other developmental contexts, we cannot compare our findings to other metazoans; however, we can compare to global studies of dynamic responses in yeast. Studies examining the response of TFs to different stimuli in yeast revealed two types of TFs: those that remain bound continuously (termed permanent hubs (Luscombe et al, 2004) or condition‐invariant regulators (Harbison et al, 2004)), and TFs that only bind in a specific condition (Luscombe et al, 2004). What we observe within a developmental network is quite different: here, the same TF has the inherent ability to act as a ‘permanent’ or a ‘transient’ factor, depending on the context of the CRM it is interacting with. This is not a rare property for selected hubs, but rather occurs in all TFs examined in time series to date, albeit a small number. Interestingly, the proportions of CRMs in each temporal class are very similar among the four TFs, suggesting that this may be a general feature. More recent studies revealed similar dynamic‐binding patterns for a number of TFs involved in yeast response to high‐salt conditions (Ni et al, 2009) and glucose depletion (Buck and Lieb, 2006). Although different TFs regulated target genes that are significantly functionally different, the correlation between dynamic‐binding patterns of yeast TFs and gene expression or gene function was more elusive (Ni et al, 2009). This comparison suggests that the timing of CRM occupancy may be more tightly coupled to temporal aspects of gene expression in developmental networks, perhaps due to the importance of precise spatio‐temporal expression and the irreversible nature of embryonic development.
The composition of regulatory networks describing developmental progression will naturally change over time, as key TFs often have transient and specific expression. Our results show an additional layer of complexity: TFs occupy CRMs in smaller time windows than their temporal expression predicts. As this differential CRM occupancy is occurring within the same cells at the same stages of development, this is unlikely to be controlled by phosphorylation events; the data rather supports a model of extensive cooperative binding (Broadus et al, 1999), chromatin remodeling (Buck and Lieb, 2006), or a combination of both. Given the conservative manner in which we have classified temporal binding, the degree of dynamic CRM occupancy is likely to be even more pervasive. These temporal changes in network connectivity are highly correlated with temporal patterns of gene expression and most likely have an extensive function in developmental regulatory networks.
The explosive increase in global TF occupancy data in recent years has revealed that TFs bind to thousands (Sandmann et al, 2007; Li et al, 2008) or tens of thousands (Chen et al, 2008; Vokes et al, 2008) of individual sites at any one condition. This raises a new challenge to distinguish which of these binding events are functionally relevant. Here, we show that dynamic TF occupancy is highly correlated with two functional aspects of target gene expression, its timing and gene function, which clearly contrasts to what is expected from random non‐functional sites. Dynamic TF occupancy may therefore provide a mechanism to distinguish functional from non‐functional binding events and can thereby help to decode dynamic changes in gene expression driving developmental progression.
Materials and methods
Defining temporal CRM classes
The ChIP data set contained 8008 CRMs constructed from ChIP peaks measured over 15 developmental conditions (Zinzen et al, 2009): five TFs (Twist, Tinman, Mef2, Bagpipe, and Biniou) measured during five time points (2–4 h, 4–6 h, 6–8 h, 8–10 h, and 10–12 h). To define temporal occupancy, we selected all CRMs bound by at least one TF for at least two consecutive time points. This resulted in a set of 2813 CRMs containing 1102 CRMs bound by Twist, 765 CRMs bound by Tin, 1307 CRMs bound by Mef2 and 462 CRMs bound by Biniou (Supplementary Table 1). Bagpipe was assayed only for a single time point and was therefore removed from the data set. As these TFs constitute a transcriptional hierarchy during mesoderm development, each factor is only expressed during a subset of the whole time series. Therefore, to define temporal CRM classes for Twist (Twi) and Tinman (Tin), the following three time points were used: 2–4 h, 4–6 h and 6–8 h. For Biniou (Bin) and Myocyte enhancer factor 2 (Mef2), the time points 6–8 h, 8–10 h, and 10–12 h were used. CRMs bound during the two earlier times and not at the third were called ‘early’; those bound at the second and third time point but not at the first are called ‘late.’ Remaining CRMs were continuously bound.
Clustering analysis using the quantitative ChIP signal
The quantitative ChIP signal was calculated for each CRM at each experimental condition (TF × time) as a maximum average value of probe intensities (Log2) over a sliding window of size 200 bp (the minimal CRM length). The signals were quantile normalized (Bolstad et al, 2003) to make them comparable between conditions (Supplementary Table 1). The quantitative CRM profiles were clustered using the k‐means method as implemented in the Biopython library (de Hoon et al, 2004).
The Biopython library (Cock et al, 2009) was used to find all occurrences of known Drosophila TFBS motifs (125 motifs from JASPAR (Sandelin et al, 2004) and five PWMs for the TFs examined (Zinzen et al, 2009)). The threshold for motif occurrence was set based on the information content of the position weight matrix (Hertz and Stormo, 1999). Enrichment of a Motif in a CRM class was assessed using P‐values of a binomial distribution estimated from the set of all CRMs (Supplementary Table 2). Motif conservation in a CRM class was measured by average median phastCons (Siepel et al, 2005) score of the best PWM hit in each CRM. NestedMICA (Down et al, 2007) was used to discover motifs de novo in all early and late CRM classes (Supplementary Table 2). Quasi‐biophysical motif‐binding scores for all CRMs were calculated using TRAP (Roider et al, 2007).
Gene expression analysis
To define putative direct target genes, each CRM was assigned to the nearest gene (based on the distance between the middle of a CRM to the nearest gene boundary, 3′ or 5′). In situ annotations were obtained from the BDGP database (Tomancak et al, 2002). For each gene, we assessed its first stage of expression in the mesoderm or muscle, representing the first time point where the gene could be regulated by one of the four TFs examined. The timing of TF binding is significantly globally correlated to the timing of the target gene's expression (bootstraped Kendall P<1e−5). All expression classes show also significant (Fisher exact P<0.05) enrichment for CRMs bound at corresponding time points (Figure 3a). Genes annotated as maternally or ubiquitously expressed or expressed earlier in non‐mesoderm related tissue were excluded to avoid confounding expression patterns, as they are not expected to be regulated by mesodermal CRMs.
Functional annotations using gene ontology
GO term (Ashburner et al, 2000) enrichment for target genes associated with temporal CRM classes was calculated using the Ontologizer software (Grossmann et al, 2007) (annotations were obtained from http://www.GeneOntology.org on 17 September 2009). Terms enriched for at least one class (P<0.05, Benjamini–Hochberg corrected) are listed in Supplementary Table 3. In the interest of clarity, only terms relevant for development are displayed in Figure 3C.
We thank Paul Bertone and Ewan Birney for comments on the manuscript and Przemyslaw Biecek for insightful comments on statistical analyses. We are very grateful to all members of the Furlong laboratory, in particular to Charles Girardot and Mikhail Spivakov, for discussions and critical input. This work was supported by a Human Frontiers Science Program grant awarded to EF.
Conflict of Interest
The authors declare that they have no conflict of interest.
This document contains nine supplementary figures [msb201035-sup-0001.doc]
ChIP‐on‐chip signal for 2,813 CRMs that are bound in two consecutive time‐points [msb201035-sup-0002.xls]
Motif analysis of the 2,813 CRMs [msb201035-sup-0003.xls]
GO term enrichment of temporally regulated target genes [msb201035-sup-0004.xls]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2010 EMBO and Macmillan Publishers Limited