Genes containing multiple pre‐mRNA cleavage and polyadenylation sites, or polyA sites, express mRNA isoforms with variable 3′ untranslated regions (UTRs). By systematic analysis of human and mouse transcriptomes, we found that short 3′UTR isoforms are relatively more abundant when genes are highly expressed whereas long 3′UTR isoforms are relatively more abundant when genes are lowly expressed. Reporter assays indicated that polyA site choice can be modulated by transcriptional activity through the gene promoter. Using global and reporter‐based nuclear run‐on assays, we found that RNA polymerase II is more likely to pause at the polyA site of highly expressed genes than that of lowly expressed ones. Moreover, highly expressed genes tend to have a lower level of nucleosome but higher H3K4me3 and H3K36me3 levels at promoter‐proximal polyA sites relative to distal ones. Taken together, our results indicate that polyA site usage is generally coupled to transcriptional activity, leading to regulation of alternative polyadenylation by transcription.
Transcriptomic and epigenomic data, as well as reporter and nuclear run‐on assays collectively show that transcriptional activity regulates the relative abundance of alternative polyadenylation isoforms, indicating general coupling of 3′ end processing to transcription.
Using RNA‐seq and exon array data for a large number of human and mouse tissues and cells, we identified a general correlation between relative expression of alternative polyadenylation (APA) isoforms and gene expression level: short 3′UTR isoforms are relatively more abundant when genes are highly expressed whereas long 3′UTR isoforms are relatively more abundant when genes are lowly expressed.
Using reporter assays with different promoters, we found that induction of transcription leads to more usage of promoter‐proximal polyA sites, suggesting modulation of 3′ end processing efficiency by transcriptional activity. Global analysis and reporter‐based assays further revealed that regulation of polyA site choice by transcription takes place when genes are regulated under different cell conditions.
Using global and reporter‐based nuclear run‐on assays, we found that highly expressed genes tend to have more RNA polymerase II pausing at promoter‐proximal polyA sites, as compared with lowly expressed genes, supporting the notion that the efficiency of 3′ end processing is coupled to transcriptional activity.
Highly expressed genes have a lower nucleosome level but higher H3K4me3 and H3K36me3 levels at promoter‐proximal polyA sites relative to distal ones, as compared with lowly expressed genes, indicating that transcriptional activity impacts 3′ end processing and regulation of APA leaves epigenetic signatures.
Expression of protein‐coding genes in eukaryotes involves multiple transcriptional and post‐transcriptional processes, which are increasingly found to be interconnected (Maniatis and Reed, 2002; Moore and Proudfoot, 2009). The 3′ end processing of pre‐mRNAs, involving cleavage of nascent RNAs and synthesis of the poly(A) tail (Colgan and Manley, 1997), is critical for termination of transcription and interplays with pre‐mRNA splicing (Buratowski, 2005; Millevoi and Vagner, 2010; Kuehner et al, 2011). A recent study also implicates its role in initiation of transcription (Mapendano et al, 2010). Pre‐mRNA cleavage and polyadenylation, or mRNA polyadenylation, is carried out by the 3′ end processing complex which was recently found to comprise over 85 proteins in human cells (Shi et al, 2009). Interestingly, this complex includes not only the well‐known core polyadenylation factors, or polyA factors for simplicity, such as CPSF, CstF, CF Im, and CF IIm proteins, but also proteins with roles in DNA damage repair, transcription, splicing, translation, etc., underscoring the diverse connections between mRNA polyadenylation and other cellular processes.
Over half of the human genes contain more than one polyA site (Tian et al, 2005; Yan and Marr, 2005), resulting in mRNA isoforms with different protein‐coding regions and/or 3′ untranslated regions (3′UTRs). The pattern of alternative cleavage and polyadenylation, or alternative polyadenylation (APA), of genes is variable across tissues (Zhang et al, 2005; Wang et al, 2008), and is highly regulated during development and when cells change proliferation/differentiation states (Sandberg et al, 2008; Ji et al, 2009). In general, short 3′UTR isoforms resulting from usage of promoter‐proximal polyA sites are relatively more abundant when cells are proliferative, transformed, or undifferentiated (Sandberg et al, 2008; Ji and Tian, 2009; Mayr and Bartel, 2009). Since 3′UTRs contain various cis elements involved in post‐transcriptional gene regulation, such as mRNA localization, translation, and mRNA stability, APA can impact mRNA metabolism and protein expression level in the cytoplasm. Given that the predominant cis elements in 3′UTRs are those controlling mRNA stability, such as AU‐rich elements, GU‐rich elements, and microRNA target sites (Garneau et al, 2007; Vlasova et al, 2008; Bartel, 2009), it is conceivable that APA may have a widespread role in controlling mRNA half‐life. Indeed, shortening of 3′UTRs has been shown to cause increased mRNA stability and higher protein expression for a number of oncogenes (Mayr and Bartel, 2009).
Several mechanisms that regulate APA have been reported. First, modulation of specific polyA factors has been shown to alter polyA site choice. For example, upregulation of CstF64 during B‐cell maturation results in higher usage of a promoter‐proximal polyA site in the IgM heavy chain gene (Takagaki et al, 1996), and knockdown of the 25‐kDa subunit of CF Im was shown to alter APA for a number of genes in HeLa cells (Kubo et al, 2006). Consistently, a general inverse correlation between mRNA expression of polyA factors and global 3′UTR length was found in various tissues during development and in reprogrammed cells (Ji and Tian, 2009), indicating modulation of the general polyadenylation activity may be responsible for APA regulation in cell proliferation/differentiation. Second, various RNA binding proteins (RBPs) have been shown to modulate APA by interacting with cis elements adjacent to the polyA site (Millevoi and Vagner, 2010). An emerging theme is that some RBPs previously known to regulate pre‐mRNA splicing may also have roles in mRNA polyadenylation (Licatalosi and Darnell, 2010).
Third, some proteins with apparent functions in gene transcription have been shown to regulate APA, such as ELL2, an RNA polymerase II (Pol II) elongation factor (Martincic et al, 2009), and Cdc73, a component of the PAF protein complex (PAFc) which associates with Pol II (Rozenblatt‐Rosen et al, 2009). In addition, accumulating evidence suggests mRNA polyadenylation is extensively intertwined with transcription: Pol II itself is an essential polyA factor (Hirose and Manley, 1998) and its C‐terminal domain (CTD) interacts with several other polyA factors and has been implicated in coupling pre‐mRNA processing to transcription (McCracken et al, 1997; Ahn et al, 2004; Meinhart and Cramer, 2004; Adamson et al, 2005; Zhang and Gilmour, 2006); several polyA factors interact with the basal transcriptional machinery (Dantonel et al, 1997) and are present at the promoter region of genes (Venkataraman et al, 2005; Glover‐Cutter et al, 2008); and several transcriptional factors have been shown to regulate 3′ end processing (Rosonina et al, 2003).
Here, we present several lines of evidence indicating that polyA site usage is generally coupled to transcriptional activity, contributing to a global correlation between the relative abundance of 3′UTR isoforms and gene expression level. Given the roles of 3′UTR in mRNA metabolism, this mechanism coordinates transcriptional regulation with post‐transcriptional control via pre‐mRNA processing.
A general correlation between relative abundance of APA isoforms and gene expression level in human and mouse tissues and cells
APA can lead to mRNA isoforms with short or long 3′UTRs (Figure 1A). Using a paired‐end RNA‐seq data set recently released from Illumina (Supplementary Table 1), we examined relative expression of APA isoforms across 16 human tissues. Our method detects 3′UTR length changes based on comparison of the RNA‐seq reads mapped to constitutive and alternative portions of 3′UTR (named cUTR and aUTR, respectively), as defined by the 5′‐most polyA site in the 3′UTR (Figure 1A). We developed a score named relative expression of isoforms using distal polyA sites (RUD) to represent the relative abundance of APA isoforms (see Materials and methods).
Interestingly, we found that highly expressed genes in general tended to express short 3′UTR isoforms more frequently than lowly expressed genes in all examined tissues (Figure 1B; Supplementary Figure 1A). Overall, there were enrichments (∼2‐fold over expected values) of genes with preferential expression of short 3′UTR isoforms when they were highly expressed (lower right corner in Figure 1C) and of genes with preferential expression of long 3′UTR isoforms when they were lowly expressed (upper left corner in Figure 1C). Conversely, there were depletions (∼3‐fold below expected values) of genes with opposite trends (upper right and lower left corners in Figure 1C). A similar result was obtained by analyzing transcriptomes in 10 human tissues based on the single‐end RNA‐seq method (Pan et al, 2008; Wang et al, 2008; Supplementary Figure 1B and C).
We next analyzed two Affymetrix exon array data sets corresponding to 62 types of human primary cells and cell lines and 54 types of mouse tissues and cell lines (Supplementary Table 1), respectively. It is worth noting that data from Affymetrix exon arrays allow strand‐specific analysis of gene expression, whereas the RNA‐seq reads analyzed do not intrinsically reveal the strand information of transcription. This might be important for our analysis since previous reports have indicated that antisense transcripts are pervasive in human cells (He et al, 2008) and are involved in regulation of gene expression (Xu et al, 2011). Using RUD scores derived from microarray probe intensities (see Materials and methods and Figure 1A), we found that the connection between APA and gene expression is consistent with, but more obvious than, that observed using RNA‐seq data, as indicated by gene enrichment and depletion values (Figure 1D and E). Therefore, we conclude that there is a general correlation between the relative abundance of 3′UTR isoforms and gene expression level.
Reporter assays confirmed regulation of polyA site choice by transcriptional activity
Difference in relative abundance between APA isoforms can be attributable to two potential mechanisms: (1) 3′UTR isoforms have different mRNA stabilities and/or (2) isoforms are differentially produced resulting from alternative 3′ end processing. The former mechanism has been shown for a number of genes (Edwalds‐Gilbert et al, 1997; Mayr and Bartel, 2009). Thus, we wanted to know whether the latter could have a role in the correlation between relative abundance of APA isoforms and gene expression level. To this end, we made a set of constructs in which the same reporter gene, capable of expressing two APA isoforms, was under the control of different promoters (Figure 2A). This experimental design enables analysis of the effect of transcription on APA while minimizing the influence of mRNA stability. Using RNase protection assays (RPAs), we first confirmed the two expected APA isoforms (Figure 2B). In addition, we found that the CMV promoter (PCMV) resulted in much more (∼9‐fold) expression of the short isoform relative to the long one than a basal TATA‐like promoter (PTAL) (Figure 2B), indicating that polyA site choice can be controlled by promoter sequences.
Source data for Figure 2B [msb201169-sup-0001-SourceData-S1.txt]
Source data for Figure 2C [msb201169-sup-0002-SourceData-S2.txt]
Source data for Figure 2D [msb201169-sup-0003-SourceData-S3.txt]
Source data for Figure 2F [msb201169-sup-0004-SourceData-S4.txt]
We then used several inducible promoters and examined their regulation of polyA site choice under induced or non‐induced conditions. Using quantitative reverse transcription PCR (qRT–PCR) and constructs containing the cAMP response element (PCRE) or the NFκB binding site (PNFκB) in the promoter, which are inducible by forskolin or TNFα, respectively, we found that induction of PCRE and PNFκB led to conspicuously higher expression of the reporter gene, as expected, and more usage of the promoter‐proximal polyA site (Figure 2C). In contrast, induction of PNFκB by forskolin or PCRE by TNFα had no effect on gene expression nor polyA site usage, indicating specificity of the regulation (Figure 2C). This result was also confirmed by fluorescence activated cell sorting analysis (Supplementary Figure 2). When the results for PTAL and PCMV were included, a good inverse correlation between gene expression level and usage of the distal polyA site was discerned (R2=0.87). Similar trends were also observed for constructs containing promoters with the glucocorticoid response element (PGRE), the heat‐shock response element (PHSE), and the serum response element (PSRE) (Figure 2D). Thus, our reporter assays confirmed that short APA isoforms using promoter‐proximal polyA sites are more likely to be produced when gene expression is induced, suggesting that polyA site usage is coupled to transcriptional activity.
We next wanted to know whether the transcription‐coupled APA can be detected when cells change conditions in response to developmental and environmental signals, under which the global APA pattern can change (Ji and Tian, 2009). To this end, we first reanalyzed our previously published exon array data for mRNAs expressed in murine C2C12 myoblasts, in which genes tend to express long 3′UTR isoforms more frequently when cells switch from proliferation to differentiation (Ji et al, 2009). Consistent with the notion that polyA site usage is coupled to transcriptional activity, we found that upregulated genes were less likely to have 3′UTRs lengthened than downregulated ones (Figure 2E). This phenomenon was also observed when we analyzed data for breast cancer cell lines versus normal breast tissues and TNFα‐treated lymphoblastoid cells versus non‐treated ones (Supplementary Figure 3). In both cases, upregulated genes were more likely to express short 3′UTR isoforms than downregulated ones regardless of the overall trend of APA regulation in the cell. To validate this global analysis result, we made a construct containing a promoter with the E2F binding site (PE2F) and a construct containing the myogenin promoter (PMyoG). PE2F and PMyoG have been shown to be inhibited and activated, respectively, during C2C12 differentiation (Edmondson et al, 1992; Blais et al, 2005). In complete agreement with our global analysis result, more expression of the long 3′UTR isoform relative to the short one was observed for the construct containing PE2F in differentiating C2C12 cells as compared with proliferating ones; and the construct containing PMyoG showed the opposite trend (Figure 2F). Therefore, we conclude that modulation of polyA site choice by transcriptional activity can happen to genes with regulated expression when cells respond to developmental and environmental signals.
To address whether our reporter assay results could be generalized, we made another construct (pTRE‐RIF) containing a different reporter gene and the tetracycline response element (TRE) in the promoter (Figure 3A), which can be activated by doxycycline (Dox) in HeLa Tet‐On cells. Consistent with the results described above, activation of transcription resulted in higher gene expression and more usage of the proximal polyA site, as indicated both by luciferase assays (Figure 3B) and by qRT–PCR (Figure 3C). Since activation of transcription through TRE is mediated by the transcription activation domain from HSV VP16 (Figure 3A), this result also suggests that transcription factors may have an important role in the regulation of polyA site choice by transcriptional activity (see Discussion for more on this point).
Source data for Figure 3B [msb201169-sup-0005-SourceData-S5.txt]
Source data for Figure 3C [msb201169-sup-0006-SourceData-S6.txt]
Source data for Figure 3D [msb201169-sup-0007-SourceData-S7.txt]
Source data for Figure 3E [msb201169-sup-0008-SourceData-S8.txt]
Source data for Figure 3F [msb201169-sup-0009-SourceData-S9.txt]
Nuclear run‐on data support coupling of 3′ end processing to transcription
RNA polymerase II (Pol II) pauses at the polyA site for 3′ end processing (Nag et al, 2007; Glover‐Cutter et al, 2008; West and Proudfoot, 2009), which can be detected by the nuclear run‐on (NRO) method (Core et al, 2008; West and Proudfoot, 2009). We reasoned that if polyA site usage was coupled to transcriptional activity, Pol II pausing at the polyA site would be different for genes expressed at different levels. To this end, we carried out NRO with pTRE‐RIF under low and high induction conditions using BrUTP to label nascent transcripts. Labeled nascent transcripts were immunoprecipitated and analyzed by qRT–PCR with primer sets targeting different regions of the reporter gene (Figure 3A). Consistent with the difference in total RNA expression, qRT–PCR of NRO RNA showed more nascent transcripts from the reporter gene when it was highly induced than when it was lowly induced (Figure 3D).
The qRT–PCR value for a given region also represents Pol II density in the region, reflecting its pausing kinetics. As shown in Figure 3E, we found that Pol II density was generally higher across the whole reporter gene when it is highly induced. A prominent peak can be discerned at the proximal polyA site under both induction conditions, suggesting significant pausing of Pol II in the region. Importantly, the difference in Pol II density at the proximal polyA site between high and low induction conditions is significantly greater than those at other regions (Figure 3F). Combined with the luciferase results and qRT–PCR data using total RNA, the NRO results indicate that Pol II pausing at the polyA site (1) correlates with polyA site usage and (2) can be modulated by transcriptional activity. Since NRO is not affected by mRNA stability, a common issue in analysis of steady‐state mRNAs, this finding directly supports the notion that polyA site usage is coupled to transcriptional activity.
To examine regulation of 3′ end processing by transcription for endogenous genes, we carried out a global NRO experiment using deep sequencing (GRO‐seq), similar to the method developed by the Lis group (Core et al, 2008). We obtained over 12.5 million uniquely mapped strand‐specific GRO‐seq reads for the nascent transcripts generated in NRO of C2C12 cells (see Materials and methods for details). We first examined genes with only one polyA site in the 3′‐most exon (named single polyA site, Figure 4A). Consistent with the findings reported by the Lis group (Core et al, 2008), we observed two GRO‐seq read peaks around the polyA site (Figure 4B): one spanned the 1‐kilobase (kb) upstream region of the polyA site, peaking right before the polyA site; and the other spanned the 4‐kb downstream region. As indicated previously (Core et al, 2008), the first peak corresponds to Pol II pausing at the polyA site, or polyA pausing, and the second one corresponds to Pol II pausing before termination, or pre‐termination pausing. Using GRO‐seq read density in the transcribed region to represent gene expression level (the first 1 kb region at the 5′ end was excluded to minimize influence of reads resulted from Pol II pausing at the promoter), we found that polyA pausing was more pronounced for highly expressed genes than for lowly expressed ones (Figure 4B and PA/GB in Figure 4C). This trend was also discernable for the ratio of polyA pausing to pre‐termination pausing (PA/PT in Figure 4C). Interestingly, we found a shift of the pre‐termination pausing peak toward the polyA site as gene expression level increased (see arrows in Figure 4B, and comparison of the first and second halves of the pre‐termination pausing region in Figure 4C), suggesting that Pol II may terminate more rapidly on highly expressed genes than on lowly expressed ones.
We next examined GRO‐seq reads around alternative polyA sites. We focused on the 5′‐most and 3′‐most polyA sites in the 3′‐most exons (Figure 4D). Consistent with the notion that the promoter‐proximal polyA site is more likely to be used when gene expression is high, both Pol II pausing at the proximal polyA site (PA(proximal)/GB in Figure 4E) and the ratio of pausing between proximal and distal polyA sites (PA(proximal)/PA(distal) in Figure 4E) were much greater for highly expressed genes than for lowly expressed ones. Since measurement of Pol II pausing at the distal site may be affected by pre‐termination pausing resulted from usage of upstream polyA sites, we calculated the ratio of pre‐termination pausing after the distal polyA site to gene body. Interestingly, the ratio was significantly lower for highly expressed genes than for lowly expressed ones (PT(distal)/GB in Figure 4E), suggesting a greater loss of Pol II before reaching the distal polyA site for highly expressed genes. Taken together, our GRO‐seq results based on endogenous genes further indicate that Pol II is more likely to pause at proximal polyA sites when genes are highly expressed. Notably, consistent results were also obtained using the data published by the Lis group, which was based on a human cell line (Supplementary Figure 4).
Regulation of APA by transcriptional activity correlates with differences in nucleosome positioning and histone methylations around alternative polyA sites
Transcriptional activity is known to impact epigenetic features such as nucleosome positioning and histone modifications (Barski et al, 2007; Campos and Reinberg, 2009; Schwartz et al, 2009). We reasoned that alteration of epigenetic features at alternative polyA sites would indicate change of transcriptional activity at these sites, supporting alternative polyA site usage. Since epigenetic features are not affected by RNA metabolism, such as stability, the result would address the connection between polyA site usage and transcriptional activity from a different perspective.
We first analyzed a data set generated by digestion of chromatin DNA from human resting T cells with micrococcal nuclease (Schones et al, 2008). Consistent with previous reports (Mavrich et al, 2008; Kaplan et al, 2009; Spies et al, 2009), depletion of nucleosome around the polyA site was detected for both proximal and distal polyA sites (Figure 5A). Interestingly, this depletion could be predicted using a computational model that was based solely on nucleotide content (Kaplan et al, 2009; Figure 5A, bottom), indicating important contribution of nucleotide composition to nucleosome positioning around the polyA site. However, highly expressed genes had a lower nucleosome level around the polyA site than lowly expressed genes, exceeding the difference predicted by the computational model, particularly in regions >200 nucleotides (nt) upstream or downstream of the polyA site. This result indicates transcriptional activity has an additional impact on nucleosome level around the polyA site. Importantly, highly expressed genes had a lower ratio of nucleosome level between proximal and distal polyA sites (upstream 500 nt used for analysis) than lowly expressed genes (Figure 5B). Since this difference could not be discerned using predicted nucleosome levels (Figure 5C), the nucleosome level difference between alternative polyA sites in genes expressed at different levels supports the notion that polyA site choice is connected to transcriptional activity.
To examine how regulation of APA by transcriptional activity correlates with changes in histone methylation levels, we analyzed a ChIP‐seq data set for human resting T cells. As expected, H3K36me3 and H3K4me3 levels around the polyA site correlated with gene expression level (Figure 5D), although there are differences in their profiles. The H3K36me3 profiles showed a drop at the polyA site, presumably attributable to depletion of nucleosome. Significantly, the ratio of H3K36me3 level in the upstream region of proximal polyA site (within 500 nt) to that of distal site was significantly greater for highly expressed genes than that for lowly expressed ones (Figure 5E). This pattern was also corroborated by analysis of H3K36me3 levels in mouse embryonic fibroblasts (MEFs) and neuronal progenitor cells (NPCs), but was not discernable for H3K36me1 (Supplementary Figure 5).
Similar to the H3K36me3 result, the ratio of H3K4me3 level in the upstream region of proximal polyA site to that of distal site was significantly higher for highly expressed genes as compared with lowly expressed ones (Figure 5F). Interestingly, a marked drop of H3K4me3 level after the proximal polyA site can be discerned for highly expressed genes. The H3K4me3 result was also confirmed by using data for MEF and NPC (Supplementary Figure 5). A similar trend was found for H3K4me2, albeit less significant, but not for H3K4me1 (Supplementary Figure 5). Taken together, our results from analysis of epigenetic features indicate polyA site choice is connected to transcriptional activity and leaves epigenetic signatures.
Here, we report a general correlation between the relative abundance of APA isoforms and gene expression level in human and mouse transcriptomes, and present several lines of evidence indicating that transcriptional activity regulates polyA site choice. The correspondence between APA and gene expression revealed in our study may be responsible for the coupled usage of alternative promoters and polyA sites previously reported for some genes (Costessi et al, 2006; Winter et al, 2007), and contribute to tissue‐specific and condition‐specific APA events involving transcription factors, such as neuronal activity‐dependent polyA site selection mediated by MEF2 (Flavell et al, 2008).
Regulation of polyA site choice by transcriptional activity results in preferential expression of long 3′UTR isoforms when gene expression is low and short 3′UTR isoforms when gene expression is high. Since short 3′UTRs are generally more stable due to avoidance of destabilizing elements in 3′UTRs (Mayr and Bartel, 2009) and escape from cellular mechanisms degrading long 3′UTRs (Hogg and Goff, 2010), more frequent production of short 3′UTR isoforms would make the overall expression level of a gene even higher and, conversely, more frequent production of long 3′UTR isoforms would have the opposite effect. Therefore, this mechanism can magnify the final outcome of transcriptional regulation (Figure 6): high transcriptional activity leads to more expression of mRNA isoforms with long half‐lives and therefore high protein production capabilities; and low transcriptional activity leads to more expression of isoforms with short half‐lives, and therefore low protein production capabilities. Conceivably, this mechanism can facilitate swift gene expression changes, which can be critical when cells respond to developmental and environmental signals.
Our result also suggests that downregulation of gene expression at the transcriptional level may lead to more unprocessed pre‐mRNAs as a result of less usage of polyA site (Figure 6). This could lead to further inhibition of gene expression, because unprocessed pre‐mRNAs are subject to degradation by the nuclear exosome (Houseley et al, 2006; Lykke‐Andersen et al, 2009). On this note, a global analysis of RNA expression by tiling arrays indicated that most regions of the human genome can be transcribed (Johnson et al, 2005). However, intergenic transcripts outside the well‐annotated genes generally have very low abundance (van Bakel et al, 2010). It is possible that transcription of some of these RNA species may result from the non‐specific activity of Pol II, which, due to lack of coupling to 3′ end processing, leads to unprocessed transcripts that are rapidly degraded. Therefore, coupling 3′ end processing to transcription may have a role in augmenting the difference between specifically activated transcription and transcriptional noise in the cell.
A number of potential molecular mechanisms need to be considered to explain regulation of polyA site choice by transcriptional activity. First, transcription elongation, which can be controlled by gene promoters/enhancers, has been shown to regulate pre‐mRNA splicing (Kornblihtt, 2007). However, unless the elongation rate constantly correlates with transcriptional activity, it is not likely to be a general mechanism responsible for global regulation of polyA site choice at different gene expression levels. Second, previous studies have shown that the CTD of Pol II is critical for coupling pre‐mRNA processing to transcription (McCracken et al, 1997; Ahn et al, 2004; Meinhart and Cramer, 2004; Adamson et al, 2005). Phosphorylation of CTD is important for recruitment of polyA factors to the 3′ end of genes (Ahn et al, 2004). It remains to be seen, however, whether highly expressed genes have different CTD phosphorylation patterns than lowly expressed ones, leading to differential recruitment of polyA factors at the 3′ end.
Third, transcription factors can regulate mRNA polyadenylation. For example, some transcription factors were shown to stimulate PSF (Rosonina et al, 2005), which has a role in 3′ end processing and termination of transcription (Liang and Lutz, 2006; Kaneko et al, 2007). Notably, Nagaike et al (2011) have recently provided biochemical evidence in vitro that transcription activators stimulate transcription‐coupled 3′ end processing through direct interaction with the PAFc, which was previously shown to be involved in 3′ end formation of yeast mRNAs and snRNAs (Penheiter et al, 2005; Sheldon et al, 2005), and to bind CPSF and CstF (Rozenblatt‐Rosen et al, 2009). Interestingly, PAFc has also been shown to regulate histone methylation (Zhu et al, 2005; Jaehning, 2010), which appears consistent with our finding that the relative levels of H3K4me3 and H3K36me3 around alternative polyA sites are distinct for genes expressed at different levels. Therefore, it is plausible that the correspondence between APA and gene expression we observed here is due to recruitment of polyA factors to Pol II at the promoter region by transcription factors. This mechanism would make Pol II more ‘prepared’ to engage in 3′ end processing when a polyA site is encountered. Future studies need to further address whether different transcriptional factors involve different mechanisms in polyA factor recruitment.
The correlation between expression level and differences in the patterns of nucleosome positioning and histone methylation (H3K4me3 and H3K36me3) around alternative polyA sites not only supports the notion that polyA site choice is regulated by transcriptional activity but also raises the question as to whether epigenetic features can, in return, modulate APA, thereby forming a feedback regulatory mechanism. Histone methylation has been shown to facilitate pre‐mRNA splicing by recruiting the spliceosome (Sims et al, 2007) and to regulate alternative splicing by affecting splicing regulators (Luco et al, 2010). DNA methylation, another type of epigenetic mark, has been shown to regulate APA in imprinted genes (Wood et al, 2008). It will be interesting in the future to examine how nucleosome remodeling and different types of histone methylation have roles in APA and its coupling to transcription.
Materials and methods
Analysis of RNA‐seq data
Paired‐end reads were mapped to hg18 genome using Tophat (Trapnell et al, 2009) with default parameters. Only uniquely mapped and properly paired reads were used for subsequent analyses. Properly paired reads are those with two pairing reads mapped to different strands of the same chromosome. See Supplementary Table 2 for the number of reads used. Gene expression levels were calculated using read density in the protein‐coding region based on the reads per kilobase of mappable region per million mapped reads (RPKM) method (Mortazavi et al, 2008). A cutoff of RPKM=1 was used to select expressed genes, which resulted in similar numbers of expressed genes in different tissues to those reported by Ramskold et al (2009). The top 25%, middle 50%, and bottom 25% of expressed genes with respect to RPKM were considered as having high, medium, and low expression, respectively. The score for relative expression of isoforms using distal polyA sites (RUD) was based on the ratio of read density in aUTR to that in cUTR (Figure 1A). Therefore, a high RUD value indicates higher abundance of long 3′UTR isoform resulting from usage of promoter‐distal polyA sites relative to short 3′UTR isoforms resulting from usage of promoter‐proximal polyA sites. The transcription direction for each read pair was inferred based on overlap with RefSeq sequences and the 3′ end of the mapped read pair was used for RUD calculation. We required the 3′UTR of a surveyed gene do not overlap with any regions of other genes, regardless of transcription direction. This can minimize interference from antisense transcripts.
Analysis of exon array data
Exon array data were normalized by the Robust Multichip Average (RMA) method and were corrected for hybridization bias using the COSIE program (Gaidatzis et al, 2009). Expressed genes were selected by the Detection Above Background (DABG) method. Gene expression level was calculated using probesets in constitutive exons, based on NCBI cDNAs/ESTs (Lee et al, 2008). The RUD score was based on the ratio of average probeset intensity in aUTR to that in cUTR, as previously described (Ji et al, 2009). For both human and mouse data sets, we first calculated mean for each probeset across replicates and then combined all samples for analysis. For gene expression regulation in differentiation of C2C12 cells, 1 × standard deviation of log2(ratio of expression) was used to group differentially expressed genes. APA regulation was based on analysis of RUD values using the Significance Analysis of Microarray (SAM) method with FDR<0.05 as cutoff for selection of significant events, as previously described (Ji et al, 2009).
Analysis of GRO‐seq data
GRO‐seq reads for C2C12 cells were mapped to the mouse genome (mm9) using Bowtie (Langmead et al, 2009) allowing up to three mismatches. Unmapped reads were trimmed to the first 38 nt and mapped to the genome again by Bowtie allowing up to three mismatches. This approach resulted in 12 511 052 uniquely mapped reads (69% of total). The 5′ end position of each read was used to indicate its location. GRO‐seq reads were examined in four regions of a gene: (1) 5′ region, 1 kb downstream of the transcription start site (TSS); (2) polyA region (PA), 1 kb upstream of the polyA site; (3) gene body region (GB), whole gene region excluding 5′ and PA regions; (4) pre‐termination region (PT), 4 kb downstream of the polyA site. When there were multiple polyA sites in the 3′‐most exon, the 5′‐most site was used to define GB. Gene expression level was based on read density (RPKM) in GB and PA. RPKM>0.04 was used as cutoff to select expressed genes. The top 25%, middle 50%, and bottom 25% of expressed genes with respect to RPKM were considered as having high, medium, and low expression, respectively. To minimize interference from downstream genes, we selected only genes that were not followed by any RefSeq‐supported genes with the same transcriptional direction in the 6‐kb downstream region. For analysis of alternative polyA sites, we selected only the distal polyA sites that were not preceded by any polyA sites in the upstream 400 nt region and used only the upstream 200 nt region of proximal or distal polyA sites. The same method was used to analyze the GRO‐seq data generated by the Lis group for IMR90 cells, except that the PT region was 3 kb.
Analysis of nucleosome positioning and histone methylation patterns
The data for nucleosome positioning and histone methylation were from Schones et al (2008) and Barski et al (2007), respectively. For the nucleosome positioning data, we extended reads to 147 bp, the length of DNA bound by a full nucleosome. Gene expression levels were based on microarray data of human T cells using the Affymetrix MAS5 method. The bottom and top 20% of genes with respect to probe intensity values were considered as lowly and highly expressed genes, respectively. The ratio of read density in the 500‐nt upstream region of proximal polyA site to that of distal site was calculated to indicate relative level at the proximal versus distal polyA sites. Nucleosome level prediction using nucleotide content was based on a program from http://genie.weizmann.ac.il/software/nucleo_genomes.html (Kaplan et al, 2009). Histone modification patterns in MEFs and NPCs were analyzed by the same method.
To compare various features between highly and lowly expressed genes, we used a data resampling method based on bootstrapping (Venables and Ripley, 2002). This was carried out by resampling genes in two comparing groups 1000 times to derive a P‐value based on how many times one group has a higher or lower mean value than the other. This method was also used to derive 90% confidence intervals for various data.
Gene density plot
We used the gene density plot to examine the correlation between the relative abundance of 3′UTR isoforms and gene expression level. As described above, the RUD score was used to represent the relative abundance of APA isoforms. In the gene density plot, genes expressed in each sample were summarized in an N × N table according to the RUD rank (row) and gene expression rank (column) across a sample set, where N is the number of samples in the set. The number of genes in each cell of the table is an observed value. The expected number of genes for each cell was calculated using randomized data with shuffled RUD and gene expression ranks. The ratio of observed number of genes (Obs) to expected number of genes (Exp) of a cell indicates the extent of enrichment (when ratio >1), or of depletion (when ratio <1), of genes in the cell; and the log2(ratio) values of the table are represented in a heat map using R.
Constructs used in this study
Constructs containing PTAL, PCRE, PNFκB, PGRE, PHSE, and PSRE were constructed by replacing the CMV promoter (PCMV) in pRiG‐77S.AD (Ji et al, 2009) with fragments containing promoter sequences from corresponding constructs included in the Mercury™ Pathway Profiling System (Clontech) by PCR (primers: 5′‐GCGCATTAATGAGCTCTTACGCGTTCTAGC and 5′‐CGATGCTAGCCGATTCGAAGCTTCTGCTTC) and restriction enzymes AseI and NheI. Constructs containing PE2F and PMyoG were constructed by replacing the CMV promoter in pRiG‐77S.AE (Ji et al, 2009) with respective DNA fragments and restriction enzymes AseI and NheI. PMyoG was derived from the myogenin promoter (−395 to +39 nt surrounding the TSS; Edmondson et al, 1992) using PCR (primers: 5′‐CGATATTAATGGATTTTCAAGACCCCTTCC and 5′‐GGCCGCTAGCAAGGCTTGTTCCTGCCACT) and C2C12 genomic DNA, and PE2F was derived from the E2F luciferase reporter vector (Panomics) using PCR (primers: 5′‐CGATATTAATCTAGCCTTGGCGGGAGATA and 5′‐GGCCGCTAGCTTACCAACAGTACCGGAATGC). The pTRE‐RIF vector was constructed by cloning a fragment encoding the Renilla luciferase from pRL‐CMV (Promega), a fragment encoding the firefly luciferase from pGL3‐Basic (Promega), and a fragment containing a polyA site and an IRES sequence from pRiG‐77S.AD into the pTRE‐Tight vector (Clontech).
Cell culture and reporter assays
Experiments using constructs containing PCMV, PTAL, PCRE, PNFκB, PGRE, PHSE, or PSRE were carried out in Human Embryonic Kidney (HEK) 293 cells, which were maintained in Dulbecco's Modified Eagles Medium (DMEM) supplemented with 10% fetal bovine serum (FBS). Transfection was carried out using Lipofectamine 2000 (Invitrogen). The following conditions were used to induce PCRE, PNFκB, and PGRE: 5 μM forskolin (Fisher) for PCRE, 0.1 μg/ml human TNFα (Sigma) for PNFκB, and 5 μM Dexamethasone (Sigma) for PGRE. Cells were treated with inducing agents 16 h after transfection. Six hours after treatment, total cellular RNA was extracted using TRIzol (Invitrogen). To induce PHSE, cells were incubated at 45°C for 15 min. To induce PSRE, cells were first grown in the reduced serum media Opti‐MEM (Invitrogen) and then in DMEM with 10% FBS.
Constructs containing PE2F or PMyoG were studied in proliferating and differentiating C2C12 cells. Briefly, C2C12 cells were maintained at 30–70% confluency in DMEM supplemented with 10% FBS. Transfection was carried out using Lipofectamine 2000 (Invitrogen) when the confluency of cells was 30%. After 16 h, cells were split into a proliferation group and a differentiation group. For the proliferation group, cells were still maintained in DMEM with 10% FBS. For the differentiation group, cells were switched to DMEM with 2% horse serum when they reached 90% confluency. qRT–PCR was carried out 24 h after transfection.
pTRE‐RIF was studied in HeLa Tet‐On cells (gift from Andrew Harris, UMDNJ). Briefly, cells were maintained in DMEM supplemented with 10% FBS and 200 μg/ml G418. Transfection of pTRE‐RIF was carried out using jetPEI (Polyplus‐transfection). Cells were treated with different doses of Dox (Sigma) right after transfection. After 24 h, cell lysis and luciferase assay were carried out using the Dual‐Luciferase Reporter Assay System (Promega).
RNase protection assay
The DNA template for RPA probe was produced by putting a fragment surrounding the proximal polyA site of pRiG‐77S.AD into the pcDNA3.1 vector (Invitrogen) using restriction enzymes (XhoI and BamHI). The 32P‐labeled antisense RNA probe was generated by in vitro transcription using the MAXIscript kit (Ambion). RPA assays were carried out using the RPA III kit (Ambion).
Quantitative real‐time reverse transcription PCR
For qRT–PCR, total cellular RNA was treated with DNase I and reverse transcribed using the oligo‐dT primer. qRT–PCR was carried out using the Maxima SYBR Green/Rox qPCR Master Mix (Fermentas) with primers targeting RFP (5′‐GCCCCGTAATGCAGAAGAAG and 5′‐CTTCAGGGCCTTGTGGATCT), EGFP (5′‐GGGCACAAGCTGGAGTACAACT and 5′‐ATGTTGTGGCGGATCTTGAAG), or the kanamycin resistance gene (5′‐GCCGAATATCATGGTGGAAA and 5′‐AATATCACGGGTAGCCAACG).
NRO assays using BrUTP to label newly synthesized RNA were based on the methods developed by the Lis group (Core et al, 2008) and the Fu group (Lin et al, 2008). Briefly, ∼1 × 107 cells were washed three times in cold PBS on ice, followed by incubation in 10 ml ice‐cold swelling buffer (10 mM Tris–HCl pH 7.5, 3 mM CaCl2, 2 mM MgCl2) for 5 min. Cells were collected with a scraper and pelleted with 500 g at 4°C for 10 min. Nuclei were isolated by pipetting cells up and down 20 times using a cut P1000 tip in 1 ml lysis buffer (swelling buffer with 0.5% NP‐40, 10% glycerol, 20 U/ml RNasin). Nuclei were washed and pelleted in the lysis buffer, resuspended in 1 ml freezing buffer (50 mM Tris–HCl pH 8.3, 40% glycerol, 5 mM MgCl2, 0.1 mM EDTA). Nuclei were pelleted again with 1000 g at 4°C for 5 min, and resuspended in 100 μl of freezing buffer. An equal volume (100 μl) of reaction buffer (10 mM Tris–HCl pH 8.0, 5 mM MgCl2, 1 mM DTT, 300 mM KCl, 20 U of RNase inhibitor, 1% sarkosyl, 0.5 mM of BrUTP, ATP, and GTP, and 5 μM CTP) was added to carry out NRO at 30°C for 5 min.
NRO reporter assay
For NRO of reporter constructs, transfected HeLa Tet‐On cells were treated with 10 ng/ml or 10 μg/ml Dox. After 16 h, cells were harvested for NRO as described above. The NRO reaction was stopped by adding Trizol into the reaction mix. RNA was then extracted, and treated with DNase I to remove DNA. Newly synthesized labeled RNA was pulled down by anti‐BrdU antibody conjugated to the protein G Dynabeads (Invitrogen) in the binding buffer (10 mM Tris–HCl pH 7.4, 500 mM NaCl, 2.5 mM MgCl2, 0.5% Triton X‐100, 0.5 μg/μl yeast RNA, 0.1 μg/μl yeast tRNA) at 4°C for 2 h. After immunoprecipitation (IP), beads were washed with the washing buffer (10 mM Tris–HCl pH 7.4, 500 mM NaCl, 2.5 mM MgCl2, 0.5% Triton X‐100) six times. RNA was eluted from the beads by the RLT buffer (Qiagen) supplemented with yeast RNA (0.5 μg/μl). RNA was then purified by the RNeasy kit (Qiagen) and used for qRT–PCR with random hexamers as primer for the RT step. Primers used for qRT–PCR of NRO transcripts are shown in Supplementary Table 3.
C2C12 cells grown to ∼80% confluency in DMEM+10% FBS were used for NRO, as described above. After NRO, the reaction mix was treated with 50 U DNase I at 37°C for 1 h followed by incubation at 55°C for 1 h with an equal volume of buffer S containing 20 mM Tris–HCl pH 7.4, 2% SDS, 10 mM EDTA, 200 mg/ml Proteinase K. RNA was extracted by phenol/chloroform twice, followed by ethanol precipitation. Purified RNA was fragmented to ∼100 nt by the RNA Fragmentation Reagents kit (Ambion) at 70°C for 15 min, and was subjected to IP with the anti‐BrdU antibody (2 mg/reaction, Sigma) conjugated on the protein G Sepharose (GE Healthcare). Immunoprecipitated RNA was extracted by phenol/chloroform and ethanol precipitation. The IP step was repeated twice to obtain pure nascent RNA. Purified nascent RNA was treated with shrimp alkaline phosphatase (Roche) at 1 U/reaction at 37°C for 30 min to remove the 3′ phosphate group from RNA, followed by addition of the 5′ phosphate group with T4 kinase (NEB) at 10 U/reaction at 37°C for 1 h. Treated RNA was purified and used to prepare a sequencing library using the Illumina Small RNA Sample Prep Kit (v1.5). Deep sequencing was carried out on an Illumina Genome Analyzer IIx, which generated over 12.5 million uniquely mapped single reads (76 nt).
We thank Gary P Schroth at Illumina for providing RNA‐seq data and James L Manley, Takashi Nagaike, Dinghai Zheng, Michael B Mathews, and Carol S Lutz for helpful discussions. This work was funded by a grant from NIH (R01 GM084089) to BT.
Author contributions: ZJ, WL (Luo), and BT conceived and designed research and wrote the paper; ZJ, WL (Luo), WL (Li), MH, ZP, and YZ performed research; ZJ, WL (Luo), WL (Li), and BT analyzed data.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Figures S1–5, Supplementary Tables S1–3 [msb201169-sup-0001.pdf]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2011 EMBO and Macmillan Publishers Limited