Despite the availability of several large‐scale proteomics studies aiming to identify protein interactions on a global scale, little is known about how proteins interact and are organized within macromolecular complexes. Here, we describe a technique that consists of a combination of biochemistry approaches, quantitative proteomics and computational methods using wild‐type and deletion strains to investigate the organization of proteins within macromolecular protein complexes. We applied this technique to determine the organization of two well‐studied complexes, Spt–Ada–Gcn5 histone acetyltransferase (SAGA) and ADA, for which no comprehensive high‐resolution structures exist. This approach revealed that SAGA/ADA is composed of five distinct functional modules, which can persist separately. Furthermore, we identified a novel subunit of the ADA complex, termed Ahc2, and characterized Sgf29 as an ADA family protein present in all Gcn5 histone acetyltransferase complexes. Finally, we propose a model for the architecture of the SAGA and ADA complexes, which predicts novel functional associations within the SAGA complex and provides mechanistic insights into phenotypical observations in SAGA mutants.
Determining the architectures of protein complexes improves our understanding of protein cellular functions. In order to efficiently characterize the subunits of protein complexes assembled in vivo, affinity purification followed by proteomics mass spectrometry (APMS) strategies have been devised. Partial or whole protein complexes are first biochemically isolated using tagged components of the complex, followed by an identification of all co‐purified proteins using mass spectrometry. However, those approaches are insufficient to provide information about the spatial arrangement and the interrelationship of the proteins of the respective complex.
In this study, we developed and applied a novel method utilizing biochemistry, quantitative proteomics and computational approaches in order to characterize the organization of proteins in a complex. The key to our method is the systematic purification of several tagged components of the protein complex in multiple genetic deletion strains, which serve to compromise the integrity of the complex, followed by quantification of the co‐purified proteins. Using a series of computational methods, the resulting raw quantitative values are next interpreted in order to determine the modular organization of the complex as well as the interrelationships between its subunits, which in turn can be used to predict a macromolecular model of the complex.
We tested this approach to obtain novel insights into the architecture of multi‐protein complexes on the Saccharomyces cerevisiae Spt–Ada–Gcn5 histone acetyltransferase (HAT) (SAGA) and ADA complexes, which are conserved complexes involved in chromatin remodeling (Koutelou et al, 2010). Regular quantitative APMS strategies in wild‐type backgrounds were not sufficient to separate tight protein complexes like SAGA/ADA into their distinct modules. However, after perturbing the system using genetic deletions of several subunits located in different topological parts of SAGA, hierarchical cluster analysis performed on 34 purifications (generated using 10 different TAP‐tagged baits) resulted in a dissociation of the Gcn5 HAT complexes into five modules: (1) the SA_TAF module, (2) the SA_SPT module, (3) the DUB module, (4) the HAT/Core module and (5) the ADA module (Figure 2A and B).
The approach of purifying a protein in a deletion strain furthermore provides valuable information about the influence of the deleted subunit on the association and interdependency of the bait and the remaining preys. In order to quantify these associations, we calculated a probability between every prey and bait in the deletion strain purifications based on Bayes' theorem (Sardiu et al, 2008). In conjunction with preexisting interaction data obtained from yeast two‐hybrid and genetic complementation assays, we finally used these probabilities to predict a low‐resolution model for the architecture of the SAGA and ADA complexes (Figure 4).
This novel approach revealed that the SAGA/ADA complexes are composed of five distinct functional modules, of which two were not previously described (SA_SPT and SA_TAF). These modules, which are responsible for different functions of the SAGA complex, are capable of assembling independently from the remaining modules of the complex. Furthermore, we identified a novel subunit of the ADA complex, termed Ahc2, and characterized Sgf29 as an ADA family protein present in all Gcn5 HAT complexes. Compared with other structural studies, which mapped 9 of the 19 known SAGA subunits using single EM reconstruction (Wu et al, 2004) or resolved the structure of the 4 subunits of the DUB module using X‐ray crystallography (Kohler et al, 2010; Samara et al, 2010), our approach is not limited to a maximum number of complex subunits. Consequently, we were able to construct a macromolecular model consisting of all 21 SAGA/ADA subunits, which bridges the gap between the previous limited EM analysis and focused X‐ray crystallography analysis.
A combinatorial approach of gene depletions with multiple bait proteins coupled with biochemical, proteomic and computational approaches can experimentally determine modules of stable multi‐protein complexes.
SAGA is a 19‐subunit complex consisting of five connected modules with Spt20 being particularly important for the assembly of the intact complex.
One of the modules, the HAT/Core module, is also shared with the distinct six‐subunit complex ADA.
Architectural models of large multi‐protein complexes can be assembled using our approach, which is an alternative method to generate novel insight into the organization and architecture of multi‐protein complexes.
Many proteins within cells do not function as individual activities, but associate with specific partners to form multisubunit modules with specific functions. These in turn may associate with other functional modules to form a multifunctional macromolecular complex. While the identification of subunits of such complexes can be achieved through a combination of protein purification and proteomics, it is more challenging to ascertain how individual subunits interact and are spatially arranged within these macromolecular complexes. High‐resolution characterization of multi‐protein assemblies using any single experimental or computational method is generally very difficult, especially since traditional methods such as X‐ray crystallography or NMR have certain limitations in characterizing large dynamic protein complexes. However, even if it is not feasible to determine the structure of whole protein complexes at atomic or amino‐acid levels, methods predicting lower‐resolution macromolecular models that accurately position proteins and their connections will accelerate our understanding of protein complexes and their cellular functions. Here, we describe a method capable of determining the architectural organization of multi‐protein complexes. It employs a combination of computational approaches and a systematic collection of quantitative proteomics data from wild‐type and deletion strain purifications. We applied this approach on a data set generated in this study, which aims to gain novel insights into the Saccharomyces cerevisiae Spt–Ada–Gcn5 histone acetyltransferase (HAT) (SAGA) complex.
SAGA is a well‐studied multi‐protein complex involved in regulating histone post‐translational modifications. Originally identified in yeast, the SAGA complex was subsequently shown to be evolutionarily conserved in every organism through humans (Lee and Workman, 2007). Early on, through the use of genetics and conventional biochemistry approaches, SAGA was recognized to be a multi‐protein complex that is made up of smaller functional modules (Figure 1A) (Grant et al, 1997, 1998, 1999; Sterner et al, 1999). The HAT module, which carries out the HAT activity of the SAGA complex, was the first module to be described and its catalytic subunit Gcn5 was shown to harbor limited substrate recognition and specificity (Grant et al, 1999). Subsequently, the Ada2 and Ada3 proteins were shown to also be part of this module (Horiuchi et al, 1997; Saleh et al, 1997; Balasubramanian et al, 2002). Early work already recognized the existence of three distinct Gcn5‐containing complexes that have since been characterized as SAGA, a variant of the SAGA complex, named SLIK/SALSA, and ADA (Grant et al, 1997). All three complexes share the Gcn5/Ada2/Ada3 HAT module. SAGA and SLIK also share all other subunits with the exception of a C‐terminal truncated form of Spt7 and Spt8 (Pray‐Grant et al, 2002; Sterner et al, 2002). On the other hand, only a single unique subunit, Ahc1, was known to exist in the ADA complex (Eberharter et al, 1999) in addition to the HAT module. More recently, a second catalytic module, the deubiquitinylation (DUB) module, was identified within SAGA/SLIK (SALSA), which is important for the DUB of histone H2B (Henry et al, 2003; Daniel et al, 2004). Work from many laboratories has led to the identification of several subunits of this module, that is Ubp8, Sgf11, Sus1 and Sgf73 (Ingvarsdottir et al, 2005; Lee et al, 2005, 2009; Kohler et al, 2006, 2008). 
In addition, Chd1 was shown to be part of SAGA (Pray‐Grant et al, 2005); however, it was not identified in our purifications.
Due to the complexity of the SAGA/ADA protein complex network, we reasoned that it is an ideal system to test our approach. Furthermore, partial structural information has been established for the SAGA complex, which provides an objective benchmark against which to evaluate our method. Using electron microscopy (EM), Wu et al (2004) determined the first low‐resolution 3D model of the SAGA complex; however, this study only localized 9 of the 19 known subunits of SAGA and the DUB module was not known to be part of SAGA at that time. On the other hand, two recent studies also determined the high‐resolution structure of the four subunits of the DUB module (Kohler et al, 2010; Samara et al, 2010). Since these studies characterized only portions of the SAGA complex, there is no complete model for the architecture of SAGA. Here, we aimed to improve our understanding of the organization of proteins within the complex as well as to identify any components missing from earlier studies.
Using our method, we confirmed all known components of the DUB and HAT modules, and furthermore revealed that the HAT module contains an additional protein, Sgf29, that is present in all Gcn5 complexes. Sgf29 mutants resemble those in Ada2, Ada3 and Gcn5 by displaying classic ADA phenotypes (Berger et al, 1992). We also identified a novel subunit of the ADA complex, which we termed Ahc2. The most intriguing observation revealed through our analysis is that the SAGA complex consists of five distinct modules. In addition to the previously described DUB, HAT/Core and ADA modules, we identified two novel modules, which we termed SA_SPT (i.e. Saga‐associated SuPpressors of Ty) and SA_TAF (i.e. Saga‐associated TATA‐binding protein‐associated factors). Unexpectedly, these modules, which are responsible for the different functions of the SAGA complex, are capable of assembling independently from the remaining modules of the complex.
Data generation for the wild‐type HAT complex
A total of 15 different SAGA subunits and 2 specific ADA components were TAP tagged (hereafter referred to as ‘baits’), expressed and purified by affinity purification (Supplementary Tables S1 and S2). The proteins bound to the respective subunits (i.e. ‘prey’ proteins) were analyzed by multidimensional protein identification technology (MudPIT) (Swanson et al, 2009) and quantified using the distributed normalized spectral abundance factors (dNSAF) (Zhang et al, 2010). Since the main focus of our study is on the Gcn5 HAT complexes, we concentrated on the 21 components of the SAGA and/or ADA complexes and used these subunits for further analysis. The remaining proteins identified in the purifications are reported in Supplementary Table S2. To ensure the specificity of the prey subunits (pulled‐down proteins) in each bait purification, we removed non‐specific proteins (contaminants) from the data by comparing the dNSAF value in each of the individual purifications with the dNSAF value from a mock control (see Supplementary information). We also ensured the reproducibility of the data set by performing multiple replicates of subunits located in different parts of the SAGA complex (Figure 1B; Supplementary Figure S1; Supplementary Tables S1, S2, S3). Finally, a 29 × 21 matrix was constructed consisting of the dNSAF values for each of the 21 subunits of the complex (Figure 1B).
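The quantification and filtering steps described above can be illustrated with a minimal sketch. It computes the plain normalized spectral abundance factor (NSAF), which is a simplification: dNSAF additionally distributes spectra shared between proteins, and that step is omitted here. The spectral counts, protein lengths, mock-control value and the `min_ratio` cutoff are all hypothetical illustration values, not data or criteria from this study.

```python
# Simplified spectral-count normalization (plain NSAF, not the full dNSAF).
# All counts, lengths and cutoffs below are made-up illustration values.

def nsaf(spectral_counts, lengths):
    """Return NSAF per protein: (SpC / L) divided by the sum of SpC / L in the run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {p: 0.0 for p in saf}
    return {p: v / total for p, v in saf.items()}


def remove_contaminants(sample, mock, min_ratio=2.0):
    """Keep preys whose abundance exceeds min_ratio times the mock-control value.

    min_ratio is a hypothetical cutoff chosen for illustration only.
    """
    return {p: v for p, v in sample.items() if v > min_ratio * mock.get(p, 0.0)}


counts = {"Gcn5": 120, "Ada2": 90, "Spt7": 200, "Contaminant": 30}
lengths = {"Gcn5": 439, "Ada2": 434, "Spt7": 1332, "Contaminant": 300}
mock = {"Contaminant": 0.2}  # abundance of the same protein in a mock purification

values = nsaf(counts, lengths)
specific = remove_contaminants(values, mock)
print(sorted(specific))
```

Repeating this for every purification and stacking the resulting vectors over a fixed list of subunits would give a purifications × subunits matrix of the kind analyzed in the following paragraphs.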
Since the SAGA complex consists of different functional modules (reviewed in Koutelou et al, 2010), we sought to determine whether a quantitative proteomics data set generated from wild‐type purifications is sufficient to discern the different modules of the SAGA protein complex and to assign proteins of unknown function to the respective modules. One popular method to analyze proteomics data is to hierarchically cluster proteins based on their relative abundance level (Sardiu et al, 2009a). We therefore subjected the 29 × 21 matrix to hierarchical clustering analysis in order to identify groups of proteins that show similar abundance levels (Figure 1B). However, the dendrogram obtained from the hierarchical clustering analysis did not indicate a clear separation of the proteins into distinct branches, and therefore did not separate the proteins into the different modules. This is a consequence of the fact that all the dNSAF values in the wild‐type network are very similar, reflecting the stability of the intact complex.
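As an illustration of the clustering idea, the following sketch groups proteins by the similarity of their abundance profiles using average-linkage agglomerative clustering written from scratch. The distance metric, the linkage rule and the toy abundance values are assumptions made for this example; the text does not specify which settings or software were used for Figure 1B.

```python
import math

# Toy abundance profiles (one value per purification); values are hypothetical.
profiles = {
    "Gcn5": [0.09, 0.00, 0.08, 0.09],
    "Ada2": [0.08, 0.00, 0.09, 0.08],
    "Ubp8": [0.05, 0.05, 0.00, 0.00],
    "Sgf11": [0.05, 0.06, 0.00, 0.00],
    "Taf5": [0.03, 0.03, 0.03, 0.03],
    "Taf9": [0.03, 0.04, 0.03, 0.03],
}


def euclidean(a, b):
    """Euclidean distance between two abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(data, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in data]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(data[a], data[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


modules = average_linkage(profiles, n_clusters=3)
print(modules)
```

In this toy input the profiles differ strongly between groups, so the grouping is unambiguous. When all profiles are nearly identical, as in the wild-type data, the merge order is driven by small differences and the dendrogram carries little modular information.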
In spite of this, the wild‐type clustering yielded novel observations. First, a previously uncharacterized protein, YCR082W, which we termed Ahc2, was found in close proximity to Ahc1, indicating its association within the complex (Figure 1B). In addition, Ahc2 co‐purified with the components of the ADA complex (Figure 1B). These results suggest that the Ahc2 protein is a novel component of the ADA complex. Ahc1 and Ahc2 were only detected when components of the HAT/Core module were used as baits. Furthermore, the baits Ahc1 and Ahc2 only co‐purified components of the ADA complex. Next, a protein of unknown function, Sgf29, had an abundance level similar to that of known subunits of the HAT/Core module and also co‐purified with the proteins Ahc1 and Ahc2 (Figure 1B), indicating its association with the ADA complex as well. Additional experiments were nevertheless carried out to support these novel observations.
Quantitative analysis of deletion purifications
The architecture of protein complexes can reveal important principles of cellular organization and function. The separation and the proper identification of local modules within complexes remain an outstanding problem for proteomic analysis and toward this end few methods have been developed (Sardiu et al, 2009b). For example, the use of a single TAP‐tagged protein in different deletion strains followed by mass spectrometry (i.e. proteins dependent on the deleted protein no longer co‐purify with the bait) greatly improved the insights into the modularity and interrelationship of subunits in a protein complex (Mitchell et al, 2008; Sardiu et al, 2009b). However, certain limitations exist with this method. The major constraint is that all the results obtained using a single TAP‐tagged bait and different deletions can only be interpreted relative to the protein that was TAP tagged and only local information proximal to the TAP‐tagged bait can be obtained.
In an effort to overcome this limitation and to comprehensively identify the protein modularity and protein interrelationships within the Gcn5 HAT complexes, we applied a more unbiased comparative approach where individual components of SAGA were deleted and combined with different TAP‐tagged proteins used as baits. The rationale behind the selection of the deleted proteins and the baits was based on both known and data‐driven (i.e. based upon observations made in this study) biology of the SAGA/ADA complexes as follows: For deletion, we selected different subunits from each of the two known functional modules (i.e. DUB and HAT/Core) as well as different subunits from outside of these modules and combined them with different baits for TAP purification (Figure 2A). In addition, since Sgf29 was a protein of unknown function, we included this deletion in the data set. Furthermore, previous studies with limited western blotting showed that the deletion of other genes such as ADA1, SPT7 and SPT20 results in the disruption of the SAGA complex (Sterner et al, 1999). Out of these, the deletion of SPT20 was of particular interest to us, since previous work demonstrated that its deletion only yielded moderately increased levels of ubiquitylated H2B (Henry et al, 2003), indicating that the deletion of this single protein compromises the SAGA complex, but to a lower extent than for components of the DUB module, and suggesting that it only led to a partial loss of the complex's functionality. In order to explain these observations in more detail, a particular focus of our study was to determine the true effect of the SPT20 deletion on the integrity of the complex, in particular on the HAT and DUB modules, and therefore we included the spt20Δ in our data set.
Regarding the baits, we TAP‐tagged SPT proteins, proteins from the HAT module, proteins from the DUB module and TAF proteins, since strains lacking any of the TAF genes are not viable and these genes therefore cannot be deleted. By purifying these proteins in certain deletion backgrounds, we aimed to capture architectural information from different parts of the SAGA complex. Altogether, we performed a total of 34 purifications that included 10 different TAP‐tagged baits (Spt7, Spt8, Spt20, Ada1, Gcn5, Ada2, Ubp8, Taf5, Taf9 and Taf12) and 10 different deletion strains (gcn5Δ sgf29Δ double mutant, gcn5Δ, sgf29Δ, ada2Δ, sgf73Δ, sgf11Δ, ubp8Δ, spt20Δ, spt3Δ and spt8Δ) (Figure 2A). To ensure the robustness of our results, replicates were also included in our deletion analysis (Figure 2A; Supplementary Figure S1; Supplementary Tables S4, S5, S6). After the respective purifications were conducted and processed, we first applied hierarchical clustering analysis on the entire deletion data set consisting of the 34 purifications (Figure 2A). The results of the clustering analysis indicated a clear dissociation of the SAGA complex and revealed five major groups/modules: (1) the SA_TAF module, composed of all of SAGA's TAF proteins (Taf6, 5, 12, 9 and 10); (2) the SA_SPT module consisting of all of SAGA's SPT proteins (Spt7, 8, 3 and 20) together with Tra1 and Ada1; (3) the DUB module (Ubp8, Sgf73, Sgf11 and Sus1); (4) the HAT/Core module, which includes all three previously described components (Gcn5, Ada3 and Ada2), together with Sgf29; and (5) the ADA module that consists of the subunits Ahc1 and Ahc2 (Figure 2A). As we already observed in the wild‐type purifications (Figure 1B), even after dissecting the complex by this deletion approach, the proteins Ahc2 and Sgf29 still exhibited abundance levels similar to those of other members of the ADA and the HAT module, respectively, further indicating that Ahc2 is part of the ADA module and Sgf29 is part of the HAT/Core module (Figure 2A). 
Furthermore, in contrast to the wild‐type purifications, in which TAF proteins were separated into different branches of the cluster (Figure 1B), all TAF subunits were now tightly grouped together in the dendrogram of the deletion purifications (Figure 2A). We also analyzed catalytic mutants of Gcn5 and Ubp8 (Figure 2B and C), which showed patterns similar to the deletion of the whole protein and are discussed later.
All of our results on the modularity of the SAGA/ADA complexes, together with an itemization of the similarities and discrepancies compared with previous studies, are summarized in Supplementary Table S7. The combination of different baits with several deletion strain backgrounds followed by quantitative mass spectrometric analysis and cluster analysis allowed us to determine the organization of these proteins into modules within the Gcn5 HAT complexes. To further understand the relationship between the proteins within these modules as well as between the modules, we next studied the effect of the deleted subunits on the association between prey and bait proteins within the complex.
Probabilistic deletion network and protein complex organization
The approach of purifying a protein in a deletion strain has the advantage of capturing not only information about the association between every prey protein and the bait but also between the prey protein and the deleted subunit. The bait and the deleted subunit can have similar or different locations in the complex; therefore, this relative position will affect the extent of a deletion on the preys purified by the bait. Furthermore, certain subunits will have a greater effect on the stability of the complex than others. Quantitative proteomics data is a key feature of our method, since it enables us to determine the change in associations between preys and the baits they co‐precipitate with. In order to quantify these associations, we calculated the posterior probability for each prey in a deletion purification based on Bayes' rule as described previously (Sardiu et al, 2008). Bayes' theorem converts the observed spectral counts into discrete levels of association strength (Figure 3; Supplementary Table S8). In principle, in a single deletion purification, those preys that retain a high probability should associate more strongly with the bait, while the preys that are present at a low probability or are absent from the purification associate more strongly with the subunit that was deleted. The associations between each bait and the purified preys in each deletion strain are represented in Figure 3, in which the colors red, cyan and black correspond to low, medium and high probabilities, respectively (see Supplementary information for details).
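The mechanics of such a calculation can be sketched as follows. This is only an illustration of Bayes' rule applied to abundance values and is not the formulation of Sardiu et al (2008): the choice of likelihood (a prey's abundance normalized across purifications), the uniform prior, the thresholds in `level` and all numbers are assumptions made for this example.

```python
def posterior_probabilities(abundance, prior):
    """Apply Bayes' rule per purification.

    abundance: {prey: {purification: value}}, prior: {prey: P(prey)}.
    Returns {purification: {prey: P(prey | purification)}}.
    """
    purifications = sorted({j for row in abundance.values() for j in row})
    # Assumed likelihood P(purification | prey): abundance normalized across purifications.
    likelihood = {}
    for prey, row in abundance.items():
        total = sum(row.values())
        likelihood[prey] = {
            j: (row.get(j, 0.0) / total if total else 0.0) for j in purifications
        }
    posterior = {}
    for j in purifications:
        joint = {prey: likelihood[prey][j] * prior[prey] for prey in abundance}
        evidence = sum(joint.values())
        posterior[j] = {
            prey: (v / evidence if evidence else 0.0) for prey, v in joint.items()
        }
    return posterior


def level(p, low=0.05, high=0.5):
    """Map a probability to a discrete level; the thresholds are hypothetical."""
    if p < low:
        return "low"
    return "medium" if p < high else "high"


# Hypothetical abundances of three preys for one bait in two deletion strains.
abundance = {
    "Spt8": {"ubp8_del": 0.06, "spt20_del": 0.05},
    "Sgf73": {"ubp8_del": 0.04, "spt20_del": 0.00},
    "Ubp8": {"ubp8_del": 0.00, "spt20_del": 0.00},
}
prior = {prey: 1.0 / len(abundance) for prey in abundance}  # uniform prior (assumption)

post = posterior_probabilities(abundance, prior)
for purification, row in post.items():
    print(purification, {prey: level(p) for prey, p in row.items()})
```

With these assumptions, a prey that is absent from a purification receives a probability of zero and therefore falls into the lowest level, which is the property the interpretation above relies on.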
With respect to the TAP‐tagged proteins used in the different deletions (Figure 3), as we expected, all the proteins from the same module as the TAP‐tagged protein were highly recovered and had high probabilities. For instance, in Spt7–TAP–gcn5Δ;sgf29Δ, the highest probabilities were observed for Tra1, Ada1 and all the SPT proteins, with Spt8 exhibiting the highest probability (Figure 3A). Interestingly, for Spt8–TAP–sgf29Δ, Spt7 has the highest probability (after Spt8), suggesting a strong association between these two proteins (Figure 3A). To begin, we inspected the HAT/Core module and investigated the effect of the GCN5, SGF29 and ADA2 deletions on this module as well as on the entire complex. In the specific purifications that contain these deletions, ada2Δ had a greater effect on the HAT/Core module when compared with gcn5Δ and sgf29Δ (Figures 2A and 3B). Independent of the TAP‐tagged bait used, all and only the components of the HAT module were lost in ada2Δ (Figure 2A). In contrast, when GCN5 and SGF29 were deleted with any combination of TAP‐tagged proteins, all components of the HAT module remained at low probabilities, except for the deleted subunit (Figure 3B). Also, as expected, for every deletion within the HAT/Core module, proteins of the module itself were most affected (Figures 2A and 3B). In addition, a catalytic mutation of gcn5 (KQL_AAA) shows a mild effect similar to that of TAP purifications from strains in which the whole Gcn5 protein is deleted (Spt7–TAP–gcn5Δ–sgf29Δ, Spt7–TAP–gcn5Δ; see Figure 2B; Supplementary Table S9). Taken together, these results indicate that Ada2 has a critical role in the formation of the HAT module and its association with the overall complex (Figures 2A and 3).
Next, we considered the SA_TAF module. For the TAP‐tagged TAF baits, the proteins with the highest probabilities in ada2Δ also belonged to the SA_TAF module. These purifications were of particular importance, since the quantitative information obtained from the TAP‐tagged TAF baits could substitute for the absence of the deletions in the TAF proteins, which are lethal, and helped group the TAF proteins into the module. Importantly, this grouping indicated that the histone‐fold TAFs are associated with other TAFs and are less likely to dimerize with histone‐fold SPT or ADA proteins (Figures 2 and 3C). Since TAF proteins are shared between SAGA and TFIID, their grouping into a discrete module suggests that a similar module consisting of the same TAFs may also exist in TFIID (Figures 2 and 3C), a distinct complex that contains additional proteins not observed in SAGA (Auty et al, 2004).
Next, we investigated the stability of the DUB module by monitoring the effect of UBP8, SGF73 and SGF11 deletions on this module as well as on the entire complex. Independent of the TAP‐tagged bait used, ubp8Δ had the same effect on the DUB module: Sus1, Sgf11 and Ubp8 were absent from the module, while Sgf73 was still present (Figures 2 and 3D). A catalytic mutant of Ubp8 phenocopied ubp8Δ, with loss of Sus1, Sgf11 and Ubp8, while Sgf73 was still co‐purified (Figure 2C; Supplementary Table S9; Ingvarsdottir et al, 2005). These results suggest that a tight connectivity is present between these three proteins and additionally that Sgf73 is the anchor between the DUB module and the rest of the complex. In order to understand with which proteins Sgf73 establishes contact, thereby attaching the DUB module to the complex, we next investigated the purifications in which SPT20 is deleted (Figures 2 and 3). For all baits from the SA_SPT and SA_TAF modules in spt20Δ, Tra1 and the whole DUB module were absent (Figures 2 and 3A and C). The loss of the DUB module in spt20Δ samples indicates that Sgf73 interacts with Spt20 in order to bring the DUB module into the complex. Furthermore, these observations also suggest a strong association between Spt20 and Tra1 (Figures 2A and 3).
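The reasoning in this paragraph amounts to comparing which preys are recovered with a given bait in the wild‐type background and in each deletion strain. A minimal sketch of that comparison is shown below; the presence and absence patterns follow the description above, but the reduced prey list and the function name `lost_preys` are illustrative assumptions.

```python
def lost_preys(wild_type, deletion, deleted_subunit):
    """Preys recovered in wild type but not in the deletion strain.

    The deleted subunit itself is excluded, since its absence is trivial.
    """
    return sorted(set(wild_type) - set(deletion) - {deleted_subunit})


# Reduced, illustrative prey sets for a bait outside the DUB module.
wild_type = {"Spt7", "Spt20", "Ada1", "Tra1", "Ubp8", "Sgf73", "Sgf11", "Sus1"}
ubp8_del = wild_type - {"Ubp8", "Sgf11", "Sus1"}  # Sgf73 is retained
spt20_del = wild_type - {"Spt20", "Tra1", "Ubp8", "Sgf73", "Sgf11", "Sus1"}

print(lost_preys(wild_type, ubp8_del, "Ubp8"))
print(lost_preys(wild_type, spt20_del, "Spt20"))
```

Sgf73 survives the UBP8 deletion but is lost together with the rest of the DUB module when SPT20 is deleted, which is the pattern that points to Sgf73 as the anchor and to Spt20 as its attachment site.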
The deletion of Spt20 is also a prime example to illustrate the principle of our strategy, as the choice of the TAP‐tagged protein dramatically influences the modules recovered in spt20Δ (Supplementary Figure S2). When using baits from the SA_SPT (Ada1) or SA_TAF (Taf9 and 5) modules, the DUB module and Tra1 were absent (Figures 2A and 3B and D). When proteins from the DUB module were used as baits, the rest of the modules were absent except for DUB (e.g. the Ubp8–TAP–spt20Δ, which only yielded the four components of the DUB module alone; Figure 2A). In the case of baits belonging to the HAT module, all other modules were missing with the exception of the HAT/Core module (Figures 2A and 3; Supplementary Figure S3). This observation strongly suggests that even after a protein essential for the proper assembly and function of SAGA is deleted, small sub‐complexes still form. This information could indicate that the assembly of the wild‐type SAGA complex does not occur one protein after the other, but rather that first several modular sub‐complexes form, which successively are joined together in order to form the mature complex.
Based on our results, we next assembled a macromolecular model for the SAGA and ADA complexes and combined it with previously published yeast two‐hybrid and genetic complementation screens (Figure 4; Supplementary Table S10): First, the HAT/Core module contains components that are shared between SAGA and ADA. We placed Ada2 more proximally, since the effect of its deletion on the HAT/Core module was the strongest of all module‐specific mutants analyzed. Conversely, Sgf29 and Gcn5, whose deletions did not reveal interdependency with the rest of the module components, were situated more peripherally. In addition, previous data from a genetic deletion screen showed a negative genetic synergism of Ada2 and Gcn5 with components of the DUB module (Costanzo et al, 2010) (see Supplementary Table S10); thus, we positioned these two proteins closer to the DUB module. Since it was reported from yeast two‐hybrid screens (Marcus et al, 1994; Wang et al, 1997; Uetz et al, 2000; Ito et al, 2001; Benecke et al, 2002) that Ada2 directly interacts with Gcn5 and Ada3, we symbolized this direct interaction in the model by a direct contact between Ada2 and these two proteins. Furthermore, we positioned Ada3 in direct contact with Sgf29 based on yeast two‐hybrid data (Ito et al, 2001). Second, we positioned the DUB module close to the SA_SPT module and located Sgf73 close to Spt20; Ubp8, Sgf11 and Sus1 were grouped together as they depend on each other. Third, Tra1 was situated close to Spt20, since the deletion of Spt20 led to the loss of Tra1. Spt3 was located closer to the ADA and DUB modules given that it led to a severe synthetic growth defect with Gcn5 (Lin et al, 2008) and a negative genetic effect with Sgf11 and Sgf73 (Collins et al, 2007; Costanzo et al, 2010). All remaining subunits of the SA_SPT module were added according to the order of their probabilities in the respective purifications. 
Fourth, for the SA_TAF module, Taf12 was placed more inside the complex, since it exhibited higher probabilities with members of the DUB module when used as a bait compared with Taf5–TAP (see Taf5–TAP and Taf12–TAP in ada2Δ). Yeast two‐hybrid screens (Uetz et al, 2000; Ito et al, 2001; Yatherajam et al, 2003; Yu et al, 2008; Layer et al, 2010) furthermore identified direct interactions between the pairs Taf5–Taf6 and Taf6–Taf9; therefore, we permitted direct contact between these proteins in the model. Finally, for the ADA complex, we added a contact between Ahc1 and Ahc2 based on our deletion results and yeast two‐hybrid screens (Uetz et al, 2000; Ito et al, 2001).
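The placement rules above can be recorded as a small graph in which each edge carries the type of evidence that supports it. The sketch below encodes only the contacts and proximities named in the text; the data structure, the evidence labels and the helper functions are illustrative assumptions and do not reproduce the full model in Figure 4.

```python
from collections import defaultdict, deque

# (subunit A, subunit B, supporting evidence as described in the text)
contacts = [
    ("Ada2", "Gcn5", "yeast two-hybrid"),
    ("Ada2", "Ada3", "yeast two-hybrid"),
    ("Ada3", "Sgf29", "yeast two-hybrid"),
    ("Taf5", "Taf6", "yeast two-hybrid"),
    ("Taf6", "Taf9", "yeast two-hybrid"),
    ("Ahc1", "Ahc2", "deletion data and yeast two-hybrid"),
    ("Sgf73", "Spt20", "proximity inferred from spt20 deletion"),
    ("Tra1", "Spt20", "proximity inferred from spt20 deletion"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b, evidence) tuples."""
    graph = defaultdict(set)
    for a, b, _evidence in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_component(graph, start):
    """Return all subunits reachable from start via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(contacts)
print(sorted(connected_component(graph, "Gcn5")))
print(sorted(graph["Spt20"]))
```

Keeping the evidence label on each edge makes it straightforward to separate direct contacts supported by two-hybrid data from proximities that are only inferred from deletion purifications.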
SGF29 is a bona fide ADA family member and a core subunit of Gcn5/HAT complexes
During our proteomic analysis of 12 different wild‐type baits, Sgf29 was found to segregate together with components of the HAT/Core module of the Gcn5 complexes (Figure 1B). Our analysis on various subunit deletions of these complexes strengthened our conclusion that Sgf29 is indeed a member of the HAT/core module that is part of all Gcn5 HAT complexes and not just SAGA (Figure 2A). In contrast to other well‐characterized components of the HAT complexes, Sgf29 is a poorly characterized protein, whose deregulated expression is implicated in malignant transformation (Kurabe et al, 2007). Therefore, we set out to test whether the deletion of SGF29 resulted in similar phenotypes as deletion of GCN5, ADA2 or ADA3. We first analyzed the transcriptional coactivation capacity of the SGF29 deletion strain in order to assay for similarities with ADA gene function (Berger et al, 1992; McMahon et al, 2005). All ADA gene products isolated to date are known to incorporate into the SAGA and SLIK complexes. We assayed for the cells' ability to survive overexpression of Gal4‐VP16, which is toxic to wild‐type cells, but not lethal for deletions in ADA components. Overexpression of VP16 has been suggested to cause misdirection of SAGA to inappropriately activate a number of cellular genes, and to sequester general transcription factors away from productive transcription complexes (Horiuchi et al, 1997). Mutations in SAGA that alter functional interaction with VP16 allow the cells to overcome the toxic growth defect and constitute an ADA phenotype. WT and sgf29Δ yeast strains, along with an ada3Δ strain as a control, were transformed with a high‐copy plasmid containing Gal4‐VP16 (McMahon et al, 2005). Figure 5A shows that the sgf29Δ strain behaved in the same manner as the ada3Δ strain in this assay, suppressing VP16 toxicity (Figure 5A). This finding indicates that Sgf29 is a functional ADA family member, consistent with our observation that it is part of SAGA and SLIK. 
The suppression of VP16 toxicity in ADA mutants is accompanied by the inability to activate an artificial LacZ reporter gene that is driven by Gal4‐VP16 (McMahon et al, 2005). In agreement with a suppression of VP16 toxicity, the sgf29Δ yeast strain was also deficient in low‐copy Gal4‐VP16‐dependent expression of the LacZ reporter gene (Figure 5B), similar to strains lacking other ADA family members (McMahon et al, 2005). Overall, our analysis revealed that SGF29 behaves like a classic ADA gene, as its deletion rescued Gal4‐VP16‐mediated toxicity, while the gene is also required for SAGA‐mediated transcriptional activity (Figure 5A and B).
Deletion of a number of SAGA subunits results in decreased fitness when yeast is grown on carbon sources other than dextrose. Therefore, we decided to assay whether deletion of SGF29 also compromised growth on various carbon sources. We indeed found that the deletion of SGF29 phenocopied a deletion of SPT7, a SAGA subunit, resulting in a severe growth defect on plates containing galactose, acetate, ethanol or glycerol as the sole carbon source (Figure 5C). These phenotypes indicate that the deletion of SGF29 results in an inability to activate the pathways required to use galactose (GAL1), acetate (CIT2) or ethanol/glycerol (ADH1) as the sole carbon source. Taken together, these observations indicate a functional similarity of Sgf29 with other members of the SAGA complex and the ADA gene family.
AHC2 is a novel component of the ADA HAT complex required for the presence of the ADA module
Previous studies have shown that the ADA complex contains Ada2, Ada3, Gcn5 and a unique subunit, Ahc1 (Eberharter et al, 1999). However, our analysis revealed that ADA contains two additional subunits: Sgf29 (as a member of the HAT/Core module) and a previously unidentified polypeptide, YCR082W, which we termed Ahc2 (Figures 1B and 2). Unlike Sgf29, purification of Ahc2 only purified the ADA complex and none of the other components of SAGA or SLIK/SALSA (Figures 1B and 6A and B). In order to confirm our findings that Ahc2 and Sgf29 associate with other ADA complex members, we performed immunoprecipitations from yeast strains carrying a TAP tag on Ahc2 or Sgf29 and probed with an antibody to Ada3, a known component of the HAT/Core module (Figure 6A). We found that, similar to Ada2–TAP, both Sgf29 and Ahc2 associated with Ada3 (Figure 6A, compare lane 2 with lanes 3 and 4). We next aimed to identify all proteins associated with Ahc2. Purification of the ADA complex using an Ahc2–TAP tag strain followed by MudPIT analysis revealed that Ahc2 only associated with components of the ADA complex (Figures 1B and 6B). Since deletion of AHC1 was previously shown not to affect the integrity of the rest of the ADA complex, we tested whether the same was true for Ahc2. We performed an Ada2–TAP purification in an AHC2 deletion strain and found that the two proteins specific to the ADA complex, Ahc1 and Ahc2, were lost, while the shared proteins of the ADA complex (i.e. Gcn5, Ada2, Sgf29 and Ada3) remained intact (Figure 2A). This implies that Ahc2 is responsible for tethering Ahc1 into the ADA complex. Since the hallmark of these complexes is their ability to acetylate substrates such as histones, we next tested the ADA complex purified through Ahc2–TAP for HAT activity. To our surprise, we found that the Ahc2‐purified ADA complex strongly preferred to acetylate nucleosomes as opposed to core histones (Figure 6C; Supplementary Figure S6A–C).
Although this is in contrast to a previous report (Eberharter et al, 1999), this discrepancy could be explained by the fact that our experiment, for the first time, purified the ADA complex through a specific ADA subunit prior to the assay, ensuring no cross‐contamination by other Gcn5 HAT complexes, such as SAGA and SLIK/SALSA.
In order to comprehend how a multi‐protein complex functions, it is crucial to first understand how the subunits of the complex are organized and assembled. To this end, we employed a combination of biochemistry approaches, quantitative proteomics and computational methods to better understand the architectural organization of the Gcn5 HAT complexes in S. cerevisiae. In a previous, more limited approach, insights into tight protein complexes were obtained with yeast deletion strains using only a single TAP‐tagged bait (Sardiu et al, 2009b). This approach only provided insights into the local architecture of the complex around the TAP‐tagged protein and not the whole complex (Sardiu et al, 2009b). For example, if a certain deletion results in the loss of many proteins from the complex, it cannot be determined whether the deletion simply prevented the bait protein from binding to an otherwise intact complex or whether the whole complex dissociated. Here, the key to the new methodology was to utilize several TAP‐tagged baits and deletions to clearly define modules and their interconnectivities. As exemplified by Spt20, a protein essential for the function of the SAGA complex, its central role in the assembly of the complex can only be captured by analyzing its deletion in different TAP strains, as distinct modules were purified depending on the component that was chosen for TAP purification (Figure 2A; Supplementary Figure S2). Moreover, our method also permits the evaluation of essential components of protein complexes, such as proteins belonging to the TAF family, for which no deletion analysis can be performed. Through the use of several TAP‐tagged TAF proteins in combination with different deletions from outside the module, we still acquired sufficient information to separate and discriminate the SA_TAF module from the remaining proteins of the complex.
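The reasoning behind using several baits can be illustrated with a minimal Python sketch. It is not the computational pipeline used in this study; the function name, the data layout and the placeholder subunit names (H1, H2, D1, D2, S1) are hypothetical and chosen only for illustration. For each bait, subunits that disappear in a deletion strain are split into those still recovered by another bait (and therefore present in a detached but assembled state) and those recovered by no bait at all.

```python
def classify_losses(wild_type, deletion):
    """Split the subunits each bait loses in a deletion strain.

    wild_type, deletion: dict mapping bait name -> set of co-purified subunits.
    Returns, per bait, the lost subunits that another bait still recovers
    ('detached') and those that no bait recovers ('undetected').
    """
    result = {}
    for bait, wt_preys in wild_type.items():
        lost = wt_preys - deletion.get(bait, set())
        # Everything recovered in the deletion strain by any *other* bait.
        seen_elsewhere = set()
        for other_bait, preys in deletion.items():
            if other_bait != bait:
                seen_elsewhere |= preys
        result[bait] = {
            "detached": lost & seen_elsewhere,
            "undetected": lost - seen_elsewhere,
        }
    return result


# Hypothetical toy data: two baits from two different modules (H and D),
# plus a bridging subunit S1. These are not measured values.
wild_type = {
    "H1": {"H1", "H2", "D1", "D2", "S1"},
    "D1": {"H1", "H2", "D1", "D2", "S1"},
}
deletion = {
    "H1": {"H1", "H2"},  # bait H1 only recovers its own module
    "D1": {"D1", "D2"},  # bait D1 shows the D module is still assembled
}
losses = classify_losses(wild_type, deletion)
```

With bait H1 alone, D1 and D2 would simply appear lost. Adding bait D1 shows that they still co‐purify with each other, which is the distinction between a dissociated complex and a detached but intact module.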
A macromolecular model for the SAGA and ADA protein complexes
The macromolecular model proposed on the basis of our analysis extends earlier studies such as a single particle EM reconstruction (Wu et al, 2004), which only localized 9 of the now 19 known subunits, and two recent studies resolving the structure of the DUB module, which contains 4 subunits, using X‐ray crystallography (Kohler et al, 2010; Samara et al, 2010). There is a need for methods that can provide alternative architectural information to bridge this gap in the knowledge of SAGA. Our study, for example, places Ada2, which was not mapped in the EM study, into the center of the HAT/Core module. Similarly, it brings the SA_SPT module in close proximity to the DUB module, and our model predicts that this link is established through Sgf73, which is in striking agreement with the above‐mentioned crystallographic study of Kohler et al (2010). Our model also incorporates the two novel ADA subunits identified in this study, Ahc2 and Sgf29, and their placement is supported both by functional experiments performed in this study and by previous large‐scale yeast studies, which reported interactions for the protein pairs Ahc1–Ahc2, Ahc2–Gcn5 and Sgf29–Ada3 (Uetz et al, 2000; Ito et al, 2001; Krogan et al, 2006). Through additional experimentation, we demonstrated that Sgf29 is an ADA family member and a core subunit of the HAT/Core module. In addition, we demonstrated that Ahc2 is a bona fide novel component of the ADA and HAT/Core modules, which can preferentially acetylate nucleosomes over core histones. Since the ADA complex does not contain Tra1 to target it to gene activators, it is intriguing to speculate that the ADA complex may function in a similar fashion to the piccolo NuA4 complex to help maintain overall H3 acetylation in the genome (Selleck et al, 2005; Berndsen et al, 2007).
Contrary to the previous EM‐based view, our model also proposes a modularity of the SAGA and ADA complexes. This modular view, which assigns the different functions of the complexes to distinct modules, is strongly supported by deletions of non‐catalytic subunits that affect only some, but not all, of the complexes' functions, such as the deletion of Spt20, which leaves the DUB function almost intact. This observation suggests that the distinct functional modules of the SAGA complex can persist separately. It is intriguing to speculate that such a modular buildup of different functional units could also be observed in other multi‐protein complexes beyond SAGA and ADA, and could be a common mechanism to utilize the same functional modules in distinct protein complexes. Despite the modularity of SAGA and ADA, the SA_SPT module, which according to our analysis is centrally located in the complex, seems to be necessary for multiple if not all functions of the complexes, as deletions of SPT20, SPT7 and ADA1 were previously shown to disrupt the complexes to an extent that compromises their multiple functions (Grant et al, 1997; Horiuchi et al, 1997; Roberts and Winston, 1997).
Identification of stable SAGA sub‐complexes
One of the most interesting findings in our analysis revolved around the purification of SAGA from strains lacking SPT20. The deletion of SPT20 is well known to compromise the integrity of the SAGA and SLIK/SALSA complexes (Sterner et al, 1999). However, the exact nature of this disruption had not been addressed until now. We were intrigued by the finding that the deletion of SPT20 leads to only a slight increase in H2B ubiquitination (Henry et al, 2003). If SAGA were disrupted, one would assume that the DUB module would also be compromised. However, the analysis of our proteomic data obtained from purifications through both Ada2 and Ubp8 in the absence of SPT20 revealed that the individual HAT/Core module and the DUB module were intact in the SPT20 deletion (Figure 2A). This finding is consistent with only a partial loss of H2B DUB seen in this deletion (Henry et al, 2003), as the DUB module can probably still carry out a subset of its activity when it is not part of SAGA. Since our deletion analysis of the components of SAGA demonstrated the stability of the modules even after perturbing the complex, it is important to take this into consideration when discussing protein complex integrity. Although SAGA as a whole may be disrupted, there could still be residual activities associated with isolated intact HAT and DUB modules that could lead to spurious acetylation and DUB, which could be detrimental to the cell.
The application of our method to the SAGA and ADA complexes highlights the ability of this approach to generate architectural insights into multi‐protein complexes. It not only provides architectural information, but also facilitates the identification of subunits that are essential for the integrity of specific modules as well as of the whole complex. Compared with other structural studies, which mapped 9 of the 19 known SAGA subunits using single particle EM reconstruction (Wu et al, 2004) or resolved the structure of the 4 subunits of the DUB module using X‐ray crystallography (Kohler et al, 2010; Samara et al, 2010), our approach is not limited to a maximum number of complex subunits. Consequently, we were able to construct a macromolecular model consisting of all 21 SAGA/ADA subunits, which bridges the gap between the previous limited EM analysis and the focused X‐ray crystallography analyses. Our analysis also emphasizes the benefit of architectural information for the functional characterization of multi‐protein complexes. Especially in the case of protein complexes composed of multiple functional modules, this information eases the prediction of phenotypic outcomes due to targeted deletions or mutations observed in clinical diseases. Given the enormous challenges in generating high‐resolution structures of multi‐protein complexes with traditional structural biology tools, our method, which can be carried out in any system where gene depletions are possible, provides an alternative approach to generating novel insight into the organization and architecture of multi‐protein complexes.
Materials and methods
S. cerevisiae strains
TAP‐tagged and MATa knockout strains were obtained from Open Biosystems. Gene deletions in the TAP tag strains were carried out by homologous recombination using a kanamycin gene cassette flanked by 200 base pairs of gene‐specific sequence. Strains containing mutants in either UBP8 or GCN5 were constructed as follows: both wild‐type plasmids were obtained from the MoBY‐ORF collection (Open Biosystems). The plasmids were subsequently subjected to site‐directed mutagenesis using the QuikChange mutagenesis kit (Stratagene). The mutated plasmids were sequence verified and then transformed into strains lacking either UBP8 or GCN5. The transformed strains were grown in a total of 3 l of media lacking uracil to maintain the plasmid, and subsequent TAP purification was carried out as described earlier.
Identification of proteins by MudPIT
MudPIT analysis of purified complexes was carried out as previously described (Lee et al, 2009). TCA‐precipitated proteins were urea‐denatured, reduced, alkylated and digested with endoproteinase Lys‐C (Roche) followed by modified trypsin (Promega) as described in Florens and Washburn (2006). Peptide mixtures were loaded onto 100 μm fused silica microcapillary columns packed with 5 μm C18 reverse phase (Aqua, Phenomenex), strong cation exchange particles (Partisphere SCX, Whatman) and reverse phase (McDonald et al, 2002). Loaded microcapillary columns were placed in‐line with a Quaternary 1100 series HPLC pump (Agilent) and an LTQ or XP linear ion trap mass spectrometer equipped with a nano‐LC electrospray ionization source (ThermoFinnigan). Fully automated 10‐step MudPIT runs were carried out on the electrosprayed peptides, as described in Florens and Washburn (2006). Tandem mass (MS/MS) spectra were interpreted using SEQUEST (Eng et al, 1994) against a database of 11 982 amino‐acid sequences, consisting of 5877 S. cerevisiae proteins (non‐redundant entries from NCBI 2007‐03‐04 release), 177 usual contaminants (such as human keratins, IgGs and proteolytic enzymes) and, to estimate false discovery rates (FDR), 5993 randomized sequences for each non‐redundant protein entry. Peptide/spectrum matches were selected and compared using DTASelect/CONTRAST (Tabb et al, 2002) with the following criteria set: spectra/peptide matches were only retained if they had a DeltCn of at least 0.08, and minimum XCorr of 1.8 for singly, 2.5 for doubly and 3.5 for triply charged spectra. In addition, peptides had to be fully tryptic and at least seven amino acids long. Combining all runs, proteins had to be detected by at least two such peptides, or one peptide with two independent spectra. Under these criteria, the FDR is <1% (Supplementary Tables S1 and S4).
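The selection criteria listed above can be summarized in a short Python sketch. This is an illustration only and not the DTASelect/CONTRAST implementation; the `PSM` record, its field names and both function names are hypothetical, and rejecting charge states other than 1 to 3 is an assumption made here because the text does not list thresholds for them.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_XCORR = {1: 1.8, 2: 2.5, 3: 3.5}  # minimum XCorr by charge state
MIN_DELTCN = 0.08
MIN_LENGTH = 7


@dataclass(frozen=True)
class PSM:
    """One peptide/spectrum match (hypothetical record layout)."""
    protein: str
    peptide: str
    charge: int
    xcorr: float
    deltcn: float
    fully_tryptic: bool


def passes_psm_filter(psm):
    """Apply the spectrum-level criteria: DeltCn, XCorr, tryptic status, length."""
    min_xcorr = MIN_XCORR.get(psm.charge)
    if min_xcorr is None:
        # Assumption: charge states without a listed threshold are rejected.
        return False
    return (
        psm.deltcn >= MIN_DELTCN
        and psm.xcorr >= min_xcorr
        and psm.fully_tryptic
        and len(psm.peptide) >= MIN_LENGTH
    )


def accepted_proteins(psms):
    """Keep proteins with >= 2 passing peptides, or 1 peptide with >= 2 spectra."""
    spectra_per_peptide = defaultdict(lambda: defaultdict(int))
    for psm in psms:
        if passes_psm_filter(psm):
            spectra_per_peptide[psm.protein][psm.peptide] += 1
    kept = set()
    for protein, peptides in spectra_per_peptide.items():
        if len(peptides) >= 2 or max(peptides.values()) >= 2:
            kept.add(protein)
    return kept
```

Each `PSM` stands for one spectrum, so a peptide matched by two records counts as one peptide with two independent spectra.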
To estimate relative protein levels, normalized spectral abundance factors (NSAFs) were calculated for each non‐redundant protein, as described in Zybailov et al (2006). Spectral counts for peptides shared between proteins are counted only once, and distributed according to the spectral count contribution of peptides unique to each isoform. NSAF are then calculated based on distributed spectral counts (dSpC) with shared spectral counts distributed among protein isoforms (Zhang et al, 2010). The protein interactions from this publication have been submitted to the IMEx (http://imex.sf.net) consortium through IntAct (pmid: 19850723) and assigned the identifier IM‐15346. The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: ERr+h3ogpfy2X6FxP4mDtSCfxk8LcZ7HTe7l87ecEnv+cgtpOIxluBlXE/OOFlm/JLXi8k3oAwTSUcb1R1GhzvpIHfYAAAAAAAACTA==. In addition, all RAW files are available from ftp://ftp.stowers‐institute.org/pub/washburn/Lee_SAGA_MSB/.
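The distributed spectral count and NSAF calculation described at the start of this section can be sketched in Python as follows. This is a simplified illustration rather than the software used in this study; the function names and input layout are hypothetical, the weighting follows our reading of the description above (shared counts are split in proportion to each protein's unique spectral counts), and skipping shared peptides whose proteins have no unique counts is an assumption of the sketch.

```python
def distributed_spectral_counts(unique_spc, shared_peptides):
    """Distribute shared spectral counts among the proteins that share them.

    unique_spc: dict protein -> spectral counts from peptides unique to it.
    shared_peptides: list of (set of proteins, spectral count) pairs.
    """
    dspc = {protein: float(count) for protein, count in unique_spc.items()}
    for proteins, count in shared_peptides:
        total_unique = sum(unique_spc.get(p, 0) for p in proteins)
        if total_unique == 0:
            # Assumption: no unique evidence to weight by, so skip the peptide.
            continue
        for p in proteins:
            if p in dspc:
                dspc[p] += count * unique_spc[p] / total_unique
    return dspc


def dnsaf(dspc, lengths):
    """Normalize length-corrected distributed counts so they sum to 1 per run."""
    saf = {protein: dspc[protein] / lengths[protein] for protein in dspc}
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    return {protein: value / total for protein, value in saf.items()}
```

For example, two proteins with 30 and 10 unique spectral counts that share a peptide with 8 spectral counts receive 6 and 2 of the shared counts, respectively, before length normalization.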
In order to assay for the classic ADA phenotype, wild‐type and yeast strains deleted for ADA3 and SGF29 were transformed with a high‐copy GAL4‐VP16 plasmid and grown on LEU plates for 3 days at 30°C (McMahon et al, 2005).
β‐Galactosidase assays were performed in WT, sgf29Δ and ada2Δ yeast strains as described in McMahon et al (2005). Yeast strains were transformed with a vector containing a GAL1 promoter element fused to LacZ and a second low‐copy expression vector containing Gal4‐VP16. If SAGA is present, Gal4‐VP16 bound to the GAL1 promoter drives LacZ expression.
TAP purifications were carried out as previously described (Lee et al, 2009), with the exception of the ADA complex used in Figure 6A, which was purified as described in Berger et al (1992). For calmodulin pull‐down experiments, the TAP‐tagged strains were grown in 50 ml of YPD, and 1 mg of whole cell extract was added to 25 μl of calmodulin beads and incubated at 4°C overnight. The next day, the beads were washed three times with 300 mM calmodulin‐binding buffer, then 2 × SDS sample buffer was added and the samples were boiled and analyzed by western blotting for Ada3, which also detects the IgG tag in the TAP tag, allowing for simultaneous visualization of both the tag and the interacting protein.
In vitro HAT assay
HeLa core histones and nucleosomes were used to perform the in vitro HAT assay as described previously (Eberharter et al, 1998).
This work was supported by the Stowers Institute for Medical Research and NIH Grant R37GM047867 to JLW.
Author contributions: KKL, MES, MPW, PAG and JLW conceived and designed the research. KKL, MES and MPW wrote the paper. KKL, SKS, JMG and MT developed and carried out the experiments. MES, SKS and LF analyzed the data. MES carried out computational proteomic analyses.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplemental Table 1
All peptides per protein that were identified in our wild‐type analysis. This table contains the protein names, spectra names, scoring information, sequence information, and summary information for all peptides identified in the wild type protein complexes analyses. [msb201140-sup-0001.xls]
Supplemental Table 2
List of Proteins Detected in S. cerevisiae Gcn5 HAT complexes Wild‐Type Prior to Contaminant Extraction. This table contains the locus identification, description, peptide counts, spectral count, shared spectral count, distributed peptides counts, unique peptide counts, sequence coverage, and distributed normalized spectral abundance factor values for all proteins detected in wild type complex purifications prior to contaminant extraction. [msb201140-sup-0002.xls]
Supplemental Table 3
All dNSAF values for all proteins detected in S. cerevisiae Gcn5 HAT complexes Wild‐Type following Contaminant Extraction. This table contains the locus identification, description, acronym, open reading frame ID (ORF ID) and dNSAF values for all proteins detected in wild type complex purifications following contaminant extraction. [msb201140-sup-0003.xls]
Supplemental Table 4
All peptides per protein that were identified in the deletion analysis. This table contains the protein names, spectra names, scoring information, sequence information, and summary information for all peptides identified in the protein complexes analyses from deletion strains. [msb201140-sup-0004.xls]
Supplemental Table 5
List of Proteins Detected in S. cerevisiae Gcn5 HAT complexes in the Deletion Dataset Prior to Contaminant Extraction. This table contains the locus identification, description, peptide counts, distributed spectral count, sequence coverage, and distributed normalized spectral abundance factor values for all proteins detected in protein complexes analyzed from deletion strains prior to contaminant extraction. [msb201140-sup-0005.xls]
Supplemental Table 6
All dNSAF values for all proteins detected in S. cerevisiae Gcn5 HAT complexes in the deletion dataset following contaminant extraction. This table contains the locus identification, description, acronym, open reading frame ID (ORF ID) and dNSAF values for all proteins detected in protein complexes analyzed from deletion strains following contaminant extraction. [msb201140-sup-0006.xls]
Supplemental Table 7
The location of the different subunits of the SAGA/ADA complexes within the distinct modules. This table contains the module defined in previous studies, the subunits assigned in previous studies, the literature support of this, the module definition in our study, and the subunits assigned in our study in order to compare previous results with our refined complex architecture. [msb201140-sup-0007.xls]
Supplemental Table 8
List of probabilities between each bait and prey in all complexes. This table contains in (A) the list of probabilities between each bait and prey in wild‐type Gcn5 HAT complexes and (B) the list of probabilities between each bait and prey in Gcn5 HAT complexes from deletion strains. [msb201140-sup-0008.xls]
Supplemental Table 9
Interaction data obtained from the SAGA catalytic mutants. This table contains the locus identification, description, peptide counts, spectral count, shared spectral count, distributed peptides counts, unique peptide counts, sequence coverage, and distributed normalized spectral abundance factor values for all proteins detected in catalytic mutant strain analyses. [msb201140-sup-0009.xls]
Supplementary methods, results, Figures 1–6, and Table 10. [msb201140-sup-0010.pdf]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
Copyright © 2011 EMBO and Macmillan Publishers Limited