The fission yeast Schizosaccharomyces pombe is a widely used model organism to study basic mechanisms of eukaryotic biology, but unlike other model organisms, its proteome remains largely uncharacterized. Using a shotgun proteomics approach based on multidimensional prefractionation and tandem mass spectrometry, we have detected ∼30% of the theoretical fission yeast proteome. Applying statistical modelling to normalize spectral counts to the number of predicted tryptic peptides, we have performed label‐free quantification of 1465 proteins. The fission yeast protein data showed considerable correlations with mRNA levels and with the abundance of orthologous proteins in budding yeast. Functional pathway analysis indicated that the mRNA–protein correlation is strong for proteins involved in signalling and metabolic processes, but increasingly discordant for components of protein complexes, which clustered in groups with similar mRNA–protein ratios. Self‐organizing map clustering of large‐scale protein and mRNA data from fission and budding yeast revealed coordinate but not always concordant expression of components of functional pathways and protein complexes. This finding reaffirms at the protein level the considerable divergence in gene expression patterns of the two model organisms that was noticed in previous transcriptomic studies.
The unicellular archiascomycete fungus Schizosaccharomyces pombe is a well‐established model organism, but only ∼1500 of its predicted ∼4900 genes and proteins have been experimentally characterized. Weighing the advantages and disadvantages of currently available methods for quantitative proteomics, we have embarked on a mass spectrometry‐based approach for relative quantitation of native, unmodified fission yeast proteins. In addition, we have compared mRNA and protein expression profiles in fission yeast and budding yeast to assess the overall protein–mRNA correlation in these related organisms.
We devised an extensive multidimensional biochemical prefractionation scheme of total cell lysate from wild‐type fission yeast cells, followed by analysis of individual fractions by liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC ESI MS/MS). Approximately 3 million mass spectra were matched to 12 413 non‐redundant peptides, resulting in the identification of 1465 proteins (∼29.5% of the predicted fission yeast proteome). The list of proteins was representative of the whole proteome across the entire range of molecular weights, isoelectric points and gene ontology (GO) attributes. More detailed analysis revealed equal identification rates for essential and non‐essential proteins (both 36%). Similarly, yeast‐specific proteins were represented at the same rate as the entire proteome (30%). Metazoan ‘core’ proteins were overrepresented (47%), whereas we undersampled proteins containing predicted transmembrane domains (14%) and S. pombe‐specific proteins (10%).
To quantitatively rank the identified proteins relative to each other we used spectral counts (Liu et al, 2004; Kislinger et al, 2006). A negative binomial regression model was developed to adjust spectral counts to the number of predicted tryptic peptides allowing one miscleavage. Based on adjusted spectral counts (ASCs), we assembled an abundance ranked list of all 1465 proteins identified, which was validated by comparing it to absolute quantitation data established for a series of cytokinesis‐related fission yeast proteins (Wu and Pollard, 2005). Plotting our ASC data versus the absolute quantitation data revealed a close correlation (rP=0.98), suggesting that ASCs provide a good approximation of relative protein abundance.
The range of ASCs spanned more than three orders of magnitude. The mean ASC was 68.0, whereas the median was 14.6, indicating that the vast majority of the 1465 proteins identified are of relatively low abundance. The median of metazoan core proteins (ASC=24.2) is significantly higher than that of all proteins detected, whereas the abundance of S. pombe‐specific proteins is considerably lower (ASC=5.5). In addition, essential proteins are considerably more abundant (median ASC=12.6) than non‐essential proteins (ASC=7.5). This finding can be rationalized by the enrichment of highly expressed core proteins in the set of essential proteins (Supplementary Data File 2). Analysis of 10 protein complexes for which we identified greater than 80% of their known or predicted subunits indicated that the protein synthesis machinery (ribosome, eIFs) and also the protein folding and degradation machinery (CCT chaperonin, proteasome) are among the most abundant molecular modules in fission yeast (Figure 2D).
We also determined the overall correlation of our protein data set with mRNA abundance as estimated by cDNA microarray analysis. The comparison of 1367 protein–mRNA pairs revealed a Pearson correlation coefficient (rP) of 0.58 (Figure 3A), indicating a substantial correlation between mRNA and protein abundance in fission yeast similar to what was found for budding yeast (Ghaemmaghami et al, 2003; Greenbaum et al, 2003). We also calculated correlation coefficients for specific functional pathways, protein families and multisubunit protein complexes. Whereas high coefficients were obtained for signalling and metabolic pathways (Figure 3B), the majority of multisubunit protein complexes showed very low or even negative correlation coefficients (Figure 3B). The poor protein–mRNA correlation for complexes would be expected, if their subunits were coordinately regulated. Coordinate regulation may lead to clustering of protein–mRNA ratios for complex components around a similar value, thus precluding a strong correlation. Indeed, we noticed more consistent protein–mRNA ratios for individual complex subunits than observed for all proteins. Thus, while the protein–mRNA correlations were low for multisubunit protein complexes, clustering of their protein–mRNA ratios around similar values indicated coordinate regulation of complex subunits (Table I). Although this regulation could principally occur at any level, the low protein–mRNA correlation suggests a substantial contribution of post‐transcriptional mechanisms (Greenbaum et al, 2003).
The reverse scenario, clustering of protein–mRNA ratios around similar values, but relatively high protein–mRNA correlation, was observed for the stress response pathway as well as for glycolysis and amino acid biosynthesis (Figure 3B and C). This pattern might reflect the fact that proteins involved in hierarchical signal transduction cascades or metabolic pathways do not necessarily cooperate in stoichiometric amounts. Most other pathways and protein families showed no clustering. Among those were entities with both low (transporters, Figure 3B) and high (kinases; Figure 3B) protein–mRNA correlation. For these remaining cases, high protein–mRNA correlations would suggest control primarily at the transcriptional level, whereas low correlations would indicate extensive posttranscriptional control (Table I) (Greenbaum et al, 2003).
The generation of quantitative fission yeast protein and mRNA data sets and the availability of corresponding data sets for budding yeast enabled the first large‐scale comparison of mRNA and protein levels of two eukaryotic organisms. Self‐organizing map clustering SOM revealed many similarities in the mRNA and protein abundance patterns in the two yeasts, but also marked differences. Many SOM clusters were significantly enriched for non‐redundant GO attributes (P⩽0.0005). This finding suggests that many pathways and complex subunits are coordinately, albeit not necessarily concordantly regulated in both fission and budding yeasts. For example, 6/13 components of the microtubule cytoskeleton organization GO category present in our data sets were coordinately and concordantly regulated in both yeasts (cluster 10; Figure 5B). In contrast, ATPases and entities involved in chromatin remodelling and intracellular transport were coordinately, but discordantly regulated with mRNA levels being low in budding yeast (cluster 14, Figure 5C–E). Our comparison reinvigorates the conclusion gained from previous functional genomics studies that similarities in the control of gene expression in the two yeasts are less pronounced than expected from genome comparisons (Mata et al, 2002; Oliva et al, 2005; Rustici et al, 2004).
Shotgun proteomics employing multidimensional prefractionation and tandem mass spectrometry, aided by mathematical modelling of spectral count information, enabled a label‐free relative quantitation of ∼30% of the theoretical fission yeast proteome.
Whereas there was an overall positive correlation between protein and mRNA abundance of 0.58, the correlation varied widely for specific subgroups of proteins with protein complexes showing low correlation but apparentlycoordinate control of their subunits.
The first large‐scale comparison of mRNA and protein abundance in two related eukaryotic model organisms, fission and budding yeast, indicated frequently coordinate, but rarely concordant regulation.
Schizosaccharomyces pombe is a unicellular archiascomycete fungus displaying many properties of more complex eukaryotes. It has been estimated that fission yeast diverged from budding yeast ∼1100 million years ago (Heckman et al, 2001), thus accounting for their considerable divergence in genome organization (Wood et al, 2002). Despite differences in the number of genes, the number of introns, and centromere size, basic cellular processes are highly conserved between the two yeasts with ∼3600 proteins being predicted or confirmed orthologs (Wood, 2006). However, it is still unclear to what extent mechanisms of gene expression in fission yeast overlap with those in budding yeast.
S. pombe is a well‐established model organism for the study of cell‐cycle regulation, cytokinesis, DNA repair and recombination, and checkpoint pathways, but only ∼1500 of its predicted ∼4900 genes and proteins have been experimentally characterized. Although mRNA profiling has begun to address functional aspects of the fission yeast genome (Mata et al, 2002; Chen et al, 2003; Rustici et al, 2004; Oliva et al, 2005; Peng et al, 2005; Marguerat et al, 2006), the notion was expressed that mRNA levels are only a partial reflection of the functional state of an organism (Greenbaum et al, 2003). It is widely accepted that a comprehensive understanding of the genomic information will require, besides other strategies, means of analyzing quantitative differences in protein expression on a proteome‐wide scale (Anderson et al, 2000; Bakhtiar and Tse, 2000; Yates, 2000).
Several quantitative methods, including ICAT (Gygi et al, 1999), iTRAQ (Ross et al, 2004), stable isotope labelling (Ong et al, 2002; Washburn et al, 2002), AQUA (Gerber et al, 2003), spectral sampling (Liu et al, 2004; Kislinger et al, 2006), protein abundance indexing (Ishihama et al, 2005), and whole‐genome ORF epitope tagging (Ghaemmaghami et al, 2003; Matsuyama et al, 2006), have been employed for proteomic analyses of model organisms, in particular budding yeast. All of these techniques have their intrinsic strengths and limitations, including the bias of mass spectrometry‐based methods toward proteins of medium to high abundance, and the potential for interference of epitope tags with endogenous protein function, expression, and localization. In addition, epitope tagging can only interrogate putative known ORFs and is only applicable to organisms that are readily amenable to genetic manipulation in a high‐throughput format. Mass spectrometry, on the contrary, can potentially identify new proteins and is broadly applicable to any proteome for which a corresponding genome sequence is available.
Weighing the advantages and disadvantages of currently available methods, we have embarked on a mass spectrometry‐based approach for relative quantitation of unmodified fission yeast proteins. In addition, we have compared mRNA and protein expression profiles in fission yeast and budding yeast to assess the overall protein–mRNA correlation in these related organisms.
Results and discussion
Analysis of the S. pombe proteome by multidimensional prefractionation and LC ESI MS/MS
We devised the extensive multidimensional biochemical prefractionation scheme outlined in Figure 1A, starting with total cell lysate from wild‐type fission yeast cells growing vegetatively in mid‐log phase in rich media. Aliquots of the lysate were fractionated by preparative isoelectric focusing (IEF) on immobilized pH gradients, or in two different liquid‐phase formats, by one‐dimensional (1D) gel electrophoresis, and by strong ion‐exchange chromatography in a spin column format (Doud et al, 2004), followed by analysis of individual fractions by 1D liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC ESI MS/MS). In parallel, total fission yeast lysate was subjected to on‐line 2D LC ESI MS/MS (=‘MudPIT’; Washburn et al, 2001), upon in‐solution digestion into tryptic peptides.
Altogether, ∼3 million mass spectra were collected and rigorously searched against the fission yeast protein database using the SEQUEST algorithm (Eng et al, 1994). Mass spectra were matched to 12 413 nonredundant peptides (Supplementary Data File 1), resulting in the identification of 1465 proteins (Supplementary Data File 2) with a predicted false‐positive peptide identification rate of 1.05%, as determined by searching against a combined forward and reverse protein database (Peng et al, 2003a). The identified proteins cover ∼29.5% of the predicted fission yeast proteome. To our knowledge, this represents the highest percent coverage of native, unmodified proteins reported to date for any eukaryotic proteome. We also confirmed 40 predicted sequence orphans as well as five hypothetical proteins, and identified three new proteins, which were listed as dubious ORFs (SPAC13G6.13, SPBC354.04) or pseudogenes (SPBC16E9.16c) in the S. pombe genome database.
Although the individual prefractionation techniques contributed to the total protein count to different extents (Figure 1B), the extensive scale of the combined approaches identified a list of proteins that was representative of the whole proteome across the entire range of molecular weights and isoelectric points (Figure 1C and D). Most major Gene Ontology (GO) attributes for S. pombe were represented, indicating that our study broadly sampled across cell functions (Supplementary Data File 1). For example, we identified 132 of 141 ribosomal proteins and all subunits of the 26S proteasome and the CCT chaperonin complex. We also identified all enzymes of the cysteine, glutamate, glycine, isoleucine, leucine, proline, threonine, valine, aspartate, adenine and aromatic amino‐acid biosynthesis pathways as well as 45 kinases (23% of all kinases predicted from the genome sequence), 20 predicted transcriptional regulators (14%), and 21 mitochondrial proteins (15%).
More detailed analysis revealed equal identification rates for essential and non‐essential proteins (both 36%; Figure 1E) based on 187 proteins present in our data set for which information on essentiality is available in fission yeast (83 essential, 104 nonessential genes/proteins). Similarly, yeast‐specific proteins were represented at the same rate as the entire proteome (30%; Figure 1E). Metazoan ‘core’ proteins (proteins common to S. pombe, Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster; see Supplementary information), were overrepresented (47%; Figure 1E), a finding that is consistent with their higher mRNA levels (Mata and Bahler, 2003). In contrast, we undersampled proteins containing predicted transmembrane domains (14%) and S. pombe‐specific proteins (10%; Figure 1E). Although not all membrane proteins may be equally amenable to extraction under our sample preparation conditions, the underrepresentation of S. pombe‐specific proteins is mostly due to their specialized functions in the sexual differentiation pathway (data not shown), which cannot be effectively sampled in the vegetatively growing cells used here.
Label‐free relative quantitation of S. pombe proteins
To quantitatively rank the identified proteins relative to each other, we used spectral counts. Spectral counts represent the number of nonredundant mass spectra identifying the same protein. Whereas spectral counts are predicted to increase linearly with protein abundance (Liu et al, 2004), this relationship is amended by protein size, with larger proteins having a statistically higher probability of being detected. The relationship is further modified by the sequence‐dependent number of peptides produced by the tryptic cleavage. Finally, an allowance for up to three enzymatic miscleavages is often granted during the SEQUEST database search, thus further distorting the theoretical linear relationship between spectral count and protein abundance.
To apply an appropriate adjustment of spectral counts to a measure of protein size, we compared goodness‐of‐fit statistics applied to negative binomial regression models to determine which of the above parameters (number of amino acids, number of tryptic peptides, miscleavages) figured most prominently. The models revealed that adjustment to the number of tryptic peptides with one miscleavage resulted in the most optimal fit statistics for the experimental LC ESI MS/MS data (Supplementary Figure 4 and Supplementary Table 1).
Based on adjusted spectral counts (ASCs), we assembled a ranked list of all 1465 proteins identified (Supplementary Data File 2). This quantitative ranking reflects the abundance of each protein relative to all others and their quantitative distances. The ranked list was validated by comparing it to absolute quantitation data established for a series of 27 cytokinesis‐related fission yeast proteins (Wu and Pollard, 2005). While these absolute measurements rely on epitope tagging, the tagged alleles were extensively validated for functionality under various conditions and in various genetic backgrounds, thus suggesting that tagging did not interfere with normal protein expression (Wu and Pollard, 2005). Of the 27 cytokinesis proteins, 10 were represented on our list. Plotting our ASC data versus the absolute quantitation data revealed a close correlation (rP=0.98; Figure 2A), suggesting that ASCs provide a good approximation of relative protein abundance.
The range of ASCs spanned more than three orders of magnitude (Figure 2B). The mean ASC was 68.0, whereas the median was 14.6, indicating that the vast majority of the 1465 proteins identified are of relatively low abundance compared to a small number of hyperabundant proteins (Figure 2B). The group of the 30 most abundant proteins (ASC between 584 and 4269) contained proteins of which all but three have orthologs in budding yeast that were also detected by whole‐genome TAP tagging (Ghaemmaghami et al, 2003). This group includes eight glycolytic enzymes, six enzymes involved in biosynthetic pathways, seven translation factors, five heat‐shock proteins, as well as two thioredoxin peroxidases (Supplementary Data File 2). The most abundant fission yeast protein is Eno101, a subunit of the phosphopyruvate hydratase complex (ASC=4269), followed by phosphoglycerate kinase (Pgk1, ASC=2301) as a distant second.
The group of the 30 least abundant proteins detected (ASC=0.93–0.95) contains a variety of enzymes involved in RNA metabolism (two helicases, Argonaute 1, two RNA‐binding proteins) and ubiquitin‐mediated proteolysis, two SH3 domain proteins, three kinases, as well as eight proteins of unknown function (Supplementary Data File 2). Notably, 10 out of these 30 proteins do not have orthologs in budding yeast. In addition, seven out of those 20 that do have orthologs did not give signals in the TAP‐tagging approach (Ghaemmaghami et al, 2003).
Our quantitative data also indicated that the median abundance of metazoan core proteins (ASC=24.2) is significantly higher than that of all proteins detected (ASC=14.6, P<0.05), whereas the abundance of S. pombe‐specific proteins is considerably lower (ASC=5.5; Figure 2C). This finding is consistent with the higher representation of core proteins in our data set (Figure 1E) and with their higher mRNA levels as reported previously (Mata and Bahler, 2003). In addition, essential proteins are considerably more abundant (median ASC=12.6) than non‐essential proteins (ASC=7.5). This finding can be rationalized by the enrichment of highly expressed core proteins in the set of essential proteins (Supplementary Data File 2).
Analysis of 10 protein complexes for which we identified greater than 80% of their known or predicted subunits, and which are involved in a large variety of cellular processes, indicated that the translation initiation factor eIF4 is the most abundant protein complex in S. pombe (median ASC=85.7; Figure 2D). eIF4 is similar in abundance to the ribosome (ASC=70.7), but three‐ to four‐fold more abundant than eIF2 (ASC=21.7) and eIF3 (ASC=32.0), two other translation initiation factor complexes (Figure 2D). Although, during the process of translation initiation, all of these eIFs are known to join a stoichiometric 43S initiation complex, it is thought that eIF2 and eIF3, but not eIF4, dissociate from the mRNA upon successful scanning for the initiator AUG codon (Gebauer and Hentze, 2004). Our finding that eIF2 and eIF3 are considerably less abundant than eIF4 and the ribosome therefore, underpins the concept that the former eIFs are only transiently involved during the initiation reaction, whereas the cap‐binding eIF4 complex and the ribosome stay on the mRNA during translation. Our data also indicate that the protein synthesis machinery (ribosome, eIFs) and the protein folding and degradation machinery (CCT chaperonin, proteasome) are among the most abundant molecular modules in fission yeast and perhaps other eukaryotes (Figure 2D).
Comparison of S. pombe proteome data with S. cerevisiae
We compared the abundance ranked list of S. pombe proteins with similar lists of S. cerevisiae proteins. This was carried out for the subset of proteins that have known or predicted orthologs in budding yeast (1285 of 1465 proteins based on ortholog mapping information in S. pombe GeneDB (www.genedb.org/genedb/pombe/index.jsp). Two data sets of S. cerevisiae proteins were used. The first was derived from published 2D LC ESI MS/MS data (Liu et al, 2004) that we subjected to our adjustment of spectral counts to the number of tryptic peptides (=Cerevisiae‐ASC data set). This set contained 473 pairs of orthologous proteins that were detected in both studies. The second list was assembled from the absolute quantitation data derived from whole‐genome ORF tagging with the TAP epitope (Cerevisiae‐TAP data set; Ghaemmaghami et al, 2003). This data set contained 1033 orthologs, 252 fewer than the theoretically possible 1285, because ∼20% of the native fission yeast proteins we detected by 2D LC ESI MS/MS could not be quantified when TAP tagged in budding yeast (Ghaemmaghami et al, 2003). For example, our data set contained all 32 subunits of the 26S proteasome (Finley et al, 1998; Supplementary Data File 4), whereas only 25 of these subunits were detected in the ORF tagging approach (Ghaemmaghami et al, 2003). Similarly, we identified 94% of all cytosolic ribosomal subunits, whereas only 76% were identified by TAP tagging in budding yeast (Supplementary Data File 4).
Both budding yeast data sets correlated with the fission yeast protein list as indicated by Pearson correlation coefficients of 0.56 and 0.45 and Spearman rank correlation coefficients rs of 0.55 and 0.42, respectively (Figure 2E; Supplementary Figure 1). Notably, our data showed an overall stronger correlation with the budding yeast 2D LC ESI MS/MS data presented by Liu et al (2004) (=Cerevisiae‐ASC). This finding reinforces previously expressed notions regarding the limitations of comparing mass spectrometry‐based proteomics data to absolute quantitation based on ORF tagging (Liu et al, 2004). However, organism‐specific differences in protein expression are also expected to distort the correlation (see Figure 5).
Nonetheless, our LC ESI MS/MS data showed a remarkable overlap with the Cerevisiae‐TAP data set in the relative frequency distribution of the detected proteins across the entire dynamic range (Supplementary Figure 2). For example, 88% of the 1033 budding yeast proteins, for which we have identified the fission yeast orthologs, are present at under 50 000 molecules/cell, 62% are under 10 000 molecules/cell, and 11% are under 1000 molecules/cell. This finding suggests that the dynamic range of multidimensional prefractionation and LC ESI MS/MS analysis is not necessarily inferior to that of the wholeORF tagging approach.
Correlation of protein and mRNA levels in fission yeast
We next determined the overall correlation of our protein data set with mRNA abundance as estimated by cDNA microarray analysis. Total RNA was prepared from the same S. pombe strain maintained under identical growth conditions as used for the proteomic analyses, followed by hybridization onto S. pombe cDNA microarrays (Oliva et al, 2005; Zhou et al, 2005). Background subtracted hybridization values averaged from three parallel experiments (see Supplementary Data File 2) were used to estimate mRNA abundance. Although it is clear that the hybridization values obtained on cDNA microarrays are influenced by factors other than mRNA abundance (probe length, GC content, etc.), these variations are relatively minor with probes longer than 500 bp as used here (Lyne et al, 2003). Similarly, Mata and Bahler (2003) have previously used absolute hybridization signals as approximate measures of mRNA levels in fission yeast.
The comparison of 1367 protein–mRNA pairs for which data were obtained (Supplementary Data File 2) revealed a Spearman rank correlation coefficient (rS) of 0.61 and a Pearson correlation coefficient (rP) of 0.58 (Figure 3A), indicating a substantial correlation between mRNA and protein abundance in fission yeast. The extent of correlation is very similar in budding yeast as determined with the whole‐genome TAP tagging data set (rS=0.57, Ghaemmaghami et al, 2003), and by an independent re‐evaluation of additional large‐scale budding yeast data sets (rP=0.66; Greenbaum et al, 2003).
The mean mRNA intensity of proteins detected in our multidimensional analysis was 2462, whereas for undetected proteins the number was 420 (Supplementary Figure 3). This comparison confirmed the expectation that mass spectrometry‐based proteomics has a bias towards detecting proteins encoded by highly expressed mRNAs. However, a significant portion of low‐abundance mRNAs may encode proteins that never accumulate in the vegetative state. Consistent with this notion is the demonstration that 1033 genes are induced more than four‐fold during nitrogen starvation and meiosis (Mata et al, 2002). The actual vegetative translatome may therefore be devoid of many of the proteins encoded by such developmentally regulated mRNAs. These proteins may also escape detection by ORF tagging and immunoblotting, thus explaining why the dynamic range of our LC ESI MS/MS analysis was comparable to the whole‐genome ORF tagging approach employed in budding yeast (Ghaemmaghami et al, 2003; Supplementary Figure 2).
Functional pathway analysis
Although the overall protein–mRNA correlation is surprisingly high, we wondered whether this correlation is maintained throughout specific functional pathways, protein families, and multisubunit protein complexes. We calculated the Pearson correlation coefficients for several subclasses of protein–mRNA pairs that were highly represented in our data set (see Supplementary Data Files 3 and 4 for individual proteins). Whereas a high coefficient was obtained for kinases (rP=0.80; Figure 3B), the correlation was weak for transporters (rP=0.21) and the unfolded protein response (UPR) pathway (rP=0.12), and moderately strong for glycolytic enzymes (rP=0.36) and transcription factors (rP=0.42). Correlations similar to those observed for all proteins (rP=0.58) were found for the categories amino‐acid biosynthesis (rP=0.63), signal transduction (rP=0.61), protein translation (rP=0.5), stress response (rP=0.58), and cell‐cycle regulation (rP=0.67; Figure 3B).
For the majority of multisubunit protein complexes, very low or even negative correlation coefficients were obtained (Figure 3B). Previous bioinformatics studies have suggested that a high protein–mRNA correlation (i.e. the higher the mRNA, the higher the protein) as observed here for kinases and cell‐cycle components reflects control of protein abundance primarily at the level of mRNA synthesis, whereas poor correlation is indicative of post‐transcriptional control (Greenbaum et al, 2003). By extension, negative correlations indicate extensive control at the post‐transcriptional level (i.e. the higher the mRNA, the lower the protein and vice versa). The subunits of presumed stoichiometric protein complexes such as the 80S ribosome, the 26S proteasome, and the CCT complex would therefore be controlled substantially at the post‐transcriptional level.
The poor protein–mRNA correlation for complexes would be expected, if their subunits were coordinately regulated. For example, if all subunits of a protein complex had exactly equal protein and mRNA levels, say 5.0 units and 1.0 unit, respectively, then all data points would coincide at the very same coordinates of a protein versus mRNA plot (x=5; y=1; protein–mRNA ratio=5). Consequently, the protein–mRNA correlation would be zero for the subunits of this protein complex.
Indeed, we noticed that the protein and mRNA data points for many protein complexes were not randomly scattered over the entire data map, but tended to cluster together. To comprehensively illustrate this, we determined the protein–mRNA ratio individually for every protein in a given pathway, family, or complex, and compared it to the entire data set. Individual ratios of functional pathway components were used to determine their location and relative distance on the ratio distribution curve of the entire data set of 1367 protein–mRNA pairs. This reference curve indicates the extent and orientation of the deviations of all observed ratios from the median ratio, which was arbitrarily set to 1.0. The partitioning of pathway components along this curve thus informs about the degree to which they cluster around certain protein–mRNA ratios and their distances from the median. The graphical representation of clustering effects was enhanced by displaying the data points for specific pathway components at equal distance laid over the reference curve, thus causing informative phase shifts of the curves.
This analysis revealed strong deviations from the reference curve for several protein complexes, suggesting more consistent protein–mRNA ratios for individual subunits than observed for all proteins. Ribosomal subunits clustered with relatively higher levels of mRNA than protein (Figure 3C; Supplementary Data File 5), whereas the shape of the ratio distribution curve for eIF3, the COP1 complex, and several other protein complexes (Supplementary Data File 5) indicated clustering around the median ratio (Figure 3C). This differential distribution was even more pronounced for the eight subunits of the CCT complex (Figure 3C). In other words, all eight subunits of the CCT complex displayed highly similar protein–mRNA ratios, and therefore appear to be coordinately regulated at the mRNA and protein levels. Thus, although the protein–mRNA correlations were low for multisubunit protein complexes, clustering of their protein–mRNA ratios around similar values indicated coordinate regulation of complex subunits (Table I). Although this regulation could principally occur at any level, the low protein–mRNA correlation suggests a substantial contribution of post‐transcriptional mechanisms (Greenbaum et al, 2003). Notably, the UPR pathway showed a similar pattern in correlation and ratio distribution (Figure 3B and C), perhaps suggesting that components of this pathway are also present in stoichiometric amounts.
The reverse scenario, clustering of protein–mRNA ratios around similar values, but relatively high protein–mRNA correlation, was observed for the stress response pathway as well as for glycolysis and amino‐acid biosynthesis (Figure 3B and C). This pattern indicated that protein and mRNA expression varied widely among the members of these groups (Table I). This might reflect the fact that proteins involved in hierarchical signal transduction cascades or linear and circular metabolic pathways do not necessarily cooperate in stoichiometric amounts. Rather signal amplification and the specific activities of metabolic enzymes may govern the varying levels of protein required for these functions.
Most other pathways and protein families showed a considerable overlap of protein–mRNA ratios with the reference curve, indicating no clustering. Among those were entities with low (transporters; Figure 3B) and high (kinases; Figure 3B) protein–mRNA correlation. For these remaining cases, high protein–mRNA correlations would suggest control primarily at the transcriptional level, whereas low correlations would indicate extensive post‐transcriptional control (Table I) (Greenbaum et al, 2003).
Protein and mRNA relationship as a correlate of post‐translational modifications
Although no specific enrichment strategies were employed, rigorous interrogation of our peptide data sets obtained by mass spectrometry provided high confidence indications for post‐translational modification (PTM) of 53 peptides, which were matched to 51 proteins. A total of 40 proteins contained at least one peptide that was phosphorylated on either serine, threonine, or tyrosine (Table II). The set of phosphoproteins was enriched for protein kinases (15% versus 1.6% in the entire proteome), a finding that is consistent with the known propensity of these enzymes to autophosphorylate and/or be part of kinase cascades. The budding yeast orthologs of eight of these proteins were previously shown to be phosphorylated by methods other than mass spectrometry. In one case, acetyl coenzyme‐A carboxylase, the serine phosphorylation site we mapped in fission yeast exactly corresponds to the same position where the budding yeast protein was found to be modified (Ficarro et al, 2002).
For another set of 11 proteins, we mapped the precise sites of modification with the diglycine moieties created upon trypsin digestion of ubiquitylated lysines (Table II). Independent evidence for ubiquitylation of the budding yeast ortholog of one of these proteins was provided previously (Peng et al, 2003b). Five proteins contained both phosphorylated and ubiquitylated peptides (Table II), a finding that is consistent with the well‐established connection between phosphorylation and ubiquitylation (Karin and Ben‐Neriah, 2000).
The median abundance of phosphorylated (ASC=3.58) and ubiquitylated (ASC=2.84) proteins was considerably lower than the abundance of all 1465 proteins in the data set (ASC=14.6; Figure 4A). Ubiquitylated proteins also showed a stark dissociation of median mRNA levels, which were relatively high (633 versus 757 in the entire data set), from protein levels, which were very low (2.84 versus 14.6; Figure 4A). This finding indicates that extensive proteolytic control of these proteins through the ubiquitin–proteasome pathway may be dominant over their relatively high mRNA expression levels. This conclusion was further strengthened by comparing individual protein–mRNA ratios of ubiquitylated proteins to median adjusted ratios for the entire data set. This analysis revealed clustering of ubiquitylated proteins with relatively higher mRNA than protein levels, whereas phosphoproteins showed a distribution largely congruent with the reference curve (Figure 4B).
Steady‐state proteome and transcriptome comparison of S. pombe and S. cerevisiae
The generation of quantitative fission yeast protein and mRNA data sets and the availability of corresponding data sets for budding yeast enabled the first large‐scale comparison of mRNA and protein levels of two eukaryotic organisms. For this, we used the Cerevisiae‐MS data (Liu et al, 2004) with adjustment of spectral counts to the number of tryptic peptides and published mRNA data derived from cDNA microarray analysis of wild‐type S. cerevisiae grown under conditions comparable to those of our fission yeast strains (Gasch et al, 2001). As the raw values of the four data sets were on different scales, they were log‐transformed and standardized (see Supplementary information). As a result, each data set contained a continuum of mRNA and protein values ranging from high to low abundance for 445 distinct entities common to all four data sets. A self‐organizing map (SOM) algorithm was used to arrange the four data sets into distinct clusters (see Supplementary information). The algorithm was instructed to assemble 16 clusters, because this number achieved good performance in reproducibility (data not shown), average cluster homogeneity (0.81), and separation (−0.048).
The SOM revealed many similarities in the mRNA and protein abundance patterns in the two yeasts, but also marked differences. The most frequent patterns represented roughly equal mRNA and protein levels in both organisms (clusters 3, 4, 7, 9, 10, and 13; Figure 5A). In addition, one pattern was indicative of concordantly low mRNA and high protein abundance in both yeasts (cluster 6), whereas cluster 15 showed the opposite pattern. Among the discordant patterns were those with higher mRNA and protein levels in S. pombe (cluster 1), as well as various patterns where either mRNA or protein levels found in one yeast deviated from what was found in the other (clusters 2, 5, 8, 11, 12, 14, and 16).
The clusters were further interrogated for overrepresented S. pombe GO terms using the FuncAssociate tool (Berriz et al, 2003). In total, seven nonredundant GO attributes were found significantly (P⩽0.0005) overrepresented in the clusters (Figure 5A). As noise is a notorious feature of large‐scale functional genomics data, the biological significance of these patterns will require further validation by more targeted experiments. However, as not a single GO attribute was enriched in SOM clusters derived from a random data set under identical conditions (data not shown), our data suggest that many pathways and complex subunits are coordinately, albeit not necessarily concordantly regulated in both fission and budding yeasts. For example, 6/13 components of the microtubule cytoskeleton organization GO category present in our data sets were coordinately and concordantly regulated in both yeasts (cluster 10; Figure 5B). In contrast, ATPases and entities involved in chromatin remodelling and intracellular transport were coordinately, but discordantly regulated with mRNA levels being low in budding yeast (cluster 14, Figure 5C–E).
Although 48 out of 121 fission yeast ribosomal subunits present in all data sets were coordinately regulated, they partitioned into two distinct clusters (3 and 16, Figure 5A). Both clusters indicated that fission yeast ribosomal protein mRNAs are typically higher than the subunits they encode (Figures 5F and 3C). Notably, higher mRNA than protein levels were previously reported also in human cells (Ishihama et al, 2005). Coordinate post‐transcriptional regulation of ribosomal proteins in both budding and fission yeasts was already observed in previous reports (Washburn et al, 2003; Bachand et al, 2006). Ribosomal proteins are known to be subject to extensive transcriptional and post‐transcriptional control as indicated by short mRNA half‐lives (Li et al, 1999) and extensive translational regulation (Meyuhas, 2000; Bachand et al, 2006). Although presumably serving to provide stoichiometric amounts of complex subunits, such control might also ensure the excess availability of individual ribosomal subunits that fulfill extraribosomal functions (Wool, 1996), a repertoire, that may vary from one organism to another.
Overall, our comparison reinvigorates the conclusion gained from previous functional genomics studies that similarities in the control of gene expression in the two yeasts are less pronounced than expected from genome comparisons (Mata et al, 2002; Rustici et al, 2004; Oliva et al, 2005). Only a remarkably small fraction of transcriptomic changes during cell‐cycle progression (Rustici et al, 2004; Oliva et al, 2005) and sexual differentiation (Mata et al, 2002) is shared among the two yeasts. True organism‐specific differences are therefore likely to underlie the moderate overall correlation in protein abundance in the two yeasts (Figure 2E and Supplementary Figure 1) as well as the different patterns of mRNA and protein expression revealed here by SOM clustering (Figure 5).
Shotgun proteomics employing multidimensional prefractionation and tandem mass spectrometry, aided by mathematical modelling of spectral count information, enabled a label‐free relative quantitation of ∼30% of the theoretical fission yeast proteome corresponding to an estimated 50% of the entire vegetative translatome. Whereas Eno101, a subunit of the phosphopyruvate hydratase complex, was revealed as the single most abundant protein, the translation initiation factor eIF4 represents the most abundant protein complex. Highly abundant proteins also included the core set of proteins conserved in metazoans. Among the least abundant proteins observed in this study were S. pombe‐specific proteins, a series of nonessential proteins, as well as proteins modified by phosphorylation and ubiquitylation. Whereas there was a positive overall correlation between protein and mRNA abundance in fission yeast similar to what was observed in other organisms, simple correlations proved insufficient to asses regulatory patterns of gene expression. Contrasting individual protein–mRNA ratios to the ratio distribution curve representing all entities suggested common schemes of control for subunits of protein complexes, unstable ubiquitylated proteins, and several functional pathways. The first large‐scale comparison of mRNA and protein abundance in two related eukaryotic model organisms indicated frequently coordinate, but rarely concordant regulation, an observation that further underscored the marked differences in gene expression in the two yeasts noted previously (Mata et al, 2002; Rustici et al, 2004; Oliva et al, 2005). The data presented should become a valuable resource for the fission yeast community as well as researchers mining comprehensive gene expression data sets for systems biology.
Materials and methods
Preparation of fission yeast cell lysate
S. pombe cells (DS 448/2=927 h‐ leu‐1‐32 ura4d‐18) were grown in 50 ml YES to mid‐log phase (OD595=0.68). Cells were washed in STOP buffer (150 mM NaCl, 10 mM EDTA, 50 mM NaF, 1 mM NaN3) and lysed in 450 μl buffer containing 7.7 M urea, 2.2 M thiourea, 0.55% CHAPS, 10 mM Tris (pH 8.5), 200 mM DTT and protease inhibitors by bead lysis in a Fastprep device (Bio 101). The cell homogenate was cleared by centrifugation and the bead lysis was repeated once with the pellet of insoluble debris. The two homogenates were pooled (950 μl) and incubated at room temperature (RT) for 30 min. A volume of 5.2 ml of 99% N,N‐dimethylacrylamide (Sigma) was added, followed by another incubation at RT for 30 min after which 10 μl 2 M DTT was added for 5 min at RT. The homogenate was cleared by centrifugation for 15 min at 14 000 g, resulting in a denatured, reduced, and alkylated sample with a concentration of ∼10 mg/ml.
Sample prefractionation by IEF on the ZOOM device (Invitrogen), the multicompartment electrolizer (MCE, Proteome Systems), on immobilized pH gradient (IPG) gel strips, by strong anion exchange (SAX) membrane adsorber spin columns (VivaScience), and by 1D‐PAGE was performed as described in detail in Supplementary information.
LC ESI MS/MS
Trypsin digestion before LC MS analysis as well as protein identification by 1D Nano‐LC tandem mass spectrometry and on‐line 2‐D LC ESI‐MS/MS analysis on a Thermo Electron LCQ Deca XP Plus ion trap instrument are described in Supplementary information. This section also contains details on the SEQUEST database searching criteria and the parameters for adjusting the false positive peptide identification rate to 1% as determined by searching a combined forward and reverse S. pombe proteome database. Search parameters for the identification of PTMs are also stated in Supplementary information.
Spectral count modelling by likelihood‐based goodness‐of‐fit criteria was performed by negative binomial log‐linear regression. The best‐fit statistics were obtained for a model considering the number of fully tryptic peptides assuming one miscleavage. This model was used for adjusting spectral counts to protein size. rp and rs between ASCs and mRNA and budding yeast protein data sets were computed. For SOM cluster analysis, data were preprocessed by log‐transforming and subsequent standardization. Each of the variables was standardized by subtracting its mean and dividing by its standard deviation. A full description of all statistical methods is presented in the Supplementary information.
We thank J Leatherwood for help with cDNA microarrays and privileged access to unpublished data, V Wood for access to fission yeast genome data, and K Doud for expert technical assistance. MWS is grateful to DH Wolf (University of Stuttgart) for support. This work was funded by NIH grant GM59780 to DAW and by the NIEHS Center grant ES‐00002.
Supplemental Methods [msb4100117-sup-0001.doc]
Supplementary Data File 1 [msb4100117-sup-0002.xls]
Supplementary Data File 2 [msb4100117-sup-0003.xls]
Supplementary Data File 3 [msb4100117-sup-0004.xls]
Supplementary Data File 4 [msb4100117-sup-0005.xls]
Supplementary Data File 5 [msb4100117-sup-0006.xls]
Supplementary Data File 6 [msb4100117-sup-0007.xls]
- Copyright © 2007 EMBO and Nature Publishing Group