We report a proteomic analysis of microdissected material from formalin‐fixed and paraffin‐embedded colorectal cancer, quantifying >7500 proteins between patient matched normal mucosa, primary carcinoma, and nodal metastases. Expression levels of 1808 proteins changed significantly between normal and cancer tissues, a much larger fraction than that reported in transcript‐based studies. Tumor cells exhibit extensive alterations in the cell‐surface and nuclear proteomes. Functionally similar changes in the proteome were observed comparing rapidly growing and differentiated CaCo‐2 cells. In contrast, there was minimal proteomic remodeling between primary cancer and metastases, suggesting that no drastic proteome changes are necessary for the tumor to propagate in a different tissue context. Additionally, we introduce a new way to determine protein copy numbers per cell without protein standards. Copy numbers estimated in enterocytes and cancer cells are in good agreement with CaCo‐2 and HeLa cells and with the literature data. Our proteomic data set furthermore allows mapping quantitative changes of functional protein classes, enabling novel insights into the biology of colon cancer.
In‐depth proteomic analysis of microdissected colorectal cancer identifies extensive alterations in the cell‐surface and nuclear proteomes between normal mucosa and adenocarcinoma, but observes strikingly little proteomic change between cancer and metastases.
First large‐scale proteomic analysis of microdissected tissue from archival formalin‐fixed and paraffin‐embedded material.
Quantitation of 7576 proteins between patient‐matched samples of normal colonic mucosa, primary cancer, and nodal metastasis.
Expression levels of 1808 proteins changed significantly between normal and cancer tissues.
Total Protein Approach (TPA)—a new way to determine protein copy numbers per cell without protein standards.
In the past decades, colorectal cancer (CRC) has been comprehensively studied with ever more refined molecular biology methods (Fearon, 2011). A large number of expression and genomic profiling studies have contributed to our current understanding of the different cellular mechanisms underlying colorectal tumor formation, maintenance, and development of metastasis (Cardoso et al, 2007). However, these achievements did not provide direct information on the composition of the major active components of the cell, which are its proteins. To fill this gap in understanding of the disease, many proteomic studies have been performed. They investigated clinical material or cultured cells, typically providing extensive lists of protein changes in CRC. Usually, these observations were restricted to highly abundant molecules (Jimenez et al, 2010). The application of proteomics has been hampered by a lack of methods allowing exploration of tissues to a depth comparable to that provided by molecular biology methods, which analyze alterations in the cellular genome or transcript levels (Nambiar et al, 2010). At best, proteomic studies were able to identify and quantify a few thousand proteins, but most studies focused only on changes in the levels of a few proteins. This is in stark contrast to microarray or deep sequencing‐based transcriptomic studies that can cover large parts of the transcriptome. In particular, proteomics has so far not provided systematic insights into changes in signaling pathways, alterations in the cell‐surface proteome, or variations in the processing of genetic information in CRC. Moreover, proteomic approaches have usually required large amounts of sample and therefore could not profit from microdissection techniques enabling isolation of defined populations of cells.
Recently, our group has reported advances in sample preparation methods that enabled studying membrane proteins in the same manner as soluble proteins (Wisniewski et al, 2009b; Wisniewski, 2011) and allowed identification and comparison of thousands of proteins from samples containing only a few micrograms of total protein (Wisniewski et al, 2011a). In addition, we have recently developed an efficient protocol that enables the analysis of proteomes and post‐translational modifications in formalin‐fixed and paraffin‐embedded (FFPE) tissues (Ostasiewicz et al, 2010). Taking advantage of these technological developments in conjunction with ongoing refinements in high accuracy mass spectrometry (Nagaraj et al, 2011), we set out to analyze proteomes of normal mucosal tissue, adenocarcinoma, and its nodal metastasis from laser‐microdissected, patient‐matched FFPE samples.
Using label‐free quantitative analysis, we wanted to obtain insights into overall changes in subcellular composition, molecular function, and pathways in CRC. We report relative quantitation of 7576 proteins between the three stages and show that a large fraction of the proteome is changed between the normal mucosa and cancer cells, whereas the altered proteome profile is preserved largely unchanged in nodal metastases.
Identification of 8000 proteins from laser microdissected colonic mucosa, primary cancer, and its nodal metastasis
To survey and compare the proteomes of colonic mucosa (normal or N) and colon cancer (cancer or C), we analyzed archival FFPE clinical samples originating from eight patients (Supplementary Table 1; Materials and methods). In addition, for seven of the patients we analyzed the proteomes of nodal metastases (M). Laser capture microdissection was used to obtain enriched populations of enterocytes, primary cancer, and metastasizing cells (Supplementary Figure 1). We made an effort to reduce any contamination by stroma and therefore did not emphasize identification of secreted proteins. From each sample, a volume of about 175 nl of cells was collected and processed using the FFPE‐FASP (filter aided sample preparation) procedure (Wisniewski et al, 2011a). Yields were 6.1±1.8 μg total peptide per sample. To maximize depth of proteome coverage, peptides were fractionated by anion exchange chromatography into six fractions before they were analyzed by LC‐MS/MS on a linear ion trap Orbitrap mass spectrometer (Wisniewski et al, 2011a; Figure 1A). Fragment spectra were obtained by Higher Energy Collisional Dissociation (HCD) with high mass accuracy (Olsen et al, 2009). The complete data acquisition took 23 days. The obtained mass spectrometric raw data were analyzed in the MaxQuant environment (Cox and Mann, 2008) with the integrated Andromeda search engine (Cox et al, 2011). Label‐free quantitation (LFQ) algorithms in MaxQuant (Luber et al, 2010) allowed quantitative comparison of the individual samples. In our analysis, 72 000 unique peptides corresponding to 8173 proteins were identified at a false discovery rate (FDR) of 1% (Supplementary Tables 2 and 3). Comparison of the N, C, and M proteomes revealed that 99% of the identified proteins were common in all three stages (Figure 1B).
Of these, 7576 proteins were identified at least four times in one of the states (N, C, or M) and only these were subjected to further analysis and statistical evaluation (Supplementary Table 4). In all, 90% of these proteins and 94% of the proteins that were found to be significantly changed between the normal and cancer cells were identified by at least two peptides (Figure 1C). The summed mass spectrometric peptide intensities—a proxy for the protein abundances of the identified proteins—span six orders of magnitude, but the levels of 98% of the proteins were within a 10 000‐fold expression range (Figure 1D). Comparison of the abundances of all proteins and those identified as significantly changed showed that with descending abundance the frequency of detection of statistically significant protein changes decreases (Figure 1E). For highly abundant proteins, expression levels of half of the proteins changed significantly, whereas for low abundant proteins this proportion was much smaller (Figure 1E). This is likely due to technical factors and indicates that a substantial portion of quantitative differences remains undetected. Assuming that over the entire range of abundance the portion of proteins with altered levels is similar, we expect that about 50% of the proteome is changed between normal and cancer cells.
Colon carcinoma is accompanied by extensive subcellular and functional changes
To describe the protein content constituting defined cellular compartments or molecular functions and to provide insight into differences between cellular states, we used the total signal intensities of the peptides identifying each protein as determined in the MaxQuant software (normalized values for LFQ intensities). These peptide intensities are a good proxy for the absolute abundance of the proteins, especially when a set of proteins is considered because inaccuracies for specific proteins are averaged out. Panels A–G in Figure 2 show representative examples of the protein content of selected subcellular compartments. Clear alterations were apparent between N and C, but not between C and M. In particular, the abundance of extracellular and integral to plasma membrane proteins in the N proteome was 20–30% higher than in the C proteome whereas the abundance of nuclear proteins was about 30% lower (Figure 2A, B). We did not find clear differences between N and C in the overall abundance of proteins belonging to cytoplasm (Figure 2C) and its major components such as the Golgi body, mitochondrion, and endoplasmic reticulum (Figure 2D–F). We next investigated selected functional categories related to the altered subcellular compartments (Figure 2H–N). Partial loss of integral plasma membrane proteins in cancer was accompanied by a 50 and 30% decrease in the abundance of channel and transporter proteins, respectively (Figure 2H and I). Several individual examples of this global decrease are shown in Tables I and II. They included a >10‐fold decrease of chloride and sodium channels and three‐ to five‐fold reduction of FXYD ion transport regulators 1 and 3 (Table I). The latter proteins are also known as phospholemman and Mat‐8 and are known modulators of Na,K‐ATP‐ases that affect their kinetic properties (Garty and Karlish, 2006).
The increase of the total content of nuclear proteins correlates well with changes in the content of histones, subunits of the RNA polymerases, and transcription factors (Figure 2J, K). We found that general transcription factors and chromatin activators such as histone deacetylases and the high mobility group proteins were 2‐ to 10‐fold upregulated in the cancer samples (Table I). A similar increase of protein abundance was found for a number of nuclear transporters (Table II).
According to the canonical model of chromatin one linker histone H1 molecule occurs per one core histone octamer. Since the molecular masses of the linker and the core histones are about 20 and 10 kDa, respectively, the total amount of H1 should be a quarter of the core histones (Figure 2E). Indeed, this is the ratio we observed in the N samples (0.24). In contrast, in the cancer cells the content of core histones was increased and the ratio of H1 to core histone decreased to only 0.14. This suggests a reduced compaction of chromatin accompanying elevated transcriptional activity.
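The expected ratio of one quarter follows from simple stoichiometry. As a minimal sketch in Python, using the approximate masses quoted above (the function names are illustrative and not part of the original analysis), the calculation and its inversion for the observed ratios can be written as:

```python
def h1_to_core_mass_ratio(h1_kda=20.0, core_kda=10.0,
                          h1_per_octamer=1, core_per_octamer=8):
    """Expected H1 : core-histone mass ratio under the canonical model
    (one linker histone per octamer of core histones)."""
    h1_mass = h1_per_octamer * h1_kda
    core_mass = core_per_octamer * core_kda
    return h1_mass / core_mass


def h1_per_octamer_from_ratio(mass_ratio, h1_kda=20.0, core_kda=10.0,
                              core_per_octamer=8):
    """Invert the relation: H1 molecules per octamer implied by a mass ratio."""
    return mass_ratio * core_per_octamer * core_kda / h1_kda


print(h1_to_core_mass_ratio())                       # 0.25
print(round(h1_per_octamer_from_ratio(0.24), 2))     # normal mucosa
print(round(h1_per_octamer_from_ratio(0.14), 2))     # cancer
```

With these rounded masses, the ratio of 0.24 measured in N corresponds to roughly one H1 per octamer, and the ratio of 0.14 in C to roughly one H1 per two octamers.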
Additionally, elevated expression of proteins involved in cell‐cycle regulation in C and M reflects higher cell division rates of cancer cells in comparison with normal cells (Figure 2N). Similarly, elevated expression of ribosomal proteins in C and M reflects an upregulation of translation (Figure 2M). Interestingly, this change is not reflected in the content of endoplasmic reticulum (Figure 2E), suggesting that this functionality is not needed to a higher degree in cancer cells.
Identification of proteins changed in colon cancer
To assess the similarities of the analyzed proteomes in a global manner, we performed unsupervised clustering, in which samples are grouped on the basis of their expression patterns (Figure 3A). All eight samples of normal mucosa clustered together. In striking contrast, cancer and nodal metastases clustered into seven patient‐matched pairs. In these pairs, the primary tumor had a higher similarity to its metastatic counterpart than to other tumors. To identify potential driver proteins of cancer, we first compared the intensities of detected proteins between colonic mucosa, primary cancer, and its nodal metastasis. A paired t‐test was used to determine the significance of protein differences. In total, 1808 proteins were significantly (P<0.05) downregulated or upregulated in tumor tissues when compared with normal mucosa (Figure 3B; Supplementary Table 5). Changes of 762 proteins were observed at a P‐value of 0.01 (Figure 3B). The same statistical analysis of our proteomics data did not identify any significant changes between the primary tumor and the nodal metastasis.
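The paired comparison can be illustrated with a short Python sketch. It is not the analysis pipeline used in this study: the intensity values are invented, the use of log2‐transformed intensities is an assumption, and significance is judged against the tabulated two‐sided critical value of Student's t for 7 degrees of freedom rather than by computing an exact P‐value.

```python
import math
import statistics


def paired_t_statistic(normal, cancer):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    if len(normal) != len(cancer):
        raise ValueError("paired samples must have equal length")
    diffs = [c - n for n, c in zip(normal, cancer)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical log2 LFQ intensities of one protein in eight matched patients.
normal = [20.1, 19.8, 20.5, 20.0, 19.9, 20.3, 20.2, 19.7]
cancer = [21.0, 20.9, 21.2, 20.8, 21.1, 21.4, 20.9, 20.6]

t, df = paired_t_statistic(normal, cancer)
T_CRIT_DF7 = 2.365   # two-sided critical value for P = 0.05 at 7 df
significant = abs(t) > T_CRIT_DF7
print(f"t = {t:.2f}, df = {df}, significant at P<0.05: {significant}")
```

Pairing removes the between‐patient variation, which matters here because samples from the same patient resemble each other more than samples from different patients.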
Proteins that are significantly upregulated in cancer samples can be considered as potential biomarkers. Among these, cell‐surface proteins are particularly attractive because they may be targeted by antibodies. Table III lists 34 upregulated plasma membrane proteins. Two low‐abundance proteins and one medium‐abundance protein were selected for further validation by immunohistochemistry (IHC). Antibodies against PALM3, MFI, and GPR56 strongly stained cancer cells whereas only weak staining was observed in normal tissue, thus validating upregulation of the proteins (Figure 4). Knowledge of the proteins that change significantly between normal tissue and adenocarcinoma can be used for subsequent targeted analysis of a larger number of samples, using either IHC or mass spectrometry‐based methods. This may identify clinically relevant correlations between protein upregulation in cancer and the disease outcome. Another attractive follow‐up to this study would be to search plasma samples for secreted forms of the proteins identified here as outliers (Hanash et al, 2008). In this way, clinically useful biomarkers could be identified and potentially later monitored by conventional ELISA.
Proteomic screen captures quantitative changes in regulatory pathways
The high number of identified proteins allowed systematic investigation of functional alterations characteristic for colon cancer cells such as signaling pathways, metabolic processes, ion transport, replication, and transcriptional regulation, among many others. Alterations in signaling pathways due to gene mutations are recognized drivers of cancer development, but so far it has been difficult to measure the corresponding proteins directly by quantitative methods. We found that in CRC the WNT, p53, ERBB, or TGFβ pathways are most frequently affected. We identified 147 proteins involved in at least 1 of these pathways; 29 of them were significantly elevated in tumor and none significantly downregulated (Figure 5A). We observed a similar distribution of changes for insulin and chemokine pathways, with a majority of upregulated and only a few downregulated members (Figure 5A). The coverage of key members of these pathways and their quantitative changes is shown in Supplementary Figure 2.
Accelerated growth is a common feature of cancer cells and is reflected in an increase of the abundance of proteins involved in processing the genetic information. A considerable number of proteins involved in these processes were quantified in our proteomic data. All significantly changed proteins matching the GO categories of DNA replication and repair occurred at higher levels in cancer (Figure 5B). Similarly, of the 234 and 300 significantly changed proteins involved in the regulation of transcription and translation, respectively, 95% were upregulated (Figure 5C and D). Likewise, all quantified ribosomal and spliceosomal proteins with significant changes appear in cancer at higher levels (Figure 5C and D).
One of the most common biochemical phenotypes of cancer cells, including those from the colon, is elevated glycolysis (the Warburg effect; Warburg, 1956; Vander Heiden et al, 2009) and, under hypoxia, reduced oxidative phosphorylation. Our list of changed proteins includes 11 proteins involved in glycolysis/gluconeogenesis (Figure 5E). It contains upregulated glycolytic enzymes such as phosphofructokinase, pyruvate kinase, or enolase. In contrast, 80% of the changes in the level of proteins involved in oxidative phosphorylation showed downregulation (Figure 5E). Examples are NADH dehydrogenase subunits 3, 4, 5, and 6 with at least five‐fold decrease and cytochrome c oxidase subunits 2, 3, 4, 5, and 6 with an average four‐fold decrease in expression (Supplementary Table 5). Many of these changes were generally expected, providing a positive control for our experiments. However, knowledge of the magnitude of protein expression changes in a near comprehensive data set of this pathway in vivo should provide interesting information for the characterization of CRC.
In summary, in this study we observed extensive downregulation in the global abundance of proteins occurring in the extracellular space and those having transporter or channel activity (see above). All plasma membrane channels were significantly downregulated in cancer (Figure 5F). Interestingly, receptor proteins occurring in the plasma membrane were upregulated.
Differentiation‐related changes in cultured colon cancer cells resemble the extensive subcellular and functional differences between cancer and normal cells
The unexpectedly large differences between cancer and normal cells prompted us to investigate another system where differentiated cancer cells (enterocyte‐like cells) can be compared with undifferentiated ones. For this purpose, we selected the human cell line CaCo‐2, which is a well‐established model system to study cellular differentiation of enterocytes. After reaching confluence, these cells differentiate spontaneously into polarized cells that exhibit the morphological and biochemical properties of enterocytes (Zweibaum et al, 1983). We compared the proteomes of cells that reached 100% confluence with cells that were harvested after 4, 11, and 18 days beyond this point. Proteomics identified 9712 proteins of which 7504 were quantified between the stages (Supplementary Table 6).
Sucrase‐isomaltase is the standard marker of differentiation in CaCo‐2 cells and its transcript levels increase 70 times during this process (Buhrke et al, 2011). Our data revealed a 63‐ to 68‐fold increase of the abundance of this protein between day 0 and days 11 and 18 (Figure 6A), indicating that the cells were well differentiated.
Analysis of the temporal changes in the CaCo‐2 cell proteome showed that already during the first 4 days of the differentiation process, 23% of all quantified proteins changed significantly (Supplementary Table 6). At days 11 and 18 after confluence, expression levels of more than half of the quantified proteins had significantly changed (Supplementary Table 6). These extensive alterations in protein abundances are reflected in changes in the cellular content of organelles and protein functional categories (Figure 6). Differentiation of CaCo‐2 cells is accompanied by an increase in extracellular and cell‐surface proteins (Figure 6B and C) and a decrease of the abundance of nuclear proteins including the core histones (Figure 6D and F), and the ribosomal proteins (Figure 6E). Notably, the content of linker histones changed little between days 0 and 18, similar to the finding reported above for colon cancer. The overall changes observed during CaCo‐2 cell differentiation thus resemble the large‐scale alterations between cancer and normal cells in vivo, including both the large number of individual proteins changed and the alteration in the cell architecture.
Estimation of absolute copy number of proteins
Recent large‐scale proteomic analyses estimated protein copy numbers per cell by extrapolating from added standards (Beck et al, 2011; Schwanhausser et al, 2011). Here, we determine copy numbers per cell from our data simply on the basis of individual LFQ intensities compared with the total MS signal of the measured proteome. We named this method the total protein approach (TPA). The ratio of a protein's LFQ intensity to the total MS signal approximates its fraction of the total protein mass; dividing this fraction by the molecular weight and multiplying by the Avogadro constant and by the protein content of a single cell converts it into a copy number. To validate this approach, we first used a mixture of proteins with known protein concentrations. We found a linear response of the calculated protein amount with the amount of protein digested and measured by LC‐MS/MS (Figure 7A). Calculations using summed peptide intensities or the iBAQ algorithm resulted in similar values. Next, we applied our calculation method to an LC‐MS/MS analysis of a HeLa cell lysate. We compared the copy numbers of 23 proteins for which such values have recently been measured using stable isotope‐labeled PrEST standards (Zeiler et al, 2012). We found that the values obtained by the TPA and SILAC‐PrEST approaches were similar (r² = 0.85; Figure 7B). This is even more remarkable considering that the SILAC‐PrEST study used standard tryptic digestion and was performed on an Orbitrap Velos instrument whereas the TPA analysis employed two‐enzyme digestion and SAX fractionation (MED‐FASP‐SAX) followed by peptide analysis on the Q Exactive mass spectrometer. Next, we employed the TPA method to calculate protein copy numbers from a HeLa data set generated using FASP, OFFGEL peptide separation, and Orbitrap Velos analysis (Wisniewski et al, 2009b) and compared the values with the copy numbers obtained in this study. Figure 7C shows that the copy number values are similar in both experiments (r² = 0.87).
This evaluation demonstrates that the TPA‐based copy number estimation is applicable to diverse large‐scale proteomic data sets generated in the past.
To calculate copy numbers of proteins in the enterocytes, we assumed the volume of these cells to be about 1400 μm³ (Buschmann and Manke, 1981; MacLeod et al, 1991; Crowe and Marsh, 1993) and the protein content of mammalian cells to be about 20% (Ellis, 2001). The same volume and protein content values were used for calculation of the copy numbers in carcinoma (C) (Materials and methods; Supplementary Table 7). For comparison, we also analyzed CaCo‐2 and HeLa cells (Supplementary Table 7). To validate the calculated values, we furthermore compared them with the values obtained by biochemical methods in the past.
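The TPA conversion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in this study: the function names, the toy two‐protein data set, and the cell density of 1 g/ml (used to turn the 20% protein content into a mass per cell) are all assumptions. In a real analysis, the total would be summed over the entire measured proteome.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def protein_mass_per_cell_g(cell_volume_um3, protein_fraction,
                            density_g_per_ml=1.0):
    """Protein mass per cell in grams.

    Assumes protein_fraction is a mass fraction and that the cell has a
    density of about 1 g/ml (an assumption made for this sketch).
    """
    volume_ml = cell_volume_um3 * 1e-12   # 1 cubic micrometre = 1e-12 ml
    return volume_ml * density_g_per_ml * protein_fraction


def tpa_copy_numbers(intensities, mol_weights_da, protein_mass_g):
    """Convert MS intensities into copies per cell (total protein approach).

    intensity / total intensity approximates the protein's share of total
    protein mass; mass / molecular weight gives moles; moles times the
    Avogadro constant gives molecules.
    """
    total = sum(intensities.values())
    copies = {}
    for name, intensity in intensities.items():
        mass_fraction = intensity / total
        moles = mass_fraction * protein_mass_g / mol_weights_da[name]
        copies[name] = moles * AVOGADRO
    return copies


# Enterocyte parameters quoted in the text: 1400 cubic micrometres, 20% protein.
mass = protein_mass_per_cell_g(1400.0, 0.20)

# Hypothetical intensities and molecular weights (Da) for two proteins.
toy_intensities = {"protein_A": 9.0e9, "protein_B": 1.0e9}
toy_weights = {"protein_A": 50000.0, "protein_B": 20000.0}
copies = tpa_copy_numbers(toy_intensities, toy_weights, mass)
for name, value in copies.items():
    print(f"{name}: {value:.2e} copies per cell")
```

Under these assumptions an enterocyte contains about 280 pg of protein, so a protein accounting for 1% of the MS signal with a mass of 50 kDa would be present at roughly 3 × 10⁷ copies per cell.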
Core histones are among the most abundant proteins in eukaryotic cells. Our copy numbers for these proteins ranged from 1.1 × 10⁸ in normal mucosa to 4.4 × 10⁸ in undifferentiated CaCo‐2 cells (Table IV). Assuming that all enterocytes are diploid, their DNA content is 2 × 2.85 billion base pairs (International Human Genome Sequencing Consortium, 2004), corresponding to 3.0 × 10⁷ nucleosomes (assuming an average of 190 bp per nucleosome; van Holde, 1989). We determined a value of 3.5–14 core histone molecules per nucleosome, while the theoretical number is 8 molecules. For the adenocarcinoma and undifferentiated CaCo‐2 cells, we found higher values than in normal mucosa and in the differentiated cells (Table IV), which likely reflects the pre‐mitotic status of these cells in which the DNA was already duplicated. The numbers of linker histone molecules were between 1.2 × 10⁷ and 4.3 × 10⁷, whereas the theoretical value, at one H1 per nucleosome, is 3.0 × 10⁷. This corresponds to 0.4–1.4 H1 molecules per nucleosome, similar to the average value of 0.95 observed across a variety of organisms (van Holde, 1989).
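The nucleosome arithmetic behind these comparisons can be checked with a brief Python sketch. It uses only the genome size, nucleosome spacing, and copy numbers quoted above; the variable and function names are illustrative.

```python
HAPLOID_GENOME_BP = 2.85e9    # base pairs, as quoted in the text
BP_PER_NUCLEOSOME = 190       # average nucleosome repeat length


def nucleosomes_per_cell(ploidy=2):
    """Number of nucleosomes for a cell of the given ploidy."""
    return ploidy * HAPLOID_GENOME_BP / BP_PER_NUCLEOSOME


n_nucleosomes = nucleosomes_per_cell()

# Copy-number ranges quoted in the text (molecules per cell).
core_low, core_high = 1.1e8, 4.4e8
h1_low, h1_high = 1.2e7, 4.3e7

core_per_nucleosome = (core_low / n_nucleosomes, core_high / n_nucleosomes)
h1_per_nucleosome = (h1_low / n_nucleosomes, h1_high / n_nucleosomes)

print(f"nucleosomes per diploid cell: {n_nucleosomes:.2e}")
print("core histones per nucleosome: "
      f"{core_per_nucleosome[0]:.1f} to {core_per_nucleosome[1]:.1f}")
print("H1 per nucleosome: "
      f"{h1_per_nucleosome[0]:.1f} to {h1_per_nucleosome[1]:.1f}")
```

The results approximately reproduce the ranges given in the text and bracket the theoretical values of 8 core histones and one H1 per nucleosome.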
Western blot‐based quantitative characterization of RNA polymerase II complex in HeLa cells by Kimura et al (1999) revealed that its subunits are present at 1.5–3.0 × 10⁵ copies per cell (Table IV). Our TPA‐based estimation in HeLa and CaCo‐2 cells resulted in similar values of 1.2–2.2 × 10⁵ copies per cell. In contrast, in the microdissected cells the abundance of polymerase II was about one order of magnitude lower. This was accompanied by a reduction of the copy numbers of TFIIE and TFIIF, suggesting a lower transcriptional activity in N and C compared with cultured cells. The copy numbers of the polymerase and TFIIB changed little between N and C. Similar changes were observed between non‐differentiated and differentiated CaCo‐2 cells. Finally, the TPA estimates of the total number of proteins per cell are in the range of 4.1–8.3 × 10⁹ (Table IV) and these are generally within two‐fold agreement with the literature values.
The absolute copy numbers calculated for cultured CaCo‐2 cells were much more reproducible than those obtained for microdissected tissue (Table IV). The reason for this difference is the high variability of the abundances in different clinical samples in comparison with cultured cells (Supplementary Figure 3). Whereas the absolute values measured in N and C vary within an order of magnitude, the variation in the cultured cells is usually much lower. A large variation in the absolute numbers in human material has been reported in the past. For example, a recent MRM‐based study of a large number of human hepatocyte proteins resulted in differences of up to one order of magnitude for some proteins between donors (Schaefer et al, 2012).
Comparison with previous transcriptomics and proteomics results
Over the past decade a large number of transcriptomic studies aimed at identification of genes that are upregulated or downregulated in CRC have been carried out. Cardoso et al (2007) summarized the results of 30 investigations using array platforms for identification of genes differentially expressed in colorectal carcinomas when compared with normal mucosa. Although in total close to 1000 changes were reported in these studies only 128 genes were identified in at least 3 of them. Strikingly, for the majority of these genes (96) we identified the corresponding proteins in our data set. At the protein level, we found 49 of these to be significantly changed between carcinoma and normal mucosa and in each case the direction of change was the same as that observed by transcriptomics analysis (Supplementary Table 8). This high overlap of the changed proteins found in a single proteomic study with a compiled data set from 30 transcriptomic studies demonstrates the potential and the robustness of proteomics for unveiling gene expression changes in cancer. Of note, the abundances of the 49 proteins whose abundance changed in both our proteomic and transcriptomic data sets are distributed over 4 orders of magnitude of transcript abundance (Figure 8A). This is similar to the abundance distribution of all changed proteins found in this study, suggesting that the sensitivity and dynamic range of proteomic analysis can be comparable to that of microarray‐based approaches.
The absence of significant changes between adenocarcinoma and nodal metastasis was an unexpected observation of this study. This could have been caused simply by inadequate sensitivity of our analysis, which could have precluded identification of proteins whose changes specifically accompany metastasis. We tested this possibility by comparing our data with the list of 41 proteins reported in a microarray study to be changed between normal mucosa and cancer and between cancer and metastasis (Nambiar et al, 2010). We found 80% of these proteins in our data set (Supplementary Table 9), suggesting that our analysis had the potential to at least identify the proteins previously reported to be different between primary cancer and nodal metastases.
During recent years many proteomic studies of CRC have been published, each providing sets of proteins observed to change between cancer and normal mucosa. Comparison of these analyses showed that only 45 proteins were common in at least 3 of these studies (Jimenez et al, 2010). Out of these, we identified 44 proteins and found 19 of them to be significantly changed (in the same direction, Supplementary Table 10). Since the previous studies were mainly based on 2D gel separation, they were usually restricted to soluble and abundant proteins. Indeed, the 44 common proteins are on average two orders of magnitude more abundant than the average of proteins identified in this study (Figure 8B). This illustrates the advantages of proteomics workflows that are based on gel‐free, high‐resolution mass spectrometric methods.
In addition to 2D gel‐based studies, there are also a few reports of CRC analysis using one‐dimensional electrophoresis coupled to LC‐MS/MS (Han et al, 2011; Van Houdt et al, 2011; de Wit et al, 2012). In these studies, the number of proteins identified and compared between different CRC stages was in a range of 2000–3000, considerably fewer than the number of transcripts probed in array‐based studies and than the 7500 proteins identified here. This suggests that those studies were also biased toward the more abundant proteins in the cancer proteome. For example, proteins reported to be upregulated in cancer and discussed as potential biomarkers in very recent studies, namely STOML2 (Han et al, 2011), BIRC6 (Van Houdt et al, 2011), OLFM4 (Conrotto et al, 2008; Besson et al, 2011), NGAL (Conrotto et al, 2008), and GLUT1 (de Wit et al, 2012), have relatively high abundances, ranging from 1.6 × 10⁻²% to 1.9 × 10⁻³% of total protein mass (Supplementary Table 4). Our data confirm that STOML2, BIRC6, and GLUT1 are 2.4‐ to 20‐fold upregulated in cancer samples (Supplementary Table 5). In contrast, the extent of upregulation of OLFM4 and NGAL was very variable, with cancer to normal ratios ranging from 0.09 to 109 (Supplementary Table 4). The variability in the abundance of OLFM4 has already been observed by IHC analysis and likely reflects its transient rise during pre‐ and early cancer stages (Besson et al, 2011). The variability in C/N ratios of NGAL may be due to the fact that this protein appears abundantly in stroma (Conrotto et al, 2008), which can contaminate the microdissected material.
Since our large‐scale analysis covers previously identified potential biomarkers very well, our set of upregulated proteins in cancer may be useful in future research aimed at the identification of novel biomarkers. In particular, our analysis contains almost all of the proteins identified in studies involving subcellular fractionation of clinical material (Conrotto et al, 2008; Albrethsen et al, 2010; de Wit et al, 2012). Such fractionation strategies typically require large amounts of starting material and are seldom of a purity that can substantially increase the depth of analysis. For example, nuclear and cell‐surface proteomes contain albumin and abundant mitochondrial proteins (Conrotto et al, 2008; Albrethsen et al, 2010).
The present proteomic study demonstrates unequivocally that proteomics is now capable of analyzing cancer proteomes in great depth. Using only minute amounts of archival, laser‐microdissected FFPE tissues, we identified >8000 proteins and quantified 7576 of them between normal mucosa, cancer, and metastasis. To our knowledge, this is by far the greatest proteome coverage achieved in studies of clinical material. Nevertheless, this data set still contains many proteins for which changes were observed between normal and tumor cells but for which these changes were not statistically significant due to high variation. For instance, >600 proteins were identified in only a few of the analyzed samples and were therefore not used for quantitative analysis. Further developments in sample preparation and mass spectrometric analysis as well as larger numbers of patients are still required for extending our knowledge of the CRC proteome.
An important implication of this study is that proteomics does not require fresh or frozen human material for studying diseases. The archival FFPE material appears to be a valuable source of proteins, which can readily be compared between various stages of disorders and between different tissues. Furthermore, the label‐free quantification approach used in this study, while not as accurate as stable isotope‐based approaches (Geiger et al, 2010), offers a straightforward way to quantitatively analyze major proteomics features of clinical samples, and was also used in several recent proteomic studies on cancer (Gamez‐Pozo et al, 2012; Meding et al, 2012; Pan et al, 2012). It requires neither specific reagents for labeling of the peptides nor standards. Clearly, these and other developments are now making proteomics readily applicable to the exploration of clinical samples.
Our study provides insights into the composition of three proteomes: normal colonic mucosa, CRC, and its proximal metastases. Most strikingly, we uncovered drastic proteomic remodeling between healthy and neoplastic epithelial cells, apparently involving half of the expressed proteins. This stands in clear contrast to the results of individual gene expression profiling studies by microarrays, which typically report only a relatively small fraction of genes (1.8–2.5%) to be differentially expressed (Notterman et al, 2001; Birkenkamp‐Demtroder et al, 2002; Cardoso et al, 2007). This difference between the microarray technique and proteomics may reflect both biological and technical factors. For instance, differential turnover of RNA and protein is a major determinant of differences between the transcriptome and the proteome (Schwanhausser et al, 2011). Such differences are not captured by transcriptomic analyses but are accounted for in proteomic analysis, and they may be especially important in tissue analysis, where many cells are slowly growing or post‐mitotic. Furthermore, proteins are generally much more stable than RNA during sample preparation, making protein quantification potentially more robust and opening the possibility to readily analyze archival FFPE material as demonstrated in this study.
In striking contrast to the large‐scale rearrangement of the proteome between normal mucosa and cancer, we found few if any significant changes between cancer and its nodal metastases. This suggests that the proteome of the cancer cells does not need major further adaptation at distal sites. Of course, limited quantitative accuracy as well as the relatively small number of tumor samples could also partially contribute to these conclusions. However, this finding is in accordance with systemic biology studies on breast cancer malignancy indicating no major differences between gene expression patterns in cells of primary cancer and its nodal metastases (Weigelt et al, 2003, 2005; Harrell et al, 2011).
CRC has been studied to a lesser extent than breast cancer and most gene expression studies have focused on differences in gene signatures of primary cancers either in localized or in generalized stages of that disease (Nannini et al, 2009), although there are some studies comparing expression profiles of primary and metastatic cells (Yanagawa et al, 2001; Koehler et al, 2004; Kwong et al, 2005; Kleivi et al, 2007; Lin et al, 2007, 2011). Most of these studies report only a few differentially expressed genes in metastases (Yanagawa et al, 2001; Ki et al, 2007; Kleivi et al, 2007; Lin et al, 2011) and only two such genes were in common in at least three papers (summarized in Lin et al, 2011). Therefore, the transcriptomic analyses for both CRC and breast cancer agree with our proteomic finding that gene expression changes between primary tumor and metastases are relatively minor. Interestingly, a recent 2DE‐based proteomic study also did not report significant changes in about 1000 compared proteins between primary colon cancer and liver metastasis (Shi et al, 2011). Despite the fact that our analysis is the most comprehensive proteomic study on CRC so far, we cannot exclude the possibility that some proteins which are specifically upregulated in nodal metastases escaped detection due to their lower abundance or due to high variation between samples originating from different donors. It is also possible that proteins enabling the primary tumor to metastasize were already upregulated in the primary cancer and were therefore not detected in our differential analysis.
Here, we also introduced a simple method for estimation of absolute protein copy numbers without using external standards or isotope labels, the TPA. The method is based on the observation that the 3000 most abundant proteins of the cell already constitute >99% of the proteome mass. Thus, using intensity values for each protein, the fraction of the MS signal (LFQ intensity) of a protein relative to the total MS signal is a good proxy for the percentage of its protein mass relative to total protein mass. This can then be converted into numbers of molecules per cell by measuring or estimating the volume and protein content of the analyzed cells. Potentially, the method can be applied to any protein mixture, from a few proteins to purified complexes or organelles. Despite the simplicity of the approach we obtained good agreement with the theoretical values for the histones and found high consistency between our estimates and biochemical data in the abundance of RNA polymerase II and its general transcription factors in HeLa cells. Several of these proteins were about one order of magnitude less abundant in the enterocytes than in the adenocarcinoma cells and the cultured colon cancer cells. Thus, even the abundance of proteins involved in basic cellular processes can be quite different between cells and these differences can be studied by the TPA method without using standards. Importantly, the TPA approach can be used on any previously generated large‐scale data set.
The presented data comparing proteomes of normal mucosa and the primary cancer can be further studied at two distinct levels: First, they allow GO category‐based analysis of proteome‐wide changes in the cell architecture and functional classes. In a second step, the proteomic data for each of the members of the category can be examined to discover the extent of the quantitative change. Depending on the abundance of the protein, statistically significant upregulation can involve a percent change or it can be several‐fold. We believe this is a particular strength of proteomics as opposed to transcriptomics. In conclusion, we here demonstrate proteomics analysis of colon cancer to a depth approaching standard microarray studies, yielding novel insights into proteome remodeling during cancer development.
Materials and methods
FFPE human tissue
Archival FFPE samples of grade 2 CRC were obtained from the Department of Pathology of Wrocław Medical University. Analysis of the samples followed an informed consent approved by the local ethics committee.
Tissue microdissection and lysis
To obtain enriched populations of enterocytes, primary cancer, and metastasizing cells, tissue was dissected with the Laser Pressure Catapulting (LPC) PALM Instrument (Zeiss, Göttingen, Germany). The enterocytes (‘normal appearing’) were dissected from normal tissue adjacent to cancer. Collected tissues were lysed in a buffer consisting of 0.1 M Tris–HCl, pH 8.0, 0.1 M DTT, 0.5% (w/v) polyethylene glycol 20 000, and 4% SDS at 99°C as described previously (Ostasiewicz et al, 2010; Wisniewski et al, 2011a).
Protein digestion and peptide fractionation
Detergent was removed from the lysates and the proteins were digested with trypsin according to the FASP protocol (Wisniewski et al, 2009b) using 30 k filtration units (Cat No. MRCF0R030; Millipore) (Wisniewski et al, 2011b). The resulting peptides were fractionated according to the previously described pipette tip protocol (FASP‐SAX) (Wisniewski et al, 2009a).
Proteomic analysis of HeLa and CaCo‐2 cells
Frozen cells were lysed in 2% SDS, 0.1 M Tris–HCl, pH 8.0, and 0.1 M DTT. Aliquots of HeLa cells containing 20 μg of total protein were processed according to the MED‐FASP protocol using consecutive two‐step digestion with LysC and trypsin (Wisniewski and Mann, 2012). Peptides released by LysC and trypsin were fractionated into four and two SAX fractions, respectively (Wisniewski and Mann, 2012).
The analysis of the microdissected cells was performed as described previously. Briefly, SAX‐fractionated peptides were separated on a reverse phase column (15 cm × 75 μm inner diameter) packed with ReproSil‐Pur C18‐AQ 3 μm resin (Dr Maisch GmbH, Ammerbuch‐Entringen, Germany) using a 230‐min acetonitrile gradient and were analyzed with an LTQ‐Orbitrap Velos mass spectrometer using a ‘high‐high’ strategy with HCD (Olsen et al, 2007).
The analysis of the cultured cells and protein standards was performed using a reverse phase column (20 cm × 75 μm inner diameter) packed with 1.8 μm C18 particles (Dr Maisch GmbH) and a 4‐h acetonitrile gradient in 0.1% formic acid at a flow rate of 250 nl/min. The LC was coupled to a Q Exactive mass spectrometer (Michalski et al, 2011; Thermo Fisher Scientific, Germany) via a nanoelectrospray source (Proxeon Biosystems, now Thermo Fisher Scientific). The Q Exactive was operated in data‐dependent mode with survey scans acquired at a resolution of 50 000 at m/z 400 (transient time 256 ms). Up to 10 of the most abundant isotope patterns with charge ≥2 from the survey scan were selected with an isolation window of 1.6 Th and fragmented by HCD with a normalized collision energy of 25. The maximum ion injection times for the survey scan and the MS/MS scans were 20 and 60 ms, respectively. The ion target value for both scan modes was set to 10⁶. The samples were analyzed in the order 1 (C, N, M), 2 (C, N, M)…8 (C, N, M), but the order of N, C, and M within each group was random. The files are available at ‘Tranche’.
CRC Data: Hash key: 4J7wto4aNCmjeYCcihnNWakm2M94h8orULE0obgNeg2f/8f1mPMCuTtbGmIoeqRRvCltK4pGmvGWSdC3VFmON9e/ X90AAAAAAABiXw==
HeLa Data: Hash key: y9r+XWuGRc9pnH5LtJ/n0qarp4j KHWQpk87dFCVLIU4ZmtD9hqnR4TeBXvIWWKxyzP1HPFdo+1n8EzCezrYxH62kGrMAAAAAAAALGw==
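The acquisition order described above (patients measured in sequence, with the three tissue types in random order within each patient) can be illustrated with a minimal sketch. The function name `run_order` and the `seed` parameter are hypothetical and serve only to make the example reproducible; they are not part of the original workflow.

```python
import random

# Tissue labels as used in the text: normal mucosa, carcinoma, metastasis.
TISSUES = ("N", "C", "M")

def run_order(n_patients=8, seed=None):
    """Return a list of (patient, tissue) pairs.

    Patients are kept in ascending order; the order of the three
    tissue types is shuffled independently within each patient.
    """
    rng = random.Random(seed)
    order = []
    for patient in range(1, n_patients + 1):
        tissues = list(TISSUES)
        rng.shuffle(tissues)  # random order of N, C, M within the group
        order.extend((patient, tissue) for tissue in tissues)
    return order

if __name__ == "__main__":
    for patient, tissue in run_order(seed=1):
        print(patient, tissue)
```

Randomizing within patient blocks keeps the matched samples close together in time while avoiding a systematic position effect for any one tissue type.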
The MS data were analyzed using the software environment MaxQuant (Cox and Mann, 2008) version 126.96.36.199. Proteins were identified by searching MS and MS/MS data of peptides against a decoy version of the International Protein Index (IPI) human database (v.3.68). Carbamidomethylation of cysteines was set as fixed modification. The maximum false peptide discovery rate was specified as 0.01. Label‐free quantification was carried out in MaxQuant as previously described (Luber et al, 2010). Protein abundance was calculated on the basis of the normalized spectral protein intensity (LFQ intensity). Quantifiable proteins in the analysis of clinical samples were defined as those identified at least four times (50%) in at least one type of sample (N, C, M). In CaCo‐2 experiment, the quantifiable proteins occurred at least three times in at least one time point.
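The quantifiability criterion stated above can be expressed as a short filter. This is a sketch under assumptions: the function name `is_quantifiable` and the input layout (one list of LFQ intensities per sample type, with 0 or `None` marking a missing identification) are illustrative and not taken from MaxQuant.

```python
def is_quantifiable(intensities_by_group, min_valid=4):
    """Return True if the protein was identified at least `min_valid`
    times in at least one sample type.

    intensities_by_group: dict mapping a group label (e.g. "N", "C", "M")
    to a list of LFQ intensities; 0 or None means "not identified".
    """
    for values in intensities_by_group.values():
        n_valid = sum(1 for v in values if v is not None and v > 0)
        if n_valid >= min_valid:
            return True
    return False

if __name__ == "__main__":
    # Hypothetical protein seen in 4 of 8 carcinoma samples only.
    protein = {
        "N": [0, 0, 0, 0, 0, 0, 0, 0],
        "C": [5e6, 0, 7e6, 0, 6e6, 0, 4e6, 0],
        "M": [0, 0, 3e6, 0, 0, 0, 0, 0],
    }
    print(is_quantifiable(protein))  # clinical samples: min_valid=4
```

For the CaCo‐2 experiment the same function would be called with `min_valid=3` and time points as group labels.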
Statistical analysis was handled in R (http://www.R-project.org). Zero intensities were filled with intensities from the lower part of the normal distribution (imputation width=0.3, shift=1.8). Unsupervised hierarchical clustering analysis was performed to identify groups that show similar characteristics. A paired t‐test was applied for testing of differences in protein intensities in clinical samples. Analysis of CaCo‐2 data was performed using a t‐test. Significance of outliers was assessed with correction for multiple hypothesis testing (Benjamini and Hochberg, 1995) at a threshold value of 0.05.
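Two of the steps above, imputation of missing values and multiple testing correction, can be sketched as follows. The original analysis was done in R; this Python version is only an illustration. It assumes that imputation operates on log2 intensities of a single sample and draws from a normal distribution with mean shifted down by `shift` standard deviations and standard deviation scaled by `width`, which is the common convention for these two parameters but is not spelled out in the text. The function names are hypothetical, and the paired t‐test itself is not shown.

```python
import math
import random
import statistics

def impute_low(intensities, width=0.3, shift=1.8, seed=0):
    """Log2-transform one sample's intensities and replace zeros with
    draws from N(mean - shift * sd, (width * sd)^2), where mean and sd
    are taken over the valid (non-zero) log2 values."""
    rng = random.Random(seed)
    valid = [math.log2(v) for v in intensities if v > 0]
    mu = statistics.mean(valid)
    sd = statistics.stdev(valid)
    return [
        math.log2(v) if v > 0 else rng.gauss(mu - shift * sd, width * sd)
        for v in intensities
    ]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    print(impute_low([1024.0, 0.0, 4096.0, 2048.0, 0.0]))
    adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
    print([p <= 0.05 for p in adj])  # significance at the 0.05 threshold
```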
Calculation of intensity‐based total protein values and copy numbers per cell
The total protein content was defined as the sum of peptide intensities integrated over the elution profile of each peptide. The amount of individual proteins was calculated as the ratio of their LFQ intensity (LFQ MS intensity) to the sum of all LFQ intensities (total protein) in the measured sample. TPA calculations were performed assuming an average HeLa cell volume of 2800 μm³ (calculated using an average diameter of 17.5 μm) and an enterocyte volume of 1400 μm³ (average of values from references, Buschmann and Manke, 1981; MacLeod et al, 1991; Crowe and Marsh, 1993). The latter value was also used for calculation of protein copy numbers in adenocarcinoma cells and CaCo‐2 cells. All calculations used an average cellular protein content of 20% of the cell (Ellis, 2001).
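The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the protein content of 20% of the cell is interpreted here as 200 g of protein per liter of cell volume, the cell is treated as a sphere when deriving volume from diameter, and the function and parameter names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def sphere_volume_um3(diameter_um):
    """Volume of a sphere in cubic micrometers."""
    return math.pi * diameter_um ** 3 / 6.0

def copies_per_cell(lfq, total_lfq, mw_da, cell_volume_um3,
                    protein_g_per_l=200.0):
    """Estimate protein copies per cell from the LFQ intensity fraction.

    lfq / total_lfq is taken as the protein's share of total protein
    mass; protein_g_per_l = 200 encodes the assumed 20% protein content.
    """
    mass_fraction = lfq / total_lfq
    volume_l = cell_volume_um3 * 1e-15             # 1 um^3 = 1e-15 L
    total_protein_g = protein_g_per_l * volume_l   # protein mass per cell
    moles = mass_fraction * total_protein_g / mw_da
    return moles * AVOGADRO

if __name__ == "__main__":
    # A 17.5 um sphere gives roughly the 2800 um^3 used for HeLa cells.
    print(round(sphere_volume_um3(17.5)))
    # Hypothetical protein: 1% of total LFQ signal, 50 kDa, HeLa volume.
    print(f"{copies_per_cell(1.0, 100.0, 50_000.0, 2800.0):.3g}")
```

Because only intensity fractions, molecular weights, and a per-cell protein mass are needed, the same function can be applied to any existing label‐free data set once a cell volume is chosen.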
Immunohistochemistry
Five‐micrometer paraffin sections were cut, mounted on slides, and rehydrated. For antigen retrieval, sections were microwaved in citrate buffer (pH 6.0) for 10 min at 750 W. Non‐specific binding was blocked with Protein Block Solution (Abcam) for 10 min. Then, the sections were washed with PBST solution and incubated for 1 h at room temperature with rabbit primary antibodies: anti‐PALM3 (Abgent, 1:10 dilution), anti‐MFI2 (Sigma‐Aldrich, 1:50 dilution), and anti‐GPR56 (Abcam, 1:100 dilution). After washing in PBST and subsequent 15‐min incubation with secondary HRP‐labeled antibodies (Abcam), sections were blocked with 0.3% hydrogen peroxide for 10 min and stained for 5 min with DAB. Finally, sections were counterstained with hematoxylin. Intensity of staining was assessed by two independent observers using the IHC‐score scale described elsewhere (Kok et al, 2010) with values ranging from 0 points (no staining) to 12 points (>75% of cells strongly stained).
We thank Piotr Ziołkowski (Wroclaw Medical University) for providing the samples and Korbinian Mayr for assistance in mass spectrometric analysis and Katharina Zettl for excellent technical assistance. This work was supported by the Max‐Planck Society for the Advancement of Science, by the European Commission's 7th Framework Program (grant agreement HEALTH‐F4‐2008‐201648/PROSPECTS), the Munich Center for Integrated Protein Science (CIPSM), and the Polish National Center of Science (DEC‐2011/01/N/NZ5/04253).
Author contributions: JRW designed the experiments, performed the MS analysis, analyzed the data, supervised the work, and wrote the manuscript; PO designed the experiments, selected clinical samples, and performed microdissection; KD selected clinical samples and performed the IHC experiments; DZ and FG analyzed the data; MM supervised the work and wrote the manuscript.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Materials File #1 [msb201244-sup-0001.pdf]
Supplementary Table 1 Clinical Patient Information [msb201244-sup-0002.xls]
Supplementary Table 2 Identified peptides [msb201244-sup-0003.xls]
Supplementary Table 3 All identified proteins [msb201244-sup-0004.xls]
Supplementary Table 4 Quantified proteins [msb201244-sup-0005.xls]
Supplementary Table 5 Outliers [msb201244-sup-0006.xls]
Supplementary Table 6 Quantified proteins in CaCo2 cells [msb201244-sup-0007.xls]
Supplementary Table 7 Absolute protein copy numbers [msb201244-sup-0008.xls]
Supplementary Table 8 Transcriptomics vs this study [msb201244-sup-0009.xls]
Supplementary Table 9 Transcriptomics_Normal_vs_Carcinoma_vs_Metastasis [msb201244-sup-0010.xls]
Supplementary Table 10 Proteomics vs. this study [msb201244-sup-0011.xls]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2012 EMBO and Macmillan Publishers Limited