The emergence of proteomics has led to major technological advances in mass spectrometry (MS). These advancements not only benefitted MS‐based high‐throughput proteomics but also increased the impact of mass spectrometry on the field of structural and molecular biology. Here, we review how state‐of‐the‐art MS methods, including native MS, top‐down protein sequencing, cross‐linking‐MS, and hydrogen–deuterium exchange‐MS, nowadays enable the characterization of biomolecular structures, functions, and interactions. In particular, we focus on the role of mass spectrometry in integrated structural and molecular biology investigations of biological macromolecular complexes and cellular machineries, highlighting work on CRISPR–Cas systems and eukaryotic transcription complexes.
Prelude—The coming of age of biomolecular mass spectrometry
The advent of mass spectrometry (MS) as an analytical technology dates back more than a century and was made possible by the groundbreaking work on cathode rays of the physicist J. J. Thomson (Thomson, 1897). Thomson not only discovered the electron but, together with F. W. Aston, also measured the masses of stable isotopes of elements (Thomson, 1911; Aston, 1920). To this end, Thomson employed electron impact ionization to charge the atoms and molecules. This allowed their mass‐to‐charge (m/z) ratio to be measured by monitoring their trajectories in an electric or magnetic field—mass spectrometry was born and its inventor's ingenuity is nowadays acknowledged by terming the unit for m/z the “Thomson” (Th) (Cooks & Rockwood, 1991).
MS remained in the realm of isotope physics for about half a century before it entered the domain of chemistry. With its potential to identify substances based on accurate mass measurements, MS initially sparked the interest of the oil industry, which sought ways to characterize the low molecular weight compounds present in crude oil. This challenge was addressed by pioneers like F. H. Field, J. L. Franklin, and F. W. McLafferty. They introduced and applied chemical ionization (Munson et al, 1964; Munson & Field, 1966; Baldwin & McLafferty, 1973) as a novel technique to bring organic molecules into the gas phase, which had previously only been feasible by electron impact ionization (Smith, 1937). Owing to both ionization techniques, the field of organic MS started to bloom, for it was now possible to determine the masses of semi‐volatile molecules up to about 500 Dalton (Da). Soon after, mass determination of not only the intact organic molecule but also its specific fragment ions was enabled by conceiving improved mass spectrometric instrumentation (Beynon et al, 1973; Paul, 1990) and new gas‐phase fragmentation techniques, such as collision‐induced dissociation (Haddon & McLafferty, 1968; Jennings, 1968; Kim et al, 1974). The related fragmentation mechanisms were elucidated by studying the unimolecular and bimolecular chemistry of organic molecular ions. These meticulous investigations aided the interpretation of the fragment ion mass spectra, eventually allowing details about a compound's molecular structure to be deduced (Levsen & Schwarz, 1976). Over the years, about 250,000 unique organic molecules were characterized and catalogued in a library of reference mass spectra (Stein, 2016). This repository may be queried when seemingly unknown compounds are to be characterized by organic MS, which nowadays has become a core analytical technology in chemical, forensic, environmental, and atmospheric sciences.
More recently, MS entered the field of cellular, molecular, and structural biology. The dawn of biomolecular MS can be traced to the introduction of matrix‐assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI) in the 1980s, for which Koichi Tanaka and John Fenn, respectively, received the 2002 Nobel Prize in Chemistry (Tanaka et al, 1988; Fenn et al, 1989). These ionization technologies, which will be further explained later on, enabled the efficient transfer of large intact biomolecules (e.g. peptides and proteins) into the gas phase as charged ions—an essential prerequisite for MS analysis. Previously developed ionization methods, by contrast, were mostly limited to small, volatile compounds.
Only a few years after the MS analysis of peptides and proteins became feasible, first attempts were made to couple it with advanced separation techniques, such as miniaturized nanoflow liquid chromatography and electrophoresis (Henzel et al, 1993; Strupat et al, 1994; McCormack et al, 1997; Washburn et al, 2001). This development enhanced the throughput and dynamic range in the analysis of very complex peptide samples like those obtained from a proteolytic digest of a single protein, a protein mixture, or even all proteins within a whole‐cell lysate. Nowadays, hundreds of thousands of peptides can be analyzed by combining orthogonal separation methods (e.g., reversed‐phase, ion exchange and/or hydrophilic interaction chromatography) with modern mass spectrometers that perform high‐speed tandem‐MS experiments (see below for more details; Hebert et al, 2014). Interpreting these large amounts of data, especially deriving the peptide sequence from the acquired mass spectra and mapping it back onto the original proteins, initially posed a major challenge but is now facilitated by sophisticated bioinformatics and statistics solutions (Cappadona et al, 2012; Bruce et al, 2013). Collectively, advanced peptide separation, rapid tandem‐MS measurements, and elaborate bioinformatics analysis form the current workflow for high‐throughput bottom‐up proteomics (Zhang et al, 2013). This technique has become the standard for the system‐wide analysis of all proteins present in a particular cell, body fluid, tissue, or organism (as reviewed by Cox & Mann, 2011; Bensimon et al, 2012; Altelaar et al, 2013; Richards et al, 2015). Recently, the progress of bottom‐up proteomics has culminated in first drafts of the human proteome (Kim et al, 2014; Wilhelm et al, 2014).
Bottom‐up proteomics as a key driver for technological development has benefitted the field of biomolecular MS in general. First and foremost, mass analyzers have become immensely more advanced, gaining performance in speed, sensitivity, selectivity, robustness, and, equally important, user‐friendliness for non‐specialists (Ens & Standing, 2005; Zubarev & Makarov, 2013). Next, advances in sample preparation and fractionation enabled more comprehensive and reproducible MS experiments (Tabb, 2013). Novel strategies for affinity purification (Li et al, 2015), protein digestion (Tsiatsiani & Heck, 2015), and peptide fragmentation (Madsen et al, 2010; Frese et al, 2012) may serve as illustrative examples here. Finally, more efficient and reliable analysis software (Cox & Mann, 2008; Bruce et al, 2013; Lee et al, 2015), public data repositories (Perez‐Riverol et al, 2015), and tools to define and study protein interaction and signaling networks (Gavin et al, 2002; Hein et al, 2015; Huttlin et al, 2015) allow translating MS data into sound and meaningful biological information. These advancements significantly promoted the impact of MS on molecular and structural biology, a development that can be retraced using hydrogen–deuterium exchange (HDX)‐MS of biomolecules as an example. HDX‐MS has been around for about 25 years (Katta & Chait, 1991; Englander, 2006), arguably predating the advent of proteomics. However, due to limitations in hardware and analysis software, early HDX‐MS studies were mostly limited to peptides, small proteins, or protein domains. Owing to the technological progress, HDX‐MS can now be applied to study whole protein assemblies up to intact viruses (Wang et al, 2001; Lanman et al, 2004; Konermann et al, 2011; Bereszczak et al, 2013; Pirrone et al, 2015), making it a valuable tool for protein structure analysis, as we will discuss later on.
Today, MS methods are utilized in an increasing number of molecular and structural biology studies that aim for a better understanding of large biomolecular assemblies. A selection of these studies is presented in Fig 1, illustrating that biomolecular complexes from all cellular compartments and different kingdoms of life have been successfully probed by MS. In the course of this review, we will refer to several of the shown examples, as they mark the current frontiers of MS‐based structural and molecular biology. Fundamentally, these approaches can be divided into peptide‐centric and protein‐centric strategies (Fig 2). Peptide‐centric strategies comprise surface labeling approaches, such as HDX‐MS and covalent labeling‐MS, as well as cross‐linking‐MS and limited proteolysis‐MS. All of these methods probe biomolecular structures in solution, utilizing the previously described bottom‐up proteomics workflow to detect, at the peptide level, the results of the in‐solution experiment (Fig 2, bottom). Protein‐centric strategies, on the other hand, enable the mass spectrometric characterization of intact proteins and biomolecular assemblies, which can be additionally manipulated in the gas phase (Fig 2, top). Analytical targets of protein‐centric MS range from small intact histones (10–20 kDa) over antibodies (150 kDa) to complex cellular machineries (500–20,000 kDa) such as proteasomes, ribosomes, or even full virus assemblies (Marcoux & Robinson, 2013; Snijder & Heck, 2014).
The applications of biomolecular MS (summarized in Fig 2) illustrate that a mass spectrometer may nowadays be considered a full‐fledged “biochemical laboratory”, in analogy to a claim made 35 years ago that MS has become a “complete chemical laboratory” (Porter et al, 1981). In this review, we aim to familiarize non‐mass spectrometrists with the inner workings and analytical opportunities of this MS‐based “biochemical laboratory”. To this end, we will first introduce a few basic principles of biomolecular MS. Next, we will discuss the fundamental utilization of peptide‐centric MS approaches as well as protein‐centric MS under either denaturing or native‐like conditions, highlighting selected applications to examine protein structures, functions, and interactions. Finally, we will exemplify how peptide‐ and protein‐centric MS strategies have been employed to answer central biological questions, often in hyphenation with more conventional molecular and structural biology methods, notably X‐ray crystallography, NMR spectroscopy, and (cryo‐)electron microscopy (EM). In particular, recent contributions of MS‐based biochemical and structural analysis of the eukaryotic transcription machinery and prokaryotic CRISPR–Cas complexes will be highlighted.
How to weigh a protein—basic principles of biomolecular MS
MS can determine compound masses with an accuracy and precision that is unmatched by any other analytical technique. Over the last 30 years, numerous strategies and applications of biomolecular MS have emerged and, concomitantly, many different types of mass spectrometers have been designed. Therefore, a generic instrumental platform for biomolecular MS does not exist. Most modern mass spectrometers, however, comprise a number of basic components (schematically depicted in Fig 3A) that are essential for the majority of biomolecular MS workflows. Initially, the analytes are separated from their carrier medium, usually an aqueous or organic phase, by transforming them into gaseous ions in the ion source. After ionization, the analytes, now carrying a charge, are transmitted into the high vacuum regions of the mass spectrometer, typically including a low‐resolution mass analyzer and a collision cell (Fig 3A). On their way to the detector, the analytes pass through several electric and/or magnetic fields, which allow them to be mass‐selected, activated, and separated from the remaining neutrals. At the final stage, the analyte mass‐to‐charge ratios are accurately and precisely measured by monitoring their motion through a high‐resolution mass analyzer. In short, the most essential steps of an MS experiment are analyte ionization, mass determination, and selective manipulation. In this section, we provide a concise overview of these steps, focusing on principles and techniques that are commonly used in molecular and structural biology‐related MS studies.
The ionization process must preserve the integrity of the analyte, which was long thought to be impossible for larger biomolecules because of their relatively labile peptide or sugar‐phosphate backbone. The ionization methods MALDI and ESI both overcame this obstacle but follow fundamentally different mechanisms (Fig 3B). In MALDI, biomolecules are embedded in a crystalline matrix of dye‐like molecules on a metal plate, from which a laser induces the ionization and desorption of the analyte. In ESI, the analyte is dissolved in a volatile aqueous phase and inserted into a small conducting capillary. The analyte ionizes through the ultimate desolvation of minuscule charged droplets, which are created by applying a high voltage to the capillary. Importantly, biomolecules become multiply charged during the ESI process, whereas mainly singly charged ions are generated during the MALDI process. MALDI has made substantial contributions to the emergence of biomolecular MS and still has its unique application areas, for example, MALDI spatial imaging of biomolecules in tissue and cells (Amstalden van Hove et al, 2010; Schwamborn & Caprioli, 2010). However, ESI has become the method of choice for most biomolecular MS applications. This preference likely stems from the use of an aqueous phase for ESI sample preparation, which can be more easily coupled to chromatographic separation techniques. Moreover, advanced ESI technologies require minimal sample consumption (at nl/min rates), surpassing MALDI in terms of sensitivity and efficiency. ESI‐based MS approaches are thus the focal point of this review; however, most of the herein discussed principles are also applicable to MALDI‐MS.
One major criterion to distinguish mass analyzers is their mass resolving power. There are various types of quadrupole and ion‐trap low‐resolution mass analyzers that are mostly used for either targeted MS strategies (which will not be discussed herein, see Picotti & Aebersold, 2012), or for specific ion manipulation in tandem‐MS experiments (ion activation and/or mass selection, see next subsection). However, accurate mass analysis of larger proteins and peptides is predominantly achieved using three different types of high‐resolution mass analyzers: time‐of‐flight (TOF) tubes, Fourier transform ion cyclotron resonance (FTICR) traps, and Orbitraps (Marshall & Hendrickson, 2008). Although the term “high‐resolution” mass analyzer is defined somewhat arbitrarily, the TOF, Orbitrap, and FTICR analyzers share one crucial feature—peptides of a typical protein digest, which usually comprise 5–50 amino acids and carry 2–7 charges, can be isotopically resolved, that is, mass separated into the different peptide isotopes. The existence of peptide isotopes is attributable to the natural occurrence of stable elemental isotopes, primarily 13C and 15N. Two adjacent peptide isotopes differ by 1 Da or, translated to the m/z scale, by 1 Th for a singly charged peptide, by 0.5 Th for a doubly charged peptide, etc. The regular isotope spacing, thus, readily allows determination of the peptide charge state, required for the calculation of its molecular weight.
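The charge-state reasoning above can be illustrated with a minimal Python sketch. It is not part of the reviewed work: the function names are hypothetical, the peaks are assumed to be protonated positive-mode ions, and the isotope spacing is taken as about 1.00335 Da (the 13C/12C mass difference) rather than the rounded 1 Da used in the text.

```python
PROTON_MASS = 1.007276      # Da; assumes protons are the charge carriers
ISOTOPE_SPACING = 1.00335   # Da; approximate 13C - 12C mass difference


def charge_from_spacing(mz_peaks):
    """Estimate the charge state from the mean m/z spacing of isotope peaks."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("at least two isotope peaks are required")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    # A spacing of ~1 Th means z = 1, ~0.5 Th means z = 2, and so on.
    return round(ISOTOPE_SPACING / mean_spacing)


def neutral_mass(mz, charge):
    """Convert the m/z of a protonated ion into the neutral molecular mass."""
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    envelope = [500.0, 500.5, 501.0]   # hypothetical isotope peaks in Th
    z = charge_from_spacing(envelope)
    print(z, neutral_mass(envelope[0], z))
```

For the hypothetical envelope above, the 0.5 Th spacing indicates a doubly charged peptide, and the first peak then yields the monoisotopic neutral mass.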
The three high‐resolution mass analyzers employ quite different principles of mass measurement. In TOF MS (Ens & Standing, 2005; Marshall & Hendrickson, 2008), ions travel a fixed distance within the TOF tube. The “flight time” of the ions can be used as a direct measure to determine their m/z. In FTICR MS (Marshall & Hendrickson, 2008), ions are trapped in a magnetic field and excited by an electric field oscillating at radio frequency. After removal of the excitation field, the ions rotate at an m/z‐dependent frequency. The rotational motion is detected by an electrode pair (forming the boundaries of the ICR cell) as an image current. The time‐dependent image current can be transformed from the time domain to the m/z domain using Fourier transformation. Finally, Orbitrap MS applies principles that are related to FTICR, but Orbitraps do not require magnetic fields, which greatly simplifies instrument handling and maintenance (Zubarev & Makarov, 2013). Here, ions are trapped in an electric field generated between an outer barrel‐like electrode and an inner spindle‐like electrode. The ions oscillate along the inner electrode, again generating an image current. The oscillations are recorded and their m/z‐dependent frequency is used to retrieve the m/z through Fourier transformation.
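To make the relation between ion motion and m/z more concrete, the following Python sketch uses the textbook equations for an idealized linear TOF tube (t = L·sqrt(m / (2zeU))) and for the unperturbed cyclotron frequency in a magnetic field (f = zeB / (2πm)). This is a simplified illustration under stated assumptions: ions start at rest, there is no reflectron, and the function names, voltages, and tube lengths are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27            # kg


def tof_flight_time(mz, accel_voltage, length):
    """Flight time (s) of an ion with m/z in Th over a field-free path (m)."""
    return length * math.sqrt(mz * DALTON / (2 * ELEMENTARY_CHARGE * accel_voltage))


def mz_from_flight_time(t, accel_voltage, length):
    """Invert the TOF relation to recover m/z (Th) from a flight time (s)."""
    return 2 * ELEMENTARY_CHARGE * accel_voltage * (t / length) ** 2 / DALTON


def cyclotron_frequency(mz, b_field):
    """Unperturbed cyclotron frequency (Hz) of an ion in a magnetic field (T)."""
    return ELEMENTARY_CHARGE * b_field / (2 * math.pi * mz * DALTON)


if __name__ == "__main__":
    t = tof_flight_time(1000.0, accel_voltage=20000.0, length=1.0)
    print(t, mz_from_flight_time(t, 20000.0, 1.0))
    print(cyclotron_frequency(1000.0, b_field=7.0))
```

In this idealized picture, the flight time grows with the square root of m/z, whereas the cyclotron frequency is inversely proportional to m/z.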
Gas‐phase fragmentation techniques
Often, the m/z is not only determined for the intact analyte ion (MS1 level) but also for its fragment ions (MS2 level), which are generated in so‐called tandem‐MS experiments. Here, an intact analyte ion species of a specific mass is selected and isolated in the low‐resolution mass analyzer and subsequently activated and fragmented. Gas‐phase fragmentation (or activation) provides deeper insights into the analyte structure, for example, the amino acid sequence of a peptide. The analyte activation takes place either within the mass analyzer (typical for ion traps and FTICR cells) or in a collision cell situated between the low‐resolution mass analyzer and the high‐resolution mass analyzer (Fig 3A). Activation of peptides and denatured proteins typically leads to specific peptide backbone cleavage. Depending on which chemical bond within the peptide backbone is cleaved, this process generates a‐/x‐, b‐/y‐ or c‐/z‐fragment ion series (Fig 3C). By definition, a–c describe N‐terminal fragment ions and x–z describe C‐terminal fragment ions (Roepstorff & Fohlman, 1984; Biemann, 1990; Steen & Mann, 2004). Series of these fragment ions detected at the MS2 level readily reveal the amino acid sequence of the analyzed peptide ion. If not peptides or denatured proteins, but native protein assemblies are subjected to gas‐phase activation, efficient backbone fragmentation is generally not observed. Instead, activation may lead to (partial) unfolding and/or dissociation of non‐covalently associated interaction partners. The release of these interaction partners and their mass measurement may provide insights into the quaternary structure of biomolecular complexes. Non‐proteinogenic biomolecules, such as carbohydrates and nucleic acids, undergo their own characteristic fragmentation reactions, for which specific rules and nomenclatures have been established as well (Domon & Costello, 1988; Flora & Muddiman, 1998).
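As an illustration of how b‐ and y‐type fragment ion series relate to the peptide sequence, the sketch below computes the m/z values of singly charged b and y ions from monoisotopic residue masses. It is a minimal example, not a search engine: the function name is hypothetical, only unmodified residues are covered, and all fragments are assumed to carry a single proton.

```python
# Monoisotopic residue masses in Da (unmodified amino acids)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return m/z values of singly charged b and y ions for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = {}
    n_term = 0.0
    for i in range(1, len(peptide)):
        n_term += masses[i - 1]
        # b ions contain the first i residues (N-terminal fragment)
        ions[f"b{i}"] = n_term + PROTON
        # y ions contain the remaining residues plus water (C-terminal fragment)
        ions[f"y{len(peptide) - i}"] = (total - n_term) + WATER + PROTON
    return ions


if __name__ == "__main__":
    for name, mz in sorted(fragment_ions("PEPTIDE").items()):
        print(name, round(mz, 4))
```

Consecutive members of a series differ by exactly one residue mass, which is why a complete b or y ladder allows the sequence to be read from the MS2 spectrum.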
A diverse array of activation/fragmentation techniques is nowadays available. We confine this subsection to collision‐induced dissociation (CID), higher‐energy collisional dissociation (HCD, sometimes referred to as beam‐type CID), and electron transfer dissociation (ETD), which represent some of the most popular ion activation strategies in biomolecular MS. For more information on alternative approaches, please refer to Bogdanov and Smith (2005); Brodbelt (2014); Zhou and Wysocki (2014); Zubarev (2004). Both CID and HCD depend on successive collisions between the analyte ions and inert gas molecules, for example, nitrogen, argon, or xenon, as they are accelerated in the collision cell. For peptide ions, CID‐ as well as HCD‐based activation mainly gives rise to b‐ and y‐type fragment ions in the MS2 spectra. CID, in particular, forces breakage of the analyte's most labile chemical bond. Applying CID on a peptide bearing a labile post‐translational modification (PTM) such as a phosphorylation or glycosylation therefore predominantly causes loss of the PTM rather than peptide backbone cleavage. This phenomenon is less prominent when HCD is used because it applies higher activation energies than CID. In ETD, analyte fragmentation is achieved through an electron transfer reaction from a radical anion toward the positively charged peptide (McLuckey & Stephenson, 1998; Syka et al, 2004). Due to this prompt reaction mechanism, ETD‐based fragmentation is specifically targeted to the peptide backbone, generating mostly c‐ and z‐type fragment ions, whereas PTMs remain bound to the peptide. A similar fragmentation pattern is produced by electron capture dissociation (ECD), which is specifically used in FTICR MS. As the ETD/ECD fragmentation pattern is complementary to CID/HCD, hybrid fragmentation approaches, such as ETciD and EThcD, can substantially improve the peptide sequence coverage (Swaney et al, 2007; Frese et al, 2012).
Molecular biology meets mass spectrometry I—peptide‐centric MS methods
The vast capabilities of peptide‐centric bottom‐up MS in identifying proteins and localizing amino acid modifications make it an ideal readout not only for standard proteomics workflows but also for approaches probing protein structures, conformations, and interactions by chemical or enzymatic in‐solution modification. The most prominent examples of such approaches are chemical cross‐linking, covalent and/or non‐covalent surface labeling, and limited enzymatic proteolysis (Fig 2). All of these methods were introduced several decades ago, but their combination with bottom‐up MS has substantially increased their impact on the field of structural biology. In the following subsections, we describe this development in more detail.
Chemical cross‐linking combined with bottom‐up analysis (cross‐linking‐MS) can be used to study protein conformations as well as protein–protein and protein–nucleic acid interactions. This review will largely focus on protein–protein cross‐linking, and a thorough overview of protein–nucleic acid cross‐linking‐MS applications has been published elsewhere (Schmidt et al, 2012).
The fact that chemical cross‐linking can capture intermolecular interactions of proteins in solution has been known for at least 70 years (Fraenkel‐Conrat & Olcott, 1946). The utilization of cross‐linking for the structural probing of biomolecular systems was initially facilitated by the emergence of gel electrophoresis in the 1970s, as exemplified by topological studies on ribosomal protein complexes (Clegg & Hayes, 1974; Sun et al, 1974). Concomitantly, a wide array of cross‐linking reagents has been explored and current cross‐linking studies still largely rely on the same chemical principles (Sinz, 2003). In general, cross‐linking reagents comprise a spacer arm of varying length connecting two functional groups, which are reactive toward specific amino acid residues. Presently, the most popular class of cross‐linking reagents contains two N‐hydroxysuccinimide ester functionalities that specifically target primary amino groups, that is, protein N‐termini and lysine side chains. Other commercially available cross‐linking reagents are reactive toward thiol groups (i.e., Cys side chains) and carboxyl groups (i.e., Asp and Glu side chains) or contain non‐specifically reactive photoactivatable moieties (Sinz, 2003; Petrotchenko & Borchers, 2010).
Figure 4 illustrates the experimental output produced by cross‐linking‐MS and other biomolecular MS experiments, to which we will return in the following subsections. The respective data are exemplified using the simple model system human hemoglobin: a tetrameric protein complex consisting of two α‐chains, two β‐chains, and four non‐covalently bound heme groups (Fig 4A). Figure 4B shows an MS2 spectrum, acquired after mass selection and gas‐phase fragmentation of two cross‐linked hemoglobin peptides. The MS2 signals can be clearly assigned to fragment ions representing gas‐phase fragmentation events at specific peptide backbone positions. Thus, MS allows sequencing of both peptides and localization of the cross‐linker‐modified residues. The conception of MS‐based cross‐link detection (Rappsilber et al, 2000; Young et al, 2000) has boosted the amount and quality of information that can be gained from a cross‐linking experiment. Notably, two residues will only be cross‐linked if their mutual distance can be bridged by the applied cross‐linking reagent. Cross‐links therefore impose distance constraints on the studied system, revealing binding interfaces (by locating inter‐protein cross‐links) and details about protein conformations (by locating intra‐protein cross‐links). Moreover, the abundance of cross‐links under different experimental conditions can be relatively quantified, enabling comparisons between various biological states with respect to existing protein conformations and interactions (Fischer et al, 2013; Schmidt et al, 2013; Walzthoeni et al, 2015). Cross‐linking data can be directly employed to guide computational protein homology modeling and protein–protein docking (Herzog et al, 2012; Kalisman et al, 2012; Lössl et al, 2014; Zeng‐Elmore et al, 2014).
Alternatively, they may be used to complement information from other structural biology approaches, most notably cryo‐EM, thereby revealing the architecture of multi‐protein complexes such as the nuclear pore complex scaffold (Bui et al, 2013), the 55S mitochondrial ribosome (Greber et al, 2015) or the INO80 chromatin remodeler complex (Tosi et al, 2013) (see also Fig 1). Of note, cross‐linking data of protein homo‐oligomers present a special case since links within and between the subunits cannot be readily distinguished. In most cases, the thorough annotation of homo‐oligomeric cross‐links critically depends on the availability of high‐resolution three‐dimensional protein structures (Kosinski et al, 2015).
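The way cross‐links act as distance constraints on a structural model can be sketched in a few lines of Python. This is a simplified illustration only: the residue labels, coordinates, and the 30 Å cutoff are hypothetical values chosen for the example, and real analyses must account for spacer length, side chain flexibility, and model uncertainty.

```python
import math


def violated_crosslinks(coords, crosslinks, max_distance):
    """Return cross-links whose residue pair is farther apart than allowed.

    coords:       dict mapping a residue label to an (x, y, z) position in Å
    crosslinks:   iterable of (residue_a, residue_b) pairs identified by MS
    max_distance: largest distance (Å) the cross-linker is assumed to bridge
    """
    violations = []
    for res_a, res_b in crosslinks:
        distance = math.dist(coords[res_a], coords[res_b])
        if distance > max_distance:
            violations.append((res_a, res_b, distance))
    return violations


if __name__ == "__main__":
    # Hypothetical residue positions taken from a candidate model
    model = {"K10": (0.0, 0.0, 0.0), "K50": (10.0, 0.0, 0.0), "K90": (40.0, 0.0, 0.0)}
    links = [("K10", "K50"), ("K10", "K90")]
    print(violated_crosslinks(model, links, max_distance=30.0))
```

A model that leaves many identified cross‐links violated is unlikely to represent the conformation present in solution, which is how such data can help to rank docking or homology models.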
Cross‐linking‐MS allows the unbiased structural probing of systems with—in theory—unlimited size and complexity, including cellular protein networks. To live up to this promise, however, cross‐linking‐MS had (and has) to master several obstacles concerning sample preparation and analysis, cross‐link identification, and data interpretation. Here, we will focus on two major advances in these areas that have made cross‐linking‐MS a more powerful and versatile tool for molecular and structural biology research.
First, chemical cross‐linking reactions generally proceed with low efficiency, thus generating a low amount of cross‐linked protein relative to unreacted protein (Leitner et al, 2010). Beneficially, this largely prevents the detection of random protein contacts; however, it may also cause loss of information about specific interactions that are either short‐lived or involve less abundant proteins. This especially becomes a limiting factor when cross‐links are to be identified in complex mixtures, for example, whole‐cell lysates, wherein abundant proteins can “mask” low abundant cross‐linked interactors. The key to a more sensitive detection of these less prevalent protein interactions is the enrichment of cross‐links at either the protein or peptide level. Cross‐link enrichment was first achieved using cross‐linking reagents that harbor a biotin tag to facilitate affinity purification (Alley et al, 2000; Trester‐Zedlitz et al, 2003). More recently, biotin‐labeled cleavable cross‐linkers, so‐called protein interaction reporters, have been introduced (the general utility of cleavable cross‐linkers is discussed in the next paragraph). These cross‐linkers were applied to enrich and identify cross‐linked peptides from intact bacterial and human cells (Zheng et al, 2011; Chavez et al, 2013; Weisbrod et al, 2013; Navare et al, 2015). Similarly complex samples were probed by using a non‐cleavable cross‐linker with a removable biotin label (Tan et al, 2016) as well as cleavable cross‐linkers that can be biotinylated after the cross‐linking reaction (Kaake et al, 2014) or enriched by two‐dimensional strong cation exchange chromatography (Buncherd et al, 2014). Complementary to these cross‐linker‐based enrichment strategies, several cross‐linking‐MS studies employed affinity‐tagged proteins, enabling structural studies centered on a few proteins of interest. 
Based on such targeted approaches, detailed in vivo protein interaction networks of the 26S proteasome and the protein phosphatase 2A could be derived (Guerrero et al, 2006; Herzog et al, 2012). Recently, the structural organization of several endogenous protein complexes was examined by cross‐linking‐MS after affinity purification from transgenic GFP‐tagged yeast strains and mice (Shi et al, 2015). Owing to the vast repository of these transgenic strains and organisms, this approach potentially opens an avenue to investigate virtually any protein system by in vivo cross‐linking combined with protein‐based affinity purification.
Second, cross‐linking‐MS had to devise efficient search engines for the identification of cross‐linked peptides against very large peptide sequence databases. This presents a major challenge because all possible pairwise combinations of peptides need to be considered during the cross‐link search. Searching for cross‐linked peptides against a database generated from tens of proteins therefore is computationally as demanding as identifying linear peptides against the full human proteome (Liu et al, 2014). The identification of cross‐links from more complex samples was initially facilitated by the search engine xQuest (Rinner et al, 2008), relying on the use of isotope‐coded cross‐linkers that “label” the cross‐linked peptides in the mass spectra in order to decrease the search space. This, together with dedicated bioinformatics approaches, enabled cross‐link identification against the full E. coli proteome (Rinner et al, 2008), which was later also achieved with the pLink search engine (Yang et al, 2012). Nowadays, xQuest and pLink have become the most widely used cross‐link search engines to characterize the architecture of large protein complexes (Bui et al, 2013; Tosi et al, 2013; Cevher et al, 2014; Erzberger et al, 2014; Greber et al, 2014, 2015; Han et al, 2014c; Knutson et al, 2014; Shi et al, 2014).
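The growth of the search space can be made explicit with a short Python sketch. It only counts candidates and is not how any of the cited search engines is implemented. The function names and peptide numbers are hypothetical, and a peptide linked to a second copy of itself is counted as a valid pair.

```python
def linear_candidates(n_peptides):
    """Number of candidates when each spectrum is matched to one peptide."""
    return n_peptides


def crosslink_candidates(n_peptides):
    """Number of unordered peptide pairs, including self-pairs."""
    return n_peptides * (n_peptides + 1) // 2


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):   # hypothetical database sizes
        print(n, linear_candidates(n), crosslink_candidates(n))
```

Because the number of pairs grows quadratically with the number of peptides, even a small protein database produces a very large number of cross‐link candidates, which is why strategies that reduce the search space are so valuable.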
A more fundamental approach to make cross‐link identification more efficient focuses on cross‐linking reagents, which are cleaved during the MS experiments (Soderblom & Goshe, 2006; Müller et al, 2010). Utilization of such MS‐cleavable cross‐linkers enables individual MS analysis of linear peptides, theoretically removing all limitations regarding the sample complexity. Consequently, MS‐cleavable cross‐linkers, which additionally contained an affinity tag, were applied in many of the above‐mentioned cross‐linking studies on intact cells (Zheng et al, 2011; Chavez et al, 2013; Weisbrod et al, 2013; Kaake et al, 2014; Navare et al, 2015). These studies employed sophisticated multi‐dimensional chromatography and MS setups in combination with software tools tailor‐made for the respective cross‐linker. To simplify such advanced in vivo cross‐linking experiments, we recently introduced the search engine XlinkX, the algorithm of which is compatible with any MS‐cleavable cross‐linker and with standard Orbitrap MS‐based data acquisition workflows. In a proof of concept study, XlinkX was able to identify more than 2000 unique cross‐links against a full human proteome database providing new insights into the interaction of the 80S ribosome with several associated proteins as they occur in the cellular environment (Liu et al, 2015).
Surface labeling is based on the principle that chemical probes preferably modify a biomolecule at its solvent‐exposed parts, while amino acids that are buried, either in the folded protein core or by an interacting protein, will not be affected. Any modification causes a defined mass shift, which can be detected at the peptide level by bottom‐up MS analysis. Evidently, the information obtained from surface labeling is somewhat similar to the insights gained from cross‐linking experiments. Cross‐linking, however, readily identifies the interacting proteins, whereas surface labeling merely highlights protected regions. Translating surface labeling data into valuable structural information often requires probing of different protein states, for example, unfolded/folded, monomeric/oligomeric, or unbound/ligand‐bound, in comparative or time‐course labeling experiments. Such an analysis reveals which protein areas become solvent‐exposed or buried and which regions remain unaffected, giving insights into biomolecular conformations and binding interfaces. Surface labeling can be performed in two different ways, that is, covalent labeling by amino acid side chain modifications and non‐covalent labeling by hydrogen–deuterium exchange at the peptide backbone, better known as HDX‐MS.
Covalent labeling has developed largely in parallel with chemical cross‐linking because the employed chemical principles partially overlap, as exemplified by the use of lysine‐reactive and photoactivatable reagents (Klapper & Klotz, 1972; Bayley & Knowles, 1977). However, essentially any covalent labeling strategy is compatible with bottom‐up MS. Therefore, covalent labeling‐MS can be tailored to target a wide array of amino acids (Mendoza & Vachet, 2009). Alternatively, it may be conducted in a non‐specific fashion, for example, by photo‐affinity labeling (Robinette et al, 2006) or oxidation with hydroxyl radicals (Maleknia & Downard, 2014). Hydroxyl radical probes are, for example, employed in a labeling approach named “fast photochemical oxidation of proteins” (FPOP) (Aye et al, 2005; Hambly & Gross, 2005). FPOP has been successfully used to monitor protein folding (Chen et al, 2010a), to map binding epitopes (Yan et al, 2014), and even to probe solvent‐accessible protein regions in vivo while preserving the cellular integrity (Espino et al, 2015). Moreover, FPOP appears to work equally efficiently on proteins located in the cell membrane, cytoplasm, and nucleus.
The non‐covalent labeling approach HDX is the best‐known and most widely used strategy for biomolecular surface mapping. Hydrogen exchange as a means to probe biomolecular structures was already introduced in the late 1960s (Englander et al, 1972). It is based on the observation that hydrogens bound to N, O, or S exchange with solvent hydrogens, whereby peptide backbone hydrogens exchange at specific, measurable rates while amino acid side chain hydrogens exchange too fast to be monitored (Englander et al, 1972). Soon after the emergence of biomolecular MS, HDX and MS were combined in pioneering studies, showing that this integrated approach gives insights into the solution and gas‐phase conformation of proteins (Katta & Chait, 1993; Suckau et al, 1993). Most commonly, an HDX‐MS analysis consists of three essential steps. First, a biomolecular assembly is transferred from H2O‐ to D2O‐based buffers so that the solvent‐exposed hydrogens will exchange for deuterium (see also Fig 2). Second, this exchange reaction is quenched at different time points by acidification and cooling of the solvent. Third, the reaction mix is subjected to bottom‐up MS analysis to determine the deuterium uptake at the peptide level, allowing the derivation of HDX reaction kinetics that reveal solvent‐accessible regions of the analyte. In Fig 4C, this is again illustrated for human hemoglobin. The mass spectra show peptides that have the same amino acid sequence but were derived from either holo‐hemoglobin (left panel) or the free hemoglobin α‐chain (right panel). Both samples were incubated in D2O‐based buffer and analyzed at three different time points. The isotope envelope of the free α‐chain peptide moves continuously to higher m/z positions (right panel), evidencing that more and more hydrogens are replaced by heavy deuterium isotopes. In contrast, the holo‐hemoglobin peptide remains at the same m/z position over time (left panel), showing that no deuterium is taken up. 
This suggests that the analyzed peptide is solvent accessible in the free α‐chain, but buried in holo‐hemoglobin.
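The read‐out described above, a shift of the peptide isotope envelope toward higher m/z, can be turned into a deuterium uptake value by comparing intensity‐weighted centroids. The following is a minimal sketch of that arithmetic only. The function names and all peak lists are hypothetical illustration data (they are not taken from Fig 4C), and back exchange is ignored.

```python
def centroid_mz(envelope):
    """Intensity-weighted mean m/z of an isotope envelope.

    `envelope` is a list of (m/z, intensity) pairs.
    """
    total = sum(intensity for _, intensity in envelope)
    if total <= 0:
        raise ValueError("isotope envelope contains no signal")
    return sum(mz * intensity for mz, intensity in envelope) / total


def deuterium_uptake(reference, labeled, charge):
    """Mass increase in Da of a labeled peptide relative to its undeuterated form.

    The centroid shift is measured in m/z, so it is multiplied by the charge.
    """
    return (centroid_mz(labeled) - centroid_mz(reference)) * charge


# Hypothetical doubly charged peptide: isotope peaks are spaced by 0.5 m/z.
undeuterated = [(500.0, 100.0), (500.5, 60.0), (501.0, 20.0)]

# Hypothetical envelopes after three incubation times (in seconds) in D2O buffer.
time_course = {
    10: [(500.5, 80.0), (501.0, 70.0), (501.5, 30.0)],
    100: [(501.0, 60.0), (501.5, 80.0), (502.0, 40.0)],
    1000: [(501.0, 20.0), (501.5, 60.0), (502.0, 100.0)],
}

for seconds in sorted(time_course):
    uptake = deuterium_uptake(undeuterated, time_course[seconds], charge=2)
    print(f"{seconds:>5} s: {uptake:.2f} Da")
```

A protected peptide, such as the holo‐hemoglobin peptide discussed above, would give an uptake close to zero at every time point in such a calculation.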
A main shortcoming of the standard HDX‐MS workflow is that the bottom‐up analysis is performed in H2O‐based solvents, so the deuterium uptake may be partially mitigated by deuterium‐to‐hydrogen back exchange. In view of this obstacle, it is clear that increasing speed and efficiency of the bottom‐up analysis has been a key to make HDX‐MS a broadly applicable strategy (Konermann et al, 2011; Walters et al, 2012). This has been achieved by automating large parts of the HDX‐MS analysis workflow. In a state‐of‐the‐art HDX‐MS experiment, the quenched reaction mix is subjected to on‐column digestion with an immobilized acid‐stable protease (mostly pepsin), peptides are separated using a miniaturized reversed‐phase liquid chromatography system, and mass analysis is performed on a tandem‐MS platform, all of which are often coupled online. Further advances—like online reduction of protein disulfide bonds prior to digestion (Trabjerg et al, 2015), ETD‐based peptide fragmentation to monitor residue‐specific deuterium uptake (Zehl et al, 2008), and sophisticated software tools for peptide mapping, data analysis, and structural interpretation (Slysz et al, 2009; Kan et al, 2011; Pascal et al, 2012; Rey et al, 2014)—have additionally streamlined the HDX‐MS workflow.
Nowadays, HDX‐MS is applicable to biomolecular systems that are hardly tractable by other techniques. For example, HDX‐MS has become increasingly popular to study the structural biology of membrane proteins, shedding light on interactions of membrane receptors (Chung et al, 2011; Shukla et al, 2014), conformational dynamics of membrane proteins (Vahidi et al, 2016), or protein–membrane interactions (Rostislavleva et al, 2015). Moreover, HDX‐MS is a particularly powerful approach to study the interactions of chaperones and their folding substrates. Such investigations are complicated by the fact that specific and non‐specific chaperone–substrate interactions and intra‐substrate interactions, all of which change dynamically during the folding process, have to be discerned. This complex interplay of events is exemplified by the interaction between the histone H2A‐H2B heterodimer and its chaperone Nap1, recently studied by HDX‐MS (D'Arcy et al, 2013). In this study, Nap1 was shown to confine the conformational flexibility of H2A‐H2B, transforming their partially disordered histone fold domains into a more folded conformation. At the same time, the Nap1/H2A‐H2B binding interface was mapped (shown in Fig 1) and, concomitant with protein complex formation, distinct cooperative unfolding events within both Nap1 and H2A‐H2B could be revealed. Another recent HDX‐MS investigation focused on chaperone‐dependent differences in the folding pathway of the TIM‐barrel protein DapA (Georgescauld et al, 2014). The authors demonstrated that DapA folds in a slow cooperative manner in the absence of the bacterial chaperone complex GroEL/GroES. When present, GroEL/GroES catalyzes the separate folding of individual DapA segments, thereby accelerating the folding process without specifically interacting with DapA. Intriguingly, a DapA homolog protein from a bacterium that lacks GroEL/GroES folds in a fast, segmental way even without the chaperone complex. 
These findings led to the hypothesis that segmental folding may be a general pathway for TIM‐barrel proteins, providing an evolutionary route toward GroEL/GroES independence.
Limited proteolysis, introduced more than 60 years ago (Linderstrøm‐Lang, 1950), is one of the most established biochemical approaches to study the higher order structure of biomolecules (Hubbard, 1998). It is based on the concept that proteolysis of a folded protein is not only dependent on the amino acid sequence but also on the tertiary structure, with surface‐exposed and flexible regions being the most proteolytically susceptible sites (Fontana et al, 1986). Limited proteolysis is therefore a popular method to separate stably folded protein domains from flexible regions, for example, to generate X‐ray crystallography‐compatible protein constructs (Hubbard, 1998).
The use of limited proteolysis in conjunction with MS was pioneered by Brian Chait and co‐workers (Cohen et al, 1995). Subsequently, this approach gained increasing popularity, enabling protein–protein binding site mapping (Gervasoni et al, 1998) or even monitoring of virus capsid conformational dynamics (Bothner et al, 1998). Over the past decade, the application of limited proteolysis‐MS has been somewhat overshadowed by HDX‐MS and cross‐linking‐MS, yet it was recently brought back into the spotlight by an elegant proteome‐wide study on protein structural transitions using limited proteolysis and targeted MS (Feng et al, 2014). The authors performed metabolic shift experiments on yeast cells, changing the carbon source from glucose to ethanol, and globally monitored the condition‐dependent abundance of limited proteolysis products. Changes in the limited proteolysis patterns may reveal altered protein conformations, differing protein PTM profiles or protein–ligand binding events. Crucially, limited proteolysis was performed under native conditions using proteinase K and, subsequently, the sample was further proteolyzed under denaturing conditions using trypsin. This step ensured the bottom‐up MS compatibility of this highly complex sample while still allowing normal tryptic peptides (cleaved C‐terminally of lysines and arginines) to be distinguished from limited proteolysis products (with at least one proteinase K cleavage site). Comparative analysis of the limited proteolysis products and their abundances revealed metabolic shift‐induced conformational changes in several enzymes involved in carbon metabolism.
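The distinction between normal tryptic peptides and limited proteolysis products comes down to checking both peptide termini against the trypsin cleavage rule. The sketch below illustrates this logic under simplifying assumptions: the protein sequence and the function name are hypothetical, the exception for proline following the cleavage site is ignored, and only the first occurrence of a peptide in the protein is considered. It is not the analysis pipeline of the cited study.

```python
TRYPSIN_SITES = "KR"  # trypsin cleaves C-terminally of lysine and arginine


def classify_peptide(protein, peptide):
    """Label a peptide as fully tryptic or as a limited proteolysis product.

    A terminus counts as tryptic if it coincides with a protein terminus or
    follows a K/R residue; any other terminus is attributed to proteinase K.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    n_term_tryptic = start == 0 or protein[start - 1] in TRYPSIN_SITES
    c_term_tryptic = end == len(protein) or peptide[-1] in TRYPSIN_SITES
    if n_term_tryptic and c_term_tryptic:
        return "fully tryptic"
    return "limited proteolysis product"


# Hypothetical protein sequence, for illustration only.
protein = "MASKLVDEGRTTPLKAEW"
for peptide in ("LVDEGR", "VDEGR", "TTPL", "AEW"):
    print(peptide, "->", classify_peptide(protein, peptide))
```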
Complementary to limited proteolysis, structural aspects of proteins can also be probed by the proteome‐wide monitoring of thermal protein denaturation (Savitski et al, 2014; Reinhard et al, 2015). This approach is based on the rationale that structural changes, in particular ligand association, will affect the protein stability and thus its denaturation temperature. The thermal denaturation and the limited proteolysis approach represent another two examples where the combination of peptide‐centric MS with in‐solution structural probing enables the system‐wide examination of protein structures, conformations, and binding events. Being conducted on intact cells or cell lysates, these experiments are particularly suitable to foster our understanding of in vivo biomolecular interactions, thus gradually unraveling the cellular interactome.
Molecular biology meets mass spectrometry II—protein‐centric MS methods
Although peptide‐centric MS is nowadays the most popular variant of biomolecular MS, the initial seminal work introducing ESI and MALDI focused largely on intact proteins with masses up to 130 kDa (Karas & Hillenkamp, 1988; Tanaka et al, 1988; Fenn et al, 1989). Shortly afterward, it was demonstrated that, especially by using ESI, non‐covalent assemblies could also be preserved in the gas phase and analyzed intact by mass spectrometry (Ganem et al, 1991; Katta & Chait, 1991; Light‐Wahl et al, 1993; Schwartz et al, 1994; Fitzgerald et al, 1996). This finding gave rise to a new field that was later called “native mass spectrometry” (van den Heuvel & Heck, 2004). In native MS, named in analogy to native gel electrophoresis, purified proteins or biomolecular assemblies are injected into the mass spectrometer under non‐denaturing conditions. While the aforementioned early experiments were mainly performed on protein–ligand complexes and protein homo‐oligomers, careful tuning of the mass spectrometer and the ionization conditions led to the successful analysis of more and more non‐covalent biomolecular complexes, including impressive examples such as the ribosome (Rostom et al, 2000), intact viruses (Fuerstenau et al, 2001; Uetrecht et al, 2008), the substrate‐loaded GroEL/ES chaperone complex (van Duijn et al, 2005), and endogenously produced eukaryotic exosome complexes (Hernández et al, 2006; Synowsky et al, 2006) (see also Fig 1). Most of these studies used ESI, ionizing the proteins from an aqueous ammonium acetate buffer. Beneficially, these solutions are volatile, minimizing the formation of biomolecule–buffer molecule adducts, which would impair accurate mass measurements. Ammonium acetate has only moderate buffer capacity at physiological pH, but can keep biomolecules in a native‐like functionally active state. 
The retention of native‐like biomolecular tertiary structures in the gas phase could eventually be evidenced by probing the gas‐phase structure of non‐denatured proteins and protein complexes using ion mobility spectrometry‐MS (IMS‐MS) (Ruotolo et al, 2005; van Duijn et al, 2009).
IMS‐MS has been a niche technology for several years, mainly applied to study the gas‐phase conformations of metal clusters, peptides, and small proteins (Clemmer & Jarrold, 1997). This changed rapidly with the introduction of a commercially available IMS‐MS instrument in 2007 (Pringle et al, 2007). In the last ten years, IMS‐MS has become a popular technique to study the overall structure of biomolecular assemblies, monitor changing protein conformations, or reduce sample complexity by complementing the mass measurement with another gas‐phase separation dimension. All of these aspects have been described in several reviews (Clemmer & Jarrold, 1997; Uetrecht et al, 2010b; Niu et al, 2013; Lanucara et al, 2014) and protocols (Ruotolo et al, 2008). Therefore, we will provide only a brief explanation of the IMS principle. In essence, IMS separates gaseous ions based on their mobility through a buffer gas. This may allow the detection of different conformers of the same protein (Clemmer et al, 1995) or the separation of different oligomeric states of amyloidogenic proteins (Woods et al, 2013), which may exhibit the same m/z values but different ion mobilities. The ion mobility depends on a number of aspects including ion charge and size. Most importantly, ion mobility determination allows calculating rotationally averaged collisional cross‐sections of the analyte ions, which renders information on their shape. Changes in ion mobility detected by IMS, thus, indicate structural changes. Figure 1 shows two examples where such information was used to monitor ligand‐induced conformational changes of an ABC transporter (Marcoux et al, 2013) and sequential steps of norovirus capsid assembly (Uetrecht et al, 2010a).
Currently, the majority of native (IMS‐)MS experiments are performed on soluble proteins and protein complexes, as exemplified by studies on megadalton‐sized assemblies like cargo‐loaded bacterial nano‐containers (Rurup et al, 2014) and the dynein cofactor dynactin (Urnavicius et al, 2015), the structure of which is shown in Fig 1. However, native MS may also be used to investigate membrane proteins, which can be directly infused after solubilization either in detergent micelles (Barrera et al, 2008) or by detergent‐free methods (Hopper et al, 2013). Besides highly accurate mass measurements of biomolecular assemblies, native MS experiments can provide valuable information on stoichiometries and oligomeric states, especially when combined with gas‐phase dissociation in native tandem‐MS experiments and/or with complementary MS experiments under denaturing conditions. This is illustrated in Fig 4D wherein mass spectra of denatured and native hemoglobin are depicted. In the upper mass spectrum, hemoglobin was measured by ESI‐MS under acidic denaturing conditions, disrupting all non‐covalent interactions. Accordingly, the heme cofactor has dissociated from the α‐ and β‐chain, as evidenced by a signal corresponding to free heme (Fig 4D, inset in upper mass spectrum). Moreover, the α‐ and β‐protein chains are unfolded, exposing all their chargeable amino acids. This causes the emergence of highly charged α‐ and β‐chain ions during the ESI process, which are detected at relatively low m/z. Since these protein ions are substantially larger and more highly charged than peptide ions, their isotope pattern is not resolved. However, their molecular weights can still be accurately determined based on the m/z differences between the differently charged α‐ and β‐chain ions (Fig 4D, inset in upper mass spectrum). The masses of the three hemoglobin components, namely the α‐chain (15,155 ± 1 Da), the β‐chain (15,895 ± 1 Da), and heme (616.5 Da), are thus readily revealed by the denaturing MS analysis. 
In contrast, the native mass spectrum (Fig 4D, lower mass spectrum) contains only signals for one species, represented by three charge states in a relatively high m/z region. The shift to higher m/z is mainly due to the fact that the analyte takes up fewer charges than under denaturing conditions. Charging of the analyte is reduced in native MS because some chargeable sites are buried within the folded protein core and the pH of the spraying solution is closer to neutral. Based on the three charge states seen in the native mass spectrum, the species mass can be calculated as 64.59 ± 0.04 kDa, which unambiguously corresponds to a protein complex consisting of two α‐chains, two β‐chains, and four heme groups. Two mass spectra, each acquired in < 2 min, can thus correctly identify both the composition and binding stoichiometry of a biomolecular complex.
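Both steps of the hemoglobin example, deriving a mass from neighboring charge states and matching that mass to a subunit composition, can be written out explicitly. The sketch below is a minimal illustration: the three native m/z values and the charge states they imply are hypothetical numbers chosen to be consistent with the reported mass of 64.59 kDa, and the function names are our own. Only the subunit masses and the ± 0.04 kDa tolerance are taken from the text.

```python
from itertools import product

PROTON = 1.00728  # mass of a proton in Da


def charge_from_adjacent(mz_high, mz_low):
    """Charge of the ion at mz_high, given the neighbor carrying one more proton."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion observed at `mz` with `charge` protons."""
    return charge * (mz - PROTON)


def deconvolute(mz_values):
    """Average neutral mass from a series of consecutive charge states."""
    peaks = sorted(mz_values, reverse=True)
    if len(peaks) < 2:
        raise ValueError("need at least two consecutive charge states")
    lowest_charge = charge_from_adjacent(peaks[0], peaks[1])
    masses = [neutral_mass(mz, lowest_charge + k) for k, mz in enumerate(peaks)]
    return sum(masses) / len(masses)


def candidate_stoichiometries(measured, tolerance, subunit_masses, max_copies):
    """All subunit copy numbers whose summed mass lies within the tolerance."""
    names = list(subunit_masses)
    hits = []
    for counts in product(*(range(max_copies[name] + 1) for name in names)):
        total = sum(n * subunit_masses[name] for n, name in zip(counts, names))
        if abs(total - measured) <= tolerance:
            hits.append(dict(zip(names, counts)))
    return hits


# Hypothetical native peaks (assumed to correspond to charge states 16+ to 18+).
native_peaks = [4037.88, 3800.42, 3589.34]
complex_mass = deconvolute(native_peaks)

# Subunit masses from the denaturing spectrum described in the text (Da).
subunits = {"alpha": 15155.0, "beta": 15895.0, "heme": 616.5}
limits = {"alpha": 4, "beta": 4, "heme": 8}  # assumed search bounds
hits = candidate_stoichiometries(complex_mass, 40.0, subunits, limits)
print(round(complex_mass), hits)
```

Within these assumed search bounds, the only composition compatible with the measured mass is two α‐chains, two β‐chains, and four heme groups (summed mass 64,566 Da), in line with the assignment given above.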
While the hemoglobin example illustrates (albeit for a simple case) the power of protein‐centric MS, it also highlights that one level of information is still missing—the amino acid sequence. Sequencing intact proteins directly in a so‐called top‐down approach (Kelleher et al, 1999) rather than inferring their sequence from bottom‐up MS data offers several advantages. Most importantly, top‐down sequencing may go beyond sheer protein identification, potentially revealing sequence variations, the position of PTMs, and even the interdependence among different mutations and modifications, all of which is barely feasible when using solely peptide‐centric MS methods. The success of this strategy, however, heavily depends on a highly efficient fragmentation of the peptide backbone, which is generally more difficult to achieve for intact proteins than for short peptides. It is therefore not surprising that the development of top‐down sequencing closely follows advances in peptide fragmentation techniques. While top‐down sequencing of native proteins and protein complexes is still in its infancy, first successful top‐down experiments on denatured protein samples were performed in the McLafferty laboratory around the turn of the millennium (Kelleher et al, 1999). This breakthrough was achieved on FTICR mass spectrometers mainly due to the invention of ECD fragmentation (Horn et al, 2000; Sze et al, 2002). Over the years, more fragmentation techniques such as ETD and UV photodissociation were introduced, allowing top‐down sequencing to be performed on several types of instruments, including Orbitrap mass spectrometers (Chi et al, 2007; Fornelli et al, 2012; Shaw et al, 2013). Next to the progress in the field of gas‐phase fragmentation, the development of powerful spectrum analysis software and statistical tools (Liu et al, 2012; Fellers et al, 2015; Cai et al, 2016) has greatly increased the scope of top‐down protein analysis. 
Finally, more efficient protein extraction, separation, and fractionation methods were critical prerequisites for an in‐depth analysis of complex samples by top‐down MS (Sharma et al, 2007; Chen et al, 2008; Tran & Doucette, 2009; Han et al, 2014b).
It becomes apparent that, by now, many different variants of protein‐centric MS have evolved, ranging from non‐denaturing approaches that may be combined with gas‐phase dissociation and ion mobility separation to denaturing approaches that even facilitate intact protein sequencing (Fig 2). Combining these approaches is, in our view, the key to a thorough understanding of biomolecular systems. Therefore, the following subsections will showcase examples that illustrate the diverse, often complementary, analytical angles provided by the different protein‐centric MS approaches.
Comprehensive analysis of post‐translational modifications
As phosphorylation is one of the most prevalent PTMs, the functional and structural characterization of protein phosphorylation is a major theme in molecular biology (Hunter, 1995; Johnson, 2009). Detection and site localization of protein phosphorylation sites using MS works without radioactive labeling and specific antibodies, in contrast to more traditional biochemical methods. Protein phosphorylation is typically analyzed with bottom‐up proteomics approaches (Riley & Coon, 2016); however, the central role of phosphorylation in modulating protein conformation, activity, localization, and complex formation/dissociation has driven the development of low‐throughput protein‐centric MS approaches that are better suited to monitor these aspects.
Owing to its ability to capture non‐covalent interactions, native MS appears to be a straightforward choice for monitoring the effect of protein phosphorylation on biomolecular interactions. However, differentially phosphorylated protein isoforms were, for a long time, nearly impossible to distinguish due to limitations in mass resolving power. Notably, phosphorylation causes a mass shift of no more than 80 Da; therefore, phospho‐isoforms generally differ in mass by < 0.1%. Resolving these subtle differences became possible with the development of mass spectrometers that combined high resolution with a high mass range. Important early contributions in this area came from orthogonal Q‐TOF mass spectrometers, modified to allow the transmission and detection of high mass ions (Sobott et al, 2002; van den Heuvel et al, 2006). More recently, the now‐commercialized Orbitrap EMR instrument has made a substantial impact in this field (Rose et al, 2012; Snijder et al, 2014). This novel mass spectrometer is able to mass‐resolve phospho‐isoforms of proteins and protein complexes of several hundred kDa, as exemplified in a recent investigation of the interplay between protein phosphorylation and protein–protein or protein–ligand interaction dynamics (van de Waterbeemd et al, 2014). In this study, the phosphorylation and cyclic nucleotide binding of dimeric 150 kDa cGMP‐dependent protein kinase (PKG) were simultaneously monitored by high‐resolution native MS, showing that binding of cAMP or cGMP causes different PKG phosphorylation kinetics. In a second example, it was demonstrated that the binding and phosphorylation of the mitotic regulator Bora by the cell cycle kinase Aurora A proceed independently. Interestingly, all three investigated proteins (Aurora A, Bora, and PKG) existed in different phosphorylation states. 
The relative abundance of all these phospho‐isoforms could be accurately determined by native MS, whereas complementary peptide‐centric MS experiments were done to localize the phosphorylated residues (van de Waterbeemd et al, 2014). Since the coexisting phospho‐isoforms are indistinguishable at the peptide level, the phosphorylated residues could not be allocated to specific phosphorylation states. This level of information was later accessed by combining highly specific ion isolation and complementary gas‐phase fragmentation techniques in a top‐down protein sequencing approach (Brunner et al, 2015). Top‐down sequencing made it possible to decipher the phospho‐proteoforms of Bora resulting from phosphorylation by either Aurora A or Polo‐like kinase 1 (Plk1), showing that both kinases target different Bora residues and generate distinct phosphorylation successions. Compared to these binary kinase/Bora systems, the tripartite Aurora A/Bora/Plk1 interplay is analytically even more challenging, as it is characterized by numerous mutual phosphorylation events with various implications for protein structure and function. Simultaneous probing of these often transient effects was recently achieved by using an MS‐based structural biology strategy, integrating native MS, cross‐linking‐MS, IMS‐MS, top‐down sequencing, and bottom‐up proteomics (Lössl et al, 2016). Strikingly, it could be demonstrated that Aurora A and activated Plk1 hyper‐phosphorylate Bora according to a defined sequence of residue‐specific phosphorylations, thereby priming a substantial structural change of Bora, which eventually allows stable Plk1/Bora complex formation. This multipronged MS analysis, thus, provided mechanistic insights into the sequence of events accompanying the Aurora A/Bora‐mediated Plk1 activation, which is essential for recovery from DNA damage‐induced cell cycle arrest.
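The resolving power problem mentioned above follows from simple arithmetic: an 80 Da shift is a small fraction of the mass of an intact protein, and in the mass spectrum it is further divided by the charge of the ion. The short sketch below makes this explicit for the 150 kDa PKG dimer. The charge state of 25+ is an assumed value for illustration, and the function names are our own.

```python
PHOSPHO_SHIFT_DA = 80.0  # nominal mass shift per phosphorylation, as quoted in the text


def relative_shift_percent(protein_mass_da, shift_da=PHOSPHO_SHIFT_DA):
    """Mass difference between neighboring phospho-isoforms in percent."""
    return 100.0 * shift_da / protein_mass_da


def peak_spacing_mz(charge, shift_da=PHOSPHO_SHIFT_DA):
    """Spacing in m/z between neighboring phospho-isoforms at a given charge."""
    return shift_da / charge


pkg_dimer_mass = 150_000.0  # Da, from the text
assumed_charge = 25         # hypothetical native charge state

print(f"relative shift: {relative_shift_percent(pkg_dimer_mass):.3f} %")
print(f"peak spacing at {assumed_charge}+: {peak_spacing_mz(assumed_charge):.1f} Th")
```

Under these assumptions, neighboring phospho‐isoforms are separated by only a few Thomson at an m/z of about 6,000, which illustrates why high resolution at high mass range is required.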
Another example that the seemingly small difference of one phosphorylation can have a profound effect on proteins was recently reported for calmodulin after phosphorylation by casein kinase 2 (Pan et al, 2016). The kinase was shown to phosphorylate calmodulin between one and four times in a specific order. To see whether any of these phosphorylation events influenced the calmodulin structure, the phosphorylated protein was incubated with deuterated buffer for a set amount of time, denatured and injected into the mass spectrometer. Subsequent selection and ETD fragmentation of the distinct phospho‐isoforms allowed the quantification of the deuterium uptake in specific regions. This innovative top‐down HDX‐MS experiment successfully proved that only a specific pair of phosphorylation events influenced the calmodulin structure substantially.
The in‐depth characterization of PTMs using protein‐centric MS is not limited to phosphorylation. An interesting example exhibiting more complex PTM patterns is presented by histone proteins bearing multiple modifications such as methylation (+14 Da), acetylation (+42 Da), and phosphorylation (+80 Da). Cross talk between these modifications has been examined for histone H3 using state‐of‐the‐art top‐down MS analysis (Zheng et al, 2016). Another protein modification that has recently attracted a lot of attention in the biomolecular MS community is glycosylation. Unlike the aforementioned PTMs, glycosylation needs to be studied not only with respect to its location and abundance, but also with regard to the saccharide composition of the often very complex glycan trees. For this reason, glycosylation analysis requires a different set of MS strategies. Bottom‐up MS analysis can be conducted to examine glycosylation patterns, whereby complementary fragmentation methods, for example, ETD and HCD, allow both the peptide sequence and the glycan tree composition to be derived. While bottom‐up MS reveals the specific glycan linkages as well as site‐specific glycosylation differences, these data can be effectively complemented by protein‐centric MS analysis. Both native MS and denaturing MS have been successfully used to profile complex glycosylation patterns even in the presence of other PTMs, as exemplified by the comprehensive analyses of chicken ovalbumin and interferon‐β1, which were shown to consist of 59 and 138 different protein isoforms, respectively (Yang et al, 2013; Bush et al, 2016). As such, protein‐centric MS may become a screening method to compare patient‐derived plasma glycoproteins or production batches of protein therapeutics. Most therapeutic proteins, such as monoclonal antibodies, are well characterized regarding their primary sequence and specific glycosylation sites. 
However, the glycan tree composition at these sites can still differ substantially, depending on the source material (e.g., CHO cells, yeast or human cell lines) and its growth conditions. A protocol to investigate these glycan trees, in particular on monoclonal antibodies, with native MS was reported 2 years ago (Rosati et al, 2014). This approach relies on the stepwise application of glycan‐specific glycosidases to sequentially truncate the glycan trees. As a result of this procedure, certain proteoforms show glycan‐specific mass losses, which can be immediately read out with native MS, allowing the step‐by‐step reconstruction of the glycosylation profile. As an additional benefit of native MS, this strategy can also be used for glycoprotein complexes. Examples of non‐covalently associated glycoproteins characterized by native (IMS‐)MS include glycosylated antibody–antigen complexes, multimeric glycoproteins (Dyachenko et al, 2015), glycosylated antibody–drug conjugates (Rosati et al, 2013; Marcoux et al, 2015), and glycoprotein complexes involved in complement activation (Diebolder et al, 2014; Wang et al, 2016).
Protein–ligand binding kinetics and stoichiometries
Binding of ligands, such as cofactors, nucleotides, lipids, or drug molecules, is to some extent similar to PTMs, as both result in a characteristic mass shift of proteins and protein complexes. Unlike PTMs, however, ligands are in general non‐covalently associated, so their binding needs to be investigated by native MS. Native MS protein–ligand interaction studies are not limited to the mere detection of the binding event, but can also provide information about the binding stoichiometry (McCammon et al, 2004; Schuller et al, 2016), affinity (Clark & Konermann, 2004; El‐Hawiet et al, 2012a), and cooperativity (Dyachenko et al, 2013; Lin et al, 2014). Over the past decade, such approaches have gained importance as small molecule screening studies in the pharmaceutical industry (Hofstadler & Sannes‐Lowery, 2006; Vivat Hannah et al, 2010; Maple et al, 2014).
In an elegant example of MS‐based ligand binding studies, the interaction between gangliosides (sialic acid glycosphingolipid conjugates) and human norovirus proteins has been investigated using three sophisticated native MS strategies (Han et al, 2014a). Initially, the authors used native “catch‐and‐release” ESI‐MS (El‐Hawiet et al, 2012b) to screen a carbohydrate mixture, resembling the oligosaccharide moiety of 17 gangliosides, against the 865 kDa oligomeric norovirus P‐particle, a mimic of the capsid's protruding spike structure. The resulting native mass spectra contained ions representing the most prominently formed P‐particle–carbohydrate complexes. This convoluted signal was mass‐selected and subjected to CID to release the bound carbohydrates, allowing the detection of any dissociated carbohydrate in the low m/z region. Second, a direct ESI‐MS assay (Wang et al, 2003) was applied to confirm the identified ligands and to quantify their binding affinity toward a smaller dimeric version of the P‐particle. Third, to verify the relevance of the P‐particle model system, the derived kinetic constants were cross‐validated by a “proxy protein” ESI‐MS method (El‐Hawiet et al, 2012a). In this approach, a 10.5 MDa norovirus‐like particle and a small (16 kDa) “proxy protein” with known carbohydrate binding affinity were co‐incubated to compete for carbohydrate binding. Since different carbohydrate binding states are much easier to distinguish for small proteins, the carbohydrate occupation of the “proxy protein” was measured at different norovirus‐like particle concentrations. This provided an indirect readout to determine the norovirus‐like particle–carbohydrate binding affinities, which were in good agreement with the results of the direct ESI‐MS approach.
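The principle behind quantifying a binding affinity from a native mass spectrum can be sketched for the simplest case. The example below assumes 1:1 binding and equal ionization and detection efficiencies for the free and the ligand‐bound protein, so that the ratio of their ion abundances equals the ratio of their solution concentrations. All concentrations and function names are hypothetical, and the calculation is a generic illustration rather than the exact procedure of the cited studies.

```python
import math


def simulate_abundance_ratio(protein_total, ligand_total, kd):
    """Bound/free protein ratio expected at equilibrium for 1:1 binding.

    Solves the quadratic mass-balance equation for the complex concentration.
    All concentrations and kd are in the same unit (here mol/L).
    """
    s = protein_total + ligand_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / (protein_total - complex_conc)


def dissociation_constant(ratio, protein_total, ligand_total):
    """Kd from the measured bound/free abundance ratio of the protein ions."""
    if ratio <= 0:
        raise ValueError("abundance ratio must be positive")
    complex_conc = protein_total * ratio / (1.0 + ratio)
    free_ligand = ligand_total - complex_conc
    if free_ligand <= 0:
        raise ValueError("ratio is inconsistent with the ligand concentration")
    return free_ligand / ratio


# Hypothetical experiment: 5 uM protein, 20 uM ligand, true Kd of 10 uM.
protein_total, ligand_total, true_kd = 5e-6, 20e-6, 10e-6
ratio = simulate_abundance_ratio(protein_total, ligand_total, true_kd)
estimated_kd = dissociation_constant(ratio, protein_total, ligand_total)
print(f"bound/free ratio: {ratio:.3f}, estimated Kd: {estimated_kd * 1e6:.2f} uM")
```

The round trip from a simulated abundance ratio back to the dissociation constant only checks the internal consistency of the two functions. With real data, the ratio would be taken from the summed ion abundances of the bound and free protein signals.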
As mentioned above, native (IMS‐)MS has also become a valuable method to study membrane proteins. The emerging view is that some of these proteins may preferentially interact with specific lipids and, especially in the case of membrane transporters, adopt situation‐specific conformations. Intriguingly, both of these aspects can be probed by native IMS‐MS, as illustrated in the following examples. First, a recent IMS‐MS study elucidated the lipid‐binding selectivity of three membrane proteins, the mechanosensitive channel of large conductance, aquaporin Z, and the ammonia channel (Laganowsky et al, 2014). The extent to which these membrane proteins are stabilized by different lipids was measured using IMS‐MS, which gives information on both the protein shape and the protein mass. The former provides direct evidence for partial unfolding in the gas phase as a readout for protein stability, whereas the latter readily identifies the specific protein–lipid complex corresponding to the respective unfolding state. Beneficially, this allows separate interrogation of successive lipid‐binding events, demonstrating how different synthetic and natural lipids or multiple lipid‐binding events modulate the membrane protein stability. In all three cases, the highest stability was rendered by a class of lipids that was shown to be functionally significant for the respective protein. In the second example, factors involved in the conformational transitions of membrane transporters have been investigated for the membrane‐embedded mammalian drug efflux pump P‐glycoprotein, probing the influence of the specific binding of lipids, nucleotides, and drugs (Marcoux et al, 2013). All three classes of small molecules were shown to bind independently as well as concomitantly to the P‐glycoprotein; however, only synergistic binding triggered a significant shift in the conformational equilibrium, resembling the structural transition expected for an efflux process.
Monitoring cellular machineries—the role of MS in integrated structural and molecular biology studies
So far, we have described several examples of biomolecular systems that were successfully probed by peptide‐ and/or protein‐centric MS strategies (see also Fig 1). A few biomolecular assemblies have become recurring subjects of integrated structural and molecular biology studies, often involving biomolecular MS approaches alongside more traditional methods such as X‐ray crystallography and EM. In the final section of this review, we will focus on two such systems: the eukaryotic transcription machinery and the bacterial CRISPR–Cas immune system. Our understanding of these systems has substantially increased owing to studies that combined biomolecular MS with other structural and molecular biology techniques. Thus, this section aims to illustrate the added value of such integrated approaches, specifically highlighting the niche that biomolecular MS occupies within them.
Eukaryotic transcription complexes
DNA‐dependent RNA polymerases (Pol) are responsible for gene transcription in eukaryotic cells. In most eukaryotes, three of these multi‐subunit enzymes are present, with Pol I synthesizing ribosomal RNA, Pol II producing messenger RNA, and Pol III making transfer RNA and small RNAs. Pol II, the most widely studied subtype, is a 514 kDa protein complex consisting of 12 different subunits (Bushnell & Kornberg, 2003). Several assembly states of Pol II have been characterized by X‐ray crystallography, culminating in structural models of Pol II in complex with transcription factor (TF) IIB, TATA box binding protein, a DNA template, and an RNA synthesis product (Kostrewa et al, 2009; Liu et al, 2010; Sainsbury et al, 2012). However, Pol II engages in even more complex assemblies in the course of mRNA synthesis, recruiting other transcription factors, mRNA processing enzymes, and even supramolecular co‐activators like the Mediator complex. Further complexity arises from the bound RNA transcript but also from heterogeneous PTMs, most notably on the Pol II C‐terminal domain (Heidemann et al, 2013; Allen & Taatjes, 2015; Sainsbury et al, 2015). Similarly, Pol I and Pol III bind several transcription factors and general regulators (Vannini, 2013). Such massive, dynamically interacting ensembles are typically elusive to X‐ray crystallography. Therefore, a number of these assemblies have recently been investigated by hybrid structural and molecular biology approaches, the key elements of which are cryo‐EM and biomolecular MS. These strategies not only uncovered a wealth of structural information but also gave insights into the relationship between the architecture and biochemical function of RNA polymerase supercomplexes.
Pol II from yeast represents the first fully assembled cellular machinery probed by cross‐linking‐MS (Chen et al, 2010b). The architecture of Pol II was accurately reflected by cross‐linking‐MS, initiating its rise to an established structural biology method. Moreover, the Pol II–TFIIF binding interface could be mapped based on specific inter‐protein cross‐links. The Pol II–TFIIF complex model was further extended in a more comprehensive cross‐linking‐MS study, resulting in a structural model of the yeast core initiation complex that could be reconciled with previously obtained biochemical insights into the yeast pre‐initiation complex (Mühlbacher et al, 2014). To obtain high‐resolution structural models of this pre‐initiation complex, cross‐linking‐MS was used in combination with cryo‐EM in two independent investigations (Murakami et al, 2013; Plaschka et al, 2015). The more recent study from Plaschka et al (2015) even went beyond the pre‐initiation complex and provided the structure of the pre‐initiation complex bound to the co‐activating Mediator core complex. The structure of this 1.2 MDa supercomplex was solved with subnanometer resolution using cryo‐EM; however, this analysis did not reveal the subunit arrangement within the Mediator middle module. Instead, this part of the Mediator core complex could be topologically elucidated based on the cross‐linking‐MS distance constraints, demonstrating the benefits of integrating complementary structural biology approaches (Plaschka et al, 2015).
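How cross‐linking‐MS distance constraints are typically checked against a structural model can be sketched in a few lines. The example assumes that a cross‐link is satisfied when the Cα–Cα distance between the two linked residues stays below a cutoff. The 30 Å default is a commonly used upper bound for lysine‐reactive cross‐linkers and is an assumption here, as are the function name, the data layout, and the toy coordinates, none of which come from the studies cited above.

```python
import math

# Assumed upper bound (in angstrom) on the C-alpha to C-alpha distance of a
# lysine-lysine cross-link; the appropriate value depends on the reagent.
DEFAULT_CUTOFF = 30.0


def evaluate_cross_links(coords, cross_links, cutoff=DEFAULT_CUTOFF):
    """Split cross-links into those satisfied and those violated by a model.

    coords      -- {(subunit, residue_number): (x, y, z)} C-alpha positions
    cross_links -- iterable of ((subunit, residue), (subunit, residue)) pairs
    Returns two lists of (site_a, site_b, distance) tuples.
    """
    satisfied, violated = [], []
    for site_a, site_b in cross_links:
        distance = math.dist(coords[site_a], coords[site_b])
        target = satisfied if distance <= cutoff else violated
        target.append((site_a, site_b, distance))
    return satisfied, violated


# Toy model: hypothetical coordinates for three residues in two subunits.
model = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 5): (0.0, 0.0, 20.0),
    ("B", 80): (0.0, 40.0, 0.0),
}
links = [(("A", 10), ("B", 5)), (("A", 10), ("B", 80))]
ok, bad = evaluate_cross_links(model, links)
print(f"{len(ok)} satisfied, {len(bad)} violated")
```

Violated links can point to a misplaced subunit or to conformational flexibility. In integrative modeling, a test of this kind is often used as a scoring term rather than as a hard filter.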
The Mediator complex is exemplary for biomolecular assemblies that are refractory to traditional structural biology methods, as it is conformationally highly flexible and compositionally diverse (Fig 5). The Mediator middle module was investigated early on by native MS, tandem‐MS, and IMS‐MS, which, in combination with light scattering, small‐angle X‐ray scattering (SAXS), and pull‐down assays, revealed its overall shape and subunit topology (Koschubs et al, 2010). This model was later refined using cross‐linking‐MS and homology modeling (Larivière et al, 2013). Furthermore, a structure of the Mediator head module has been determined by X‐ray crystallography and cross‐linking‐MS (Robinson et al, 2012). A full Mediator complex model was finally derived based on a comprehensive hybrid structural biology strategy combining X‐ray crystallography, cryo‐EM, computational modeling, and cross‐linking‐MS (Robinson et al, 2015). Very recently, this model was extended to a full Mediator‐Pol II pre‐initiation complex structure comprising 52 protein subunits (Robinson et al, 2016).
Moving on from transcription initiation to mRNA elongation and processing, native MS, cross‐linking‐MS, and cryo‐EM proved once more to be a fruitful combination, uncovering the structural and biochemical basis for co‐transcriptional mRNA capping (Martinez‐Rucobo et al, 2015). The capping process, which modifies the 5′ end of the newly synthesized mRNA, is performed in yeast by the Cet1 triphosphatase and the Ceg1 guanylyltransferase. Through native MS, a heterotetramer of these enzymes was shown to bind to Pol II. Next, the relevance of the formed capping and transcribing Pol II complex was proven by MS‐based monitoring of the stepwise mRNA modification. This paved the road for cryo‐EM and cross‐linking‐MS experiments that revealed the capping enzyme binding site at the Pol II mRNA exit tunnel.
In contrast to Pol II, eukaryotic Pol I and Pol III were only recently characterized by high‐resolution structures (Engel et al, 2013; Fernández‐Tornero et al, 2013; Hoffmann et al, 2015). However, some structural aspects of both complexes have previously been revealed by MS analysis. Native (tandem‐)MS together with in‐solution dissociation experiments confirmed that 10 of the 17 Pol III subunits form a stable core, whereas the remaining 7 subunits are more peripheral (Lorenzen et al, 2007). Three of these peripheral subunits form the C82/34/31 subcomplex, which could be accurately positioned on the Pol III core based on cross‐linking‐MS and biochemical data (Wu et al, 2012). The presence of Pol III subcomplexes was confirmed in another native MS study that also included IMS‐MS to obtain topological information (Lane et al, 2011). This study additionally probed the assembly of Pol I, finding that Pol I and Pol III exhibit similarities in their disassembly pathways. While these examples show how MS by itself can render structural insights, both native MS and cross‐linking‐MS were also integrated in hybrid structural biology approaches that revealed structural and functional features of Pol I and Pol III subunits and accessory factors. For instance, key aspects of the Pol III pre‐initiation complex architecture were unveiled by cross‐linking‐MS and X‐ray crystallography of the TFIIIC complex (Male et al, 2015). Regarding the structural organization of Pol I, native MS proved that the Pol I subunits A49 and A34.5 (the only ones for which no Pol II homologs exist) form a stable heterodimer that associates with the other 12 Pol I subunits (Geiger et al, 2010). Subsequent crystallization of the A49/A34.5 heterodimer enabled structural comparisons to Pol II‐associated transcription factors, uncovering key similarities between A49/A34.5 and TFIIF as well as TFIIE. Moreover, a cross‐linking‐MS‐based Pol I model showed A49/A34.5 in positions similar to the TFIIF and TFIIE binding regions on Pol II (Jennebach et al, 2012), reinforcing the concept that some Pol I subunits act as stably associated transcription factors. Other factors for transcription initiation, however, are reversibly associated with Pol I. One of them, Rrn3, was studied by SAXS and native MS, which led to the conclusion that it forms dimers in solution but monomerizes during Pol I binding (Blattner et al, 2011). The Rrn3/Pol I interaction site could be mapped by cross‐linking‐MS, resulting in a model of the Pol I–Rrn3 initiation complex (Blattner et al, 2011).
Protein assemblies involved in the CRISPR–Cas immune system
Bacteria and archaea have evolved a variety of defense strategies to withstand viral infection, but in recent years most attention has been drawn to the RNA‐guided adaptive immune response mediated by clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR‐associated (Cas) proteins. The molecular mechanism behind the various CRISPR–Cas systems has been reviewed extensively (van der Oost et al, 2014; Wright et al, 2016). Briefly, CRISPR gene loci can accommodate short stretches of bacteriophage nucleic acid sequences (“spacers”) that are incorporated during viral infection. These spacers are transcribed into CRISPR RNA, which directs the Cas proteins toward matching viral nucleic acids, enabling their degradation.
CRISPR–Cas systems are divided into three main types. Type II systems, which contain CRISPR RNA and only one multifunctional Cas protein, have become highly popular tools for genome engineering (Mali et al, 2013; Wright et al, 2016). Type I and type III systems, however, comprise CRISPR RNA and multiple Cas proteins, forming ribonucleoprotein complexes of 350–450 kDa that represent a challenging target for structural characterization.
Escherichia coli type I Cascade was the first CRISPR–Cas system to be structurally characterized (Jore et al, 2011). Cascade is composed of five different Cas proteins and CRISPR RNA, whose masses sum to 184 kDa. When intact Cascade was analyzed by native MS, however, it exhibited a mass of approximately 405 kDa, showing that one or more components must be present in multiple copies. To decipher the Cascade binding stoichiometry, the authors first added a complementary single‐stranded DNA probe and monitored the mass increase by native MS, demonstrating that only one CRISPR RNA is bound. Cascade was next subjected to gas‐phase dissociation and tandem‐MS analysis, monitoring the sequential loss of several Cas subunits. Since these results were not sufficient to infer the complete binding stoichiometry, additional in‐solution dissociation experiments were performed, disrupting Cascade by adding low amounts of organic solvents. The resulting subcomplexes were analyzed by native (tandem‐)MS, which finally allowed copy numbers to be derived for all Cas proteins in the intact Cascade assembly. By combining these results with two‐dimensional EM data, a first topological model of Cascade could be proposed (Jore et al, 2011), which was soon confirmed by a more detailed cryo‐EM reconstruction (Wiedenheft et al, 2011a). A similar approach combining native MS, EM, and SAXS was used to probe the P. aeruginosa Csy complex, unveiling remarkable similarities to the Cascade quaternary structure despite the lack of obvious protein sequence homology (Wiedenheft et al, 2011b). This structural analogy was further demonstrated in a follow‐up study that applied native IMS‐MS and molecular modeling (van Duijn et al, 2012).
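The reasoning behind such a stoichiometry determination can be illustrated with a small enumeration. The sketch below lists all copy‐number combinations whose summed mass matches a measured intact mass within a tolerance, with the RNA fixed at one copy as the DNA probe experiment indicated. The subunit labels A to E and their masses are hypothetical round numbers, chosen only so that single copies sum to the 184 kDa quoted above; they are not the published Cascade subunit masses, and the function name and tolerance are likewise assumptions.

```python
from itertools import product


def candidate_stoichiometries(subunit_masses, fixed_mass, measured_mass,
                              tolerance, max_copies=8):
    """Enumerate copy numbers (at least 1 per subunit) that explain a mass.

    subunit_masses -- {name: mass in kDa} for components with unknown copies
    fixed_mass     -- summed mass (kDa) of components with known copy number
    measured_mass  -- intact mass (kDa) from native MS
    tolerance      -- accepted absolute deviation in kDa
    """
    names = list(subunit_masses)
    hits = []
    for copies in product(range(1, max_copies + 1), repeat=len(names)):
        total = fixed_mass + sum(
            n * subunit_masses[name] for n, name in zip(copies, names)
        )
        if abs(total - measured_mass) <= tolerance:
            hits.append(dict(zip(names, copies)))
    return hits


# Hypothetical protein masses in kDa; one RNA copy of 22 kDa is held fixed.
proteins = {"A": 56, "B": 19, "C": 40, "D": 25, "E": 22}
hits = candidate_stoichiometries(proteins, fixed_mass=22,
                                 measured_mass=405, tolerance=3)
print(f"{len(hits)} candidate stoichiometries within tolerance")
```

With these toy values, more than one combination falls within the tolerance, which mirrors the point made in the text: the intact mass alone does not fix the stoichiometry, and the masses of subcomplexes from gas‐phase and in‐solution dissociation are needed to narrow down the candidates.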
More recently, several type III CRISPR–Cas systems were subjected to hybrid structural investigations. First structural information was obtained on the T. thermophilus Cmr complex (Staals et al, 2013). This assembly was studied by deep sequencing to probe its CRISPR RNA content, native MS in combination with gas‐phase and in‐solution dissociation to determine its binding stoichiometry, and negative‐stain EM to reconstruct a first 3D complex map, in which the subunits could be placed. Similar approaches combining deep sequencing, native MS, and EM resulted in structural models of Csm complexes isolated from bacteria (Staals et al, 2014) and archaea (Rouillon et al, 2013). Interestingly, all three studies showed that the respective ribonucleoprotein complex architectures resemble the E. coli Cascade complex. The same observation was made for the P. furiosus Cmr complex, which was modeled by using cross‐linking‐MS distance constraints to fit individual Cas protein crystal structures into a low‐resolution EM map (Benda et al, 2014). Taken together, these results strongly suggest that type I and type III CRISPR–Cas systems share some common evolutionary ancestry.
An aspect that has long been neglected in structural studies of CRISPR–Cas systems is the precise mapping of the CRISPR RNA–protein binding interfaces. First attempts to pinpoint these interactions were made during characterization of the T. thermophilus Csm complex, in which contacts between all Csm subunits and their cognate RNA were revealed using UV‐induced RNA–protein cross‐linking and MS (Staals et al, 2014). This strategy was recently shown to be generally applicable for RNA interaction mapping of CRISPR‐associated proteins (Hrle et al, 2014; Sharma et al, 2015), promising even more comprehensive structural maps of CRISPR–Cas systems in the future.
The continuous technological progress of MS provides opportunities to probe the structure and function of biomolecular systems with increasing analytical depth. Most importantly, MS experiments typically yield information that is complementary to the aspects monitored by traditional biochemical or structural biology approaches. While individual methods are often insufficient to understand highly complex and dynamically interacting biomolecular machineries, their characterization can be achieved by merging the unique benefits of diverse analytical techniques, as we have exemplified for CRISPR–Cas and transcription‐related complexes. Closing in upon the in vivo architecture of such cellular key players, the focus of integrated structural studies is moving from recombinantly produced complexes to endogenously existing biomolecular assemblies, which have also become amenable to native MS and cross‐linking‐MS characterization. Evidently, the ultimate goal is to elucidate even more intricate systems, for example, cellular signaling pathways, organelles, and, eventually, entire cells. Here, cross‐linking‐MS in particular will likely prove to be an ideal complement to emerging in vivo and in situ technologies such as live‐cell imaging, in‐/on‐cell NMR, and cryo‐electron tomography. Other MS‐based approaches (e.g., protein surface labeling and limited proteolysis) are also extending their scope toward proteome‐wide structural studies, allowing them to pull their weight in integrated analytical strategies. The future of MS‐based approaches in structural and molecular biology is bright!
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgements
We thank all members of our group, in particular those working with the MS methods described here, for stimulating discussions and helpful comments. Moreover, we would like to acknowledge our national and international collaborators for continuously supporting our research. The work in the Heck laboratory is funded by the Roadmap Initiative Proteins@Work (project number 184.032.201) financed by The Netherlands Organisation for Scientific Research (NWO), and the MSMed Program (grant agreement number 686547) within the European Union's Horizon 2020 Framework Programme. Additional funding was received through the ManiFold project (grant agreement number 317371), embedded in the European Union 7th Framework Programme, and a Projectruimte grant (12PR3303‐2) from Fundamenteel Onderzoek der Materie (FOM).
Funding: Netherlands Organization for Scientific Research (NWO), http://dx.doi.org/10.13039/501100003246, project number 184.032.201
See the Glossary for abbreviations used in this article.
- CID: collision‐induced dissociation
- ECD: electron capture dissociation
- EM: electron microscopy
- ESI: electrospray ionization
- ETciD: electron transfer/collision‐induced dissociation
- ETD: electron transfer dissociation
- EThcD: electron transfer/higher‐energy collisional dissociation
- FT‐ICR: Fourier transform ion cyclotron resonance
- HCD: higher‐energy collisional dissociation
- HDX: hydrogen/deuterium exchange
- IMS: ion mobility spectrometry
- m/z: mass‐to‐charge ratio
- MALDI: matrix‐assisted laser desorption/ionization
- MS1: mass spectrometric analysis of non‐fragmented precursor ions
- MS2: mass spectrometric analysis of fragment ions generated by gas‐phase dissociation of precursor ions
- MS: mass spectrometry
- PTM: post‐translational modification
- SAXS: small‐angle X‐ray scattering
- tandem‐MS: mass spectrometric experiment combining MS1 analysis and subsequent MS2 analysis of a selected m/z range
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs 4.0 License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
© 2016 The Authors. Published under the terms of the CC BY NC ND 4.0 license