A protein interaction network describes a set of physical associations that can occur between proteins. However, within any particular cell or tissue only a subset of proteins is expressed and so only a subset of interactions can occur. Integrating interaction and expression data, we analyze here this interplay between protein expression and physical interactions in humans. Proteins only expressed in restricted cell types, like recently evolved proteins, make few physical interactions. Most tissue‐specific proteins do, however, bind to universally expressed proteins, and so can function by recruiting or modifying core cellular processes. Conversely, most ‘housekeeping’ proteins that are expressed in all cells also make highly tissue‐specific protein interactions. These results suggest a model for the evolution of tissue‐specific biology, and show that most, and possibly all, ‘housekeeping’ proteins actually have important tissue‐specific molecular interactions.
Nearly all processes in biology are dependent on the precise physical interactions among many individual proteins. These range from the maintenance of cellular architecture and the propagation of the genetic material, to the ability of cells to process and respond to environmental information. Defining a near‐complete map of the physical interactions that can occur between human proteins—the human protein ‘interactome’—is an important ambition of current research. Similar to the sequence of the human genome, the human interactome serves as a resource for researchers and can be used to understand how proteins are organized to perform functions within a cell (Bork et al, 2004; Cusick et al, 2005).
Protein interactome mapping projects were pioneered in model organisms (Uetz et al, 2000; Walhout et al, 2000; Ito et al, 2001; Ho et al, 2002; Li et al, 2004; Gavin et al, 2006; Krogan et al, 2006), with initial efforts in humans focused on particular pathways or genomic regions (Bouwmeester et al, 2004; Lehner and Sanderson, 2004; Lehner et al, 2004; Jeronimo et al, 2007). More recently, the cloning of large sets of human open reading frames and improvements in interaction assays have allowed these efforts to be expanded by an order of magnitude to the scale of the human proteome (Rual et al, 2005; Stelzl et al, 2005; Ewing et al, 2007). These data, combined with extensive efforts to collate known interactions from the scientific literature (Bader et al, 2001; Xenarios et al, 2002; Pagel et al, 2005; Persico et al, 2005; Stark et al, 2006; Kerrien et al, 2007; Vastrik et al, 2007; Ruepp et al, 2008), mean that there is now a reasonably extensive resource of known human protein interactions (Hart et al, 2006).
A global interactome network provides an overview of all of the physical interactions that can occur between human proteins. However, very little is known about when and where each of these interactions can occur. Within any particular cell or tissue of the human body, not all protein interactions can occur. Most simply, if two genes are not both expressed in a cell, then an interaction between their protein products cannot occur in that cell.
In unicellular organisms, one approach that has been used to investigate the dynamics of interaction networks between cellular states has been to integrate interactome data with expression data. This approach has been used to identify co‐regulated interaction modules (Ihmels et al, 2002; Komurov and White, 2007) or to investigate the relationships between interaction network topology and gene co‐expression (Han et al, 2004). Additional studies have used gene expression (Luscombe et al, 2004; de Lichtenberg et al, 2005) or functional information (Rachlin et al, 2006) to investigate the cellular conditions (or ‘context’) under which interactions can occur, and to distinguish between condition‐dependent and condition‐independent interactions.
In the present study, we apply a similar approach to the human protein interaction network, using global gene expression data to identify the human cells and tissues in which each interaction can or cannot occur. By performing this analysis, we are able to investigate the relationship between the tissue specificity of a protein and its number of interaction partners. Moreover, and strikingly, we find extensive communication between universally expressed proteins and those with tissue‐specific expression. Even the most tissue‐specific proteins normally interact directly with components of the core cellular machinery. Conversely, nearly all universally expressed ‘housekeeping’ proteins have protein interactions that can only occur in a restricted subset of cells. Our results suggest a model for the evolution of tissue‐specific functions through the modification and re‐use of core cellular processes, and indicate that most ‘housekeeping’ proteins should probably be considered as important for tissue‐specific processes.
Construction of a global human protein interaction network
To construct a global human physical protein interaction network, we integrated data from 21 different sources to define a network of 80 922 physical interactions that can occur between 10 229 human proteins. We only included interactions supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins (see Materials and methods; Supplementary Table 1). Moreover, to account for differences in interaction assay reliability, throughout this work, we also consider a high‐confidence subset of this global network that consists of interactions reported in at least two independent primary research publications. There are a total of 13 102 of these multiple publication‐supported interactions that connect 4750 human proteins.
Determining the tissue specificity of human protein interactions
We then used gene expression data (Su et al, 2004) to determine the cells and tissues of the human body in which each of these interactions can occur (Figure 1A). If two genes are co‐expressed in a cell, then their products have the potential to interact physically in that cell under some condition. However, if two proteins are not both expressed in a tissue, then the interaction cannot occur in that tissue. The complete set of interactions, their supporting evidence, and the cells and tissues in which each interaction can occur are provided as Supplementary Table 1 as a resource for researchers interested in the biology of any particular human cell or tissue.
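The co-expression rule described above can be illustrated with a minimal Python sketch. The gene names, tissue names, and the `interaction_tissues` helper are hypothetical and are not taken from the original analysis; the sketch only assumes that each gene has already been assigned the set of tissues in which it is called present.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: gene -> tissues in which the gene is called "present".
expressed_in: Dict[str, Set[str]] = {
    "GENE_A": {"liver", "brain", "heart"},
    "GENE_B": {"brain"},
    "GENE_C": {"liver"},
}

# Hypothetical interactions; GENE_D has no expression information.
interactions = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_A", "GENE_D")]


def interaction_tissues(
    a: str, b: str, expressed_in: Dict[str, Set[str]]
) -> Optional[Set[str]]:
    """Tissues in which both partners are expressed.

    Returns None when expression information is missing for either partner,
    so that "unknown" is kept distinct from "cannot occur anywhere".
    """
    if a not in expressed_in or b not in expressed_in:
        return None
    return expressed_in[a] & expressed_in[b]


tissue_map = {
    (a, b): interaction_tissues(a, b, expressed_in) for a, b in interactions
}
```

In this toy example the interaction between `GENE_B` and `GENE_C` maps to an empty set, because the two genes are never expressed in the same tissue.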
Tissue specific and recently evolved proteins make few protein interactions
We first examined the relationship between the tissue specificity of a protein and the number of interactions that it makes (a protein's interaction degree). We find that more tissue‐specific proteins make fewer interactions than widely expressed proteins (Figure 1B, Spearman's rho=0.19, P<2.2e−16). This is true both for the complete and for the multiple‐support interaction dataset (Supplementary Figure 1A), and when excluding all protein complexes (Supplementary Figure 1B). It has been shown earlier that tissue‐specific proteins are more likely to be recent evolutionary innovations than universally expressed proteins (Lehner and Fraser, 2004b). We find that more recently evolved proteins have fewer interactions than ancient proteins, but that the relationship between tissue specificity and interaction degree is seen for both sets of proteins (Figure 1C). That is, the older a protein is, and the more tissues in which it is expressed, the more protein interactions it is likely to have.
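As an illustration of this kind of analysis, the following Python sketch computes the interaction degree of each protein and a Spearman rank correlation between the number of tissues in which a protein is expressed and its degree. The toy proteins and all function names are hypothetical, and the rank correlation is written out by hand (average ranks for ties, then a Pearson correlation on the ranks) so that the example needs only the standard library; it is not the code used for Figure 1B.

```python
from collections import defaultdict
from math import sqrt


def degree(interactions):
    """Number of distinct partners per protein (self-interactions ignored)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return {p: len(s) for p, s in partners.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Hypothetical toy data: number of tissues (out of 79) per protein.
n_tissues = {"P1": 79, "P2": 40, "P3": 5, "P4": 1}
toy_interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]

deg = degree(toy_interactions)
proteins = sorted(n_tissues)
rho = spearman_rho([n_tissues[p] for p in proteins], [deg[p] for p in proteins])
```

A positive `rho` here means that proteins expressed in more tissues tend to have more interaction partners.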
The most tissue‐specific proteins normally interact with core cellular components
We next analyzed the extent to which tissue‐specific proteins interact with the most widely expressed proteins. We find that even when only considering the most tissue‐restricted proteins (proteins expressed in ⩽10/79 tissues), most of them are known to interact directly with universally expressed human proteins (Figure 2A). The same result is seen when only considering high‐confidence human protein interactions (Supplementary Figure 2A), and when using diverse definitions of universally expressed proteins (Figure 2A). Thus, most tissue‐specific proteins can function by directly contacting components of the core cellular machinery.
Most universally expressed proteins have tissue‐specific protein interactions
Constitutively expressed proteins are often considered as important for ‘housekeeping’ biological processes that are required in all cells. However, nearly all of the most widely expressed proteins have interactions with other proteins that are not themselves universally expressed (Figure 2B). That is, most universally expressed proteins have physical interactions that can only occur in a restricted subset of cells and tissues. The same result is seen when using the complete interaction dataset, when only considering high‐confidence interactions described in multiple independent publications (Supplementary Figure 2B), or when using diverse definitions of universally expressed proteins (Figure 2B). Thus most, and possibly all, universally expressed proteins have tissue‐specific molecular interactions.
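Both this analysis and the preceding one (tissue-restricted proteins that contact universally expressed partners) reduce to the same question: what fraction of one protein class has at least one partner in another class? The Python sketch below shows one way to compute this. The protein names, tissue counts, and helper functions are hypothetical; the cut-offs of 79/79 tissues for universal expression and ⩽10/79 tissues for restricted expression follow the definitions used in the text.

```python
from collections import defaultdict


def partners_of(interactions):
    """Map each protein to the set of its interaction partners."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:
            partners[a].add(b)
            partners[b].add(a)
    return partners


def fraction_with_partner_in(query, targets, partners):
    """Fraction of `query` proteins with at least one partner in `targets`.

    Only query proteins that have at least one known interaction are counted
    in the denominator.
    """
    in_network = [p for p in query if partners.get(p)]
    if not in_network:
        return float("nan")
    hits = sum(1 for p in in_network if partners[p] & targets)
    return hits / len(in_network)


# Hypothetical toy data: number of tissues (out of 79) per protein.
n_tissues = {"HK1": 79, "HK2": 79, "MID": 40, "TS1": 3, "TS2": 8}
toy_interactions = [("HK1", "HK2"), ("HK1", "TS1"), ("HK2", "MID"), ("TS2", "MID")]

housekeeping = {p for p, n in n_tissues.items() if n == 79}
restricted = {p for p, n in n_tissues.items() if n <= 10}
non_housekeeping = set(n_tissues) - housekeeping

partners = partners_of(toy_interactions)
frac_restricted_to_hk = fraction_with_partner_in(restricted, housekeeping, partners)
frac_hk_to_non_hk = fraction_with_partner_in(housekeeping, non_housekeeping, partners)
```

A housekeeping protein with a partner that is not universally expressed necessarily has an interaction that can occur only in a subset of tissues, which is the property summarized in Figure 2B.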
Proteins that themselves have restricted expression patterns also have many interactions that can only occur in a subset of the tissues in which they are expressed (Figure 2C). That is, as a consequence of interactions between more and less widely expressed proteins, human protein interactions are often more tissue specific than proteins (P<1e−16).
Extensive re‐use of housekeeping proteins for tissue‐specific biological processes
To further illustrate how housekeeping proteins are widely re‐used for tissue‐specific biological processes, we considered neuronal protein complexes that function in synaptic transmission, learning, and memory. The subunits of these complexes have been identified by extensive proteomic approaches, and the importance of individual subunits for learning and memory has been validated by genetic studies in mice and by clinical studies in humans (Pocklington et al, 2006). We estimate that ∼20–60% of the subunits of these neuronal‐specific complexes are actually universally expressed housekeeping proteins (Figure 3A and B). Moreover, in ∼30% of cases, these housekeeping subunits have genetically verified roles in learning and memory (Figure 3C). Thus, universally expressed proteins, through their tissue‐specific interactions, can be re‐used for, and can be essential to, highly tissue‐specific biological processes.
The evolution of tissue‐specific biological processes
Taken together, our findings suggest the following model for the evolution of tissue‐specific functions. Many (but not all) tissue‐specific proteins are recent evolutionary innovations (Lehner et al, 2004). In general, these tissue‐specific proteins initially make few interactions, and these interactions are frequently with much more widely expressed and ‘housekeeping’ components of the cell. Thus, many tissue‐specific proteins probably function by directly recruiting or modifying the activities of core cellular components.
There are, however, exceptions to this trend, with some tissue‐specific proteins acting as ‘local’ hubs in the interaction network of a particular tissue (our unpublished observation).
Frequent re‐use of housekeeping proteins for tissue‐specific biology
Universally expressed ‘housekeeping’ proteins tend to make many interactions. Many of these interactions (∼50–60%, Supplementary Figure 3) are with other housekeeping proteins. However, the majority of universally expressed proteins also make interactions that can only occur in a subset of the tissues in which they are expressed. Therefore, there appears to be very frequent, and possibly universal, re‐use of ‘housekeeping’ proteins to perform tissue‐specific biological processes. That is, most housekeeping proteins can be considered to be important for different (or at least modified) biological processes in different tissues.
In summary, our results suggest that it might be better to consider the biology of any particular tissue in terms of the particular interactions that can occur in that tissue, rather than simply in terms of the unique proteins that are expressed there.
The importance of interaction network dynamics
In unicellular yeast, broadly expressed proteins can have precisely temporally regulated activities because of their interactions with proteins with restricted expression profiles (de Lichtenberg et al, 2005). We show here that a similar process may be widely used in multicellular organisms to restrict and modify the activities of a protein to a subset of the tissues in which it is expressed.
Together with earlier analyses in yeast (Han et al, 2004; Luscombe et al, 2004; de Lichtenberg et al, 2005), this work highlights the importance of considering global interaction networks as having dynamic, not static, structures and topologies. Additional work analyzing how the networks of molecular interactions change between cell types, states, and conditions should prove a fruitful approach for understanding living systems.
Materials and methods
Protein interaction data
We compiled human protein interactions from a total of 21 different databases, as listed in Table I. We required that each interaction be supported by at least one piece of direct experimental evidence demonstrating physical association between two human proteins, and removed all interactions that did not meet this criterion. All interactions were mapped to common Ensembl gene identifiers. The complete network (‘CRG‐all’) consists of 80 922 interactions between 10 229 human proteins (approximately half the human proteome) and is available as Supplementary Table 1.
Filtered interaction dataset
In total, 13 102 of the interactions in our network between 4750 proteins are supported by experimental evidence of physical binding reported in at least two different primary research publications. Given the multiple lines of evidence supporting these interactions, we use this subset of interactions (‘CRG‐filtered’) as high‐confidence interactions to confirm that our conclusions are not affected by interaction data quality or sampling (see Supplementary Figures).
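The filtering step can be sketched as follows in Python. The record layout (two gene identifiers plus a publication identifier), the placeholder identifiers, and the function name are assumptions made for illustration and do not reflect the format of the supplementary files. Pairs are treated as unordered, so that A-B and B-A reports count toward the same interaction, and repeated reports from the same publication are counted once.

```python
from collections import defaultdict

# Hypothetical evidence records: (gene_a, gene_b, publication_id).
evidence = [
    ("ENSG_A", "ENSG_B", "PMID:1"),
    ("ENSG_B", "ENSG_A", "PMID:2"),  # same pair, reversed order, new publication
    ("ENSG_A", "ENSG_C", "PMID:1"),
    ("ENSG_A", "ENSG_C", "PMID:1"),  # duplicate report from the same publication
]


def filter_by_publications(evidence, min_pubs=2):
    """Keep unordered pairs supported by at least `min_pubs` distinct publications."""
    pubs = defaultdict(set)
    for a, b, pub in evidence:
        pubs[frozenset((a, b))].add(pub)
    return {pair for pair, p in pubs.items() if len(p) >= min_pubs}


high_confidence = filter_by_publications(evidence)
```

With `min_pubs=2` this corresponds to the requirement of at least two different primary research publications described above.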
To identify which protein interactions can occur in a particular cell or tissue type, we used global gene expression data. Although interactions can be regulated by localization, phosphorylation, etc, we aim to distinguish the proteins that can interact under some condition in a tissue from those that cannot, and mRNA expression is a reasonable indicator of this potential. We used expression data from the GNF Atlas project that measured expression across 79 different human cell or tissue types (Su et al, 2004). The MAS5 normalized expression levels were averaged between experimental replicates, and in cases where more than one probe set was present for a gene, the more sensitive probe set was used. In this dataset, a gene is considered as present in a tissue if its normalized expression level is >200 (Su et al, 2002). However, our conclusions remain the same when this stringency is increased or decreased (see Supplementary information). At this threshold, >98% of the interaction partners in our global network for which expression information is available are co‐expressed in at least one human tissue.
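A minimal Python sketch of these processing steps is given below. The probe set names, gene names, and expression values are hypothetical. The text does not define how the "more sensitive" probe set was chosen, so the sketch assumes, purely for illustration, that it is the probe set with the highest mean signal across tissues; this choice is an assumption and may differ from the original procedure.

```python
from statistics import mean

# Hypothetical input: probe set -> tissue -> replicate expression values.
raw = {
    "probe_1": {"liver": [300.0, 340.0], "brain": [120.0, 100.0]},
    "probe_2": {"liver": [150.0, 170.0], "brain": [90.0, 70.0]},
    "probe_3": {"liver": [210.0, 190.0], "brain": [500.0, 700.0]},
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}


def presence_calls(raw, probe_to_gene, threshold=200.0):
    """Return gene -> set of tissues in which the gene is called present."""
    # Step 1: average the replicates for each probe set and tissue.
    averaged = {
        probe: {tissue: mean(values) for tissue, values in tissues.items()}
        for probe, tissues in raw.items()
    }
    # Step 2: keep one probe set per gene.
    # ASSUMPTION: "more sensitive" is taken to mean highest mean signal.
    best = {}
    for probe, profile in averaged.items():
        gene = probe_to_gene[probe]
        score = mean(list(profile.values()))
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, probe)
    # Step 3: a gene is present where its level is strictly above the threshold.
    return {
        gene: {t for t, level in averaged[probe].items() if level > threshold}
        for gene, (_, probe) in best.items()
    }


calls = presence_calls(raw, probe_to_gene)
```

Note that the comparison is strict: in the toy data `GENE_B` averages exactly 200 in liver and is therefore not called present there.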
We identified universally expressed housekeeping proteins using a total of 10 different criteria. First, we used the GNF Atlas data, and considered housekeeping proteins as those with an expression level above 200 in all 79 tissues, or in more than 70/79 tissues (i.e. allowing for some false‐negatives). Second, we used the same two tissue criteria, but increased (250) or decreased (150) the stringency at which a gene is considered expressed. Third, we used four additional sets defined in an earlier publication—genes identified as expressed in 18/18 or at least 16/18 tissues using microarray data, and genes with the same tissue criteria but defined using expressed sequence tag (EST) data (Zhu et al, 2008).
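The six GNF-based definitions (three expression thresholds, each combined with two tissue criteria) can be enumerated with a short Python sketch. The gene names, expression profiles, and function name are hypothetical. "More than 70/79 tissues" is implemented as at least 71 tissues. The four additional sets taken from Zhu et al (2008) come from an external publication and are not computed here.

```python
def housekeeping_sets(levels, thresholds=(150, 200, 250), relaxed_min=71):
    """Return {(threshold, criterion): set of genes} for each definition.

    `levels` maps each gene to its expression levels across all tissues.
    Criterion "all" requires expression above the threshold in every tissue;
    criterion "relaxed" requires it in at least `relaxed_min` tissues.
    """
    sets = {}
    for t in thresholds:
        counts = {g: sum(1 for v in vals if v > t) for g, vals in levels.items()}
        sets[(t, "all")] = {g for g, vals in levels.items() if counts[g] == len(vals)}
        sets[(t, "relaxed")] = {g for g in levels if counts[g] >= relaxed_min}
    return sets


# Hypothetical toy profiles across 79 tissues.
levels = {
    "G_ALL": [300.0] * 79,                 # high everywhere
    "G_EDGE": [220.0] * 79,                # passes at 150 and 200, fails at 250
    "G_MOST": [300.0] * 72 + [100.0] * 7,  # expressed in 72/79 tissues
    "G_FEW": [300.0] * 5 + [50.0] * 74,    # tissue restricted
}

hk = housekeeping_sets(levels)
```

Comparing results across the entries of `hk` gives a simple check of whether a conclusion depends on the particular housekeeping definition.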
Neurotransmitter receptor complexes
Components of N‐methyl‐d‐aspartate receptor and metabotropic receptor complexes were identified by extensive proteomic studies as described (Pocklington et al, 2006). We used the 215 subunits of these complexes that could be mapped to human Ensembl gene identifiers, of which 77 have demonstrated roles in learning and memory through genetic studies in mice or are implicated in psychiatric disorders in humans (Pocklington et al, 2006). We used the sets of housekeeping proteins described above to identify how many of these subunits represent universally expressed proteins.
Proteins were classified as metazoan specific or pre‐metazoan using the analysis of Freilich et al (2005).
This work was funded by a European Research Council (ERC) Starting Researcher Grant, the Ministry of Science and Innovation (MICINN‐Plan Nacional), the CRG‐EMBL Systems Biology Program, ICREA and a Leonardo da Vinci fellowship to AB. We thank three anonymous referees for helpful suggestions.
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Materials 1
Supplementary Figures 1 ‐ 3 [msb200917-sup-0001.pdf]
CRG human interactome [msb200917-sup-0002.zip]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
Copyright © 2009 EMBO and Nature Publishing Group