In an effort to understand the dynamic organization of the protein interaction network and its role in the regulation of cell behavior, we studied how proteins are positioned within specific network localities with respect to their expression dynamics. First, we find that constitutively expressed and dynamically co‐regulated proteins cluster in distinct, functionally specialized network neighborhoods to form static and dynamic functional modules, respectively. Then, we show that whereas dynamic modules are mainly responsible for condition‐dependent regulation of cell behavior, static modules provide robustness to the cell against genetic perturbations and protein expression noise, and therefore may act as buffers of evolutionary as well as population variations in cell behavior. These observations refine the previously proposed model of dynamic modularity in the protein interaction network and suggest a link between the evolution of gene expression regulation and biological robustness.
In order to test the specific organizational layout of transcriptionally regulated (dynamic) versus nonregulated (static) proteins in the protein interaction network, we integrated high‐confidence protein interaction data from yeast with high‐throughput microarray gene expression data. The extent of transcriptional regulation of a gene (its expression variance, EV) was scored from the statistical variance of its expression profile across 272 microarray experiments from various conditions.
By constructing a global interaction preference matrix of proteins with various EVs, we find that the network is enriched for clusters of static and dynamic proteins (static and dynamic neighborhoods, respectively) (Figure 1B). These neighborhoods are specialized functional modules dedicated to specific cellular processes like mRNA synthesis, protein degradation or vesicle trafficking. Interestingly, some cellular functions seem to be mainly performed by static modules, whereas others are mainly carried out by dynamically expressed modules, pointing to functional distinction between the two types of modules.
Our study indicates that, in terms of expression, a protein lies within a module if it is either highly coexpressed with its network neighbors or located within a static neighborhood. An earlier study named hubs (highly connected proteins) that are highly coexpressed with their neighbors, and are therefore in modules, ‘party’ hubs, and those that are not coexpressed with their neighbors and lie outside modules ‘date’ hubs (Han et al, 2004). Hubs located in static neighborhoods fit neither category: they lie within modules, yet, owing to their static expression pattern, they are not highly coexpressed with their neighbors. We named these hubs ‘family’ hubs, because they interact with their neighbors constitutively. Family and party hubs thus constitute the static and dynamic modules of the cell, respectively, whereas date hubs organize these modules into a network.
Based on the classification of hubs by Han et al (2004), date hubs have been found to evolve faster than party hubs, suggesting that modularity constrains the evolvability of proteins and that the protein interaction network mainly evolves by ‘re‐wiring’ the connections between modules (Fraser, 2005). We also find that party hubs evolve much more slowly than other hubs (Figure 5A). However, it is family hubs, rather than date hubs, that evolve the fastest (Figure 5A), suggesting that modularity per se does not constrain the evolvability of proteins, as family hubs are also modular. Consistent with their evolutionary plasticity, deletions of family hubs in yeast are tolerated significantly better than deletions of party hubs, indicating that the cell is specifically robust to dysfunctions in static modules. Moreover, family hubs are significantly ‘noisier’ in their expression (Figure 5C): their expression levels vary considerably more from cell to cell than those of party or date hubs. These observations argue that family hubs are the most variable components of the cell, both genetically during evolution and in expression between cells in a population. Family hubs, and hence static modules, could therefore buffer both genetic variation and expression noise, thereby contributing to the robustness of the cell.
Highly regulated (dynamic) and non‐regulated (static) proteins have a specific interaction pattern in the eukaryotic protein interaction network: they cluster within distinct network neighborhoods to form highly specialized dynamic and static functional modules, respectively.
In addition to the “party” and “date” hubs described earlier (Han et al, 2004), we identify “family” hubs that function in static functional modules and are therefore constitutively present in the network and interact with their partners constitutively.
Proteins in static modules, but not those in dynamic modules, seem to be prone to evolutionary genetic modifications as well as to protein expression noise, suggesting that these modules function as buffers of variations in the network that confer robustness to the cell.
Revealing complex patterns and organizational principles of biological systems is an important goal of systems biology. Recent studies have considerably expanded our understanding of organizational principles underlying biological networks. Much of the past effort in this field has focused on the topological properties of protein interaction and gene regulation networks and many of their design principles have been uncovered, such as the scale‐free topology (Albert et al, 2000; Jeong et al, 2000), modularity (Ihmels et al, 2002; Ravasz et al, 2002; Spirin and Mirny, 2003; Han et al, 2004; Gavin et al, 2006), disassortativeness (Maslov and Sneppen, 2002) and enrichment for certain network motifs (Milo et al, 2002; Harbison et al, 2004; Luscombe et al, 2004). At the level of cell behavior, these properties are thought to promote robustness (Albert et al, 2000; Jeong et al, 2001; Maslov and Sneppen, 2002) and reliability of information processing (Klemm and Bornholdt, 2005). The studies described above mainly analyzed static protein interaction networks without accounting for dynamic properties that arise as a result of gene expression programs that modulate the expression of proteins in the network. Integration of protein interaction data with the gene expression data in recent years, however, has given some important insights into the dynamic organization of the eukaryotic protein interaction network (Ge et al, 2001; Han et al, 2004; Ihmels et al, 2004; Kharchenko et al, 2005; de Lichtenberg et al, 2005). For example, it has been found that co‐regulated proteins frequently interact with each other (Ge et al, 2001), and metabolic enzymes topologically close to each other in the metabolic network (Kharchenko et al, 2005) or those that participate in the same metabolic pathway (Ihmels et al, 2004) are also frequently co‐regulated. 
Another study has reported differential positioning of proteins in the protein interaction network based on their coexpression properties with their interacting neighbors in the network (Han et al, 2004). These studies have suggested that a strong correlation exists between topological positioning of proteins in the network and their expression properties.
Recent large‐scale experimental and computational studies have delineated the action of gene expression programs under various conditions (Gasch et al, 2000; Segal et al, 2003a; Harbison et al, 2004; Luscombe et al, 2004). Importantly, these studies defined both condition‐dependent and condition‐independent (constitutive) expression patterns (Luscombe et al, 2004; de Lichtenberg et al, 2005). Given that proteins are subject to variable modes of regulation, we considered the topological positioning of proteins with different levels of transcriptional regulation in protein interaction networks. Previous studies have examined the correlation of co‐regulation of proteins with their topological positioning in the network. Instead, we examined how the protein products of regulated versus nonregulated genes (dynamic and static proteins, respectively) are positioned in the protein interaction network relative to each other. We identified organizational principles in the protein interaction network that appear to dictate the specific relative topological positioning of dynamic and static proteins. This information expands our understanding of protein network dynamics, gives systems‐level insights into how gene expression programs may modulate the protein network architecture and cell behavior, and suggests a link between the evolution of cellular robustness and the evolution of gene expression regulation.
Expression variance of genes across multiple conditions
To identify genes with condition‐dependent or constitutive expression, we used publicly available genomic expression profiles derived from cells exposed to multiple conditions. A compendium of 272 microarray experiments from six different data sets from the Saccharomyces Genome Database (SGD, ftp://ftp.yeastgenome.org/yeast/) was compiled and used to calculate the statistical variance of the expression profile of each gene across these 272 experiments. The variance of a gene's expression profile was assumed to reflect the frequency and magnitude of modulation of its gene product under diverse conditions, such that a low variance would indicate that the protein is relatively static and a high variance would indicate that the protein is relatively dynamic.
For each gene in the genome, an expression variance (EV) was assigned, as defined by the quantile value of the variance of its expression profile in the genomic distribution of variances, such that the EV closest to 0 indicates the gene with the lowest variance in the genome (least dynamic), and the EV value of 1 indicates the gene with the most dynamic expression pattern in the genome. The expression dynamics of the bottom 10 and top 10 genes in the genome‐wide EV distribution are shown in Supplementary Figure 1.
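The EV score described above can be sketched in a few lines of Python. This is a minimal illustration, assuming complete expression profiles (no missing values) and population variance; the function name `expression_variance_scores` and the input layout are hypothetical, and the study's own handling of missing data is not described here.

```python
from bisect import bisect_right
from statistics import pvariance


def expression_variance_scores(profiles):
    """Return gene -> EV, the quantile of the gene's expression variance.

    profiles: hypothetical dict mapping a gene name to its list of
    expression values (e.g. log ratios) across all experiments, assumed
    to be complete (no missing values).
    """
    variances = {gene: pvariance(values) for gene, values in profiles.items()}
    ordered = sorted(variances.values())
    n = len(ordered)
    # Fraction of genes whose variance is <= this gene's variance: the most
    # variable gene gets EV = 1, the least variable gets EV = 1/n (close to 0).
    return {gene: bisect_right(ordered, var) / n for gene, var in variances.items()}
```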
We considered that this metric might score random noise in the expression of low‐abundance genes as high EV, and/or the small overall variation in the levels of highly abundant mRNAs as low EV. However, EV values correlate significantly and positively with mRNA abundance (Spearman's ρ=0.21, P<1 × 10⁻¹⁶): low‐EV genes tend to be expressed at lower levels than high‐EV genes. Both artifacts would instead produce a negative correlation, so the high‐EV gene set is not enriched for low‐abundance genes, and EV reflects the extent of gene regulation rather than measurement artifacts owing to low mRNA abundance.
A high‐confidence yeast protein interaction network of 2315 proteins connected by 5356 interactions was derived using the confidence scores assigned to each interaction by Bader et al (2004). Importantly, all findings presented below were reproducible using a different scoring scheme from the same study, which yields a slightly different network, or using an independent high‐quality network from a recent large‐scale study (Krogan et al, 2006) (data not shown).
Interaction preferences of static and dynamic proteins
To understand how proteins with different expression dynamics assemble within the network, we analyzed proteins in the context of their network neighborhoods, defined here as the set of binding partners of a protein. Comparing the EV of each protein with the EVs of its neighbors reveals the relative distribution of static and dynamic proteins within the network. For this purpose, we defined the neighborhood EV of a protein as the average EV of its interacting partners (neighbors) in the network. EVs of proteins correlate positively and highly significantly with their neighborhood EVs (Spearman's ρ=0.32, n=2245; P=1 × 10⁻⁵⁴) (Figure 1A), suggesting that proteins have expression dynamics similar to those of their immediate neighbors in the protein interaction network. To examine this correlation at a higher resolution, proteins were grouped into 50 bins of similar EV, and a 50 × 50 interaction preference matrix was constructed in which each entry is the total number of protein interactions between the corresponding pair of bins. Consistent with the correlation above (Figure 1A), interactions are dense among the low‐EV bins and among the high‐EV bins (Figure 1B), whereas interactions between low‐EV and high‐EV bins are extremely sparse. This interaction pattern was not observed in a randomized network (Figure 1B). These results suggest that the protein interaction network is enriched for sub‐networks composed primarily of either static or dynamic proteins, but not both.
The interaction preference matrix shows the number of interactions between bins, which may bias it toward the interaction profiles of highly connected proteins. To check for this bias, we examined the interaction profiles of proteins with different numbers of interactions (node degrees) and different EVs. The pattern in Figure 1B is most apparent for proteins with high node degrees (hubs) (Figure 1C), although the statistical correlation between EV and neighborhood EV is also highly significant for less‐connected proteins (data not shown). These observations suggest that proteins with more interaction partners in the network are segregated into distinct network neighborhoods characterized by either low or high EVs.
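As a rough sketch of the neighborhood EV and the binned interaction preference matrix, assuming an undirected edge list without duplicates and an `ev` dictionary of EV quantiles. Because EV is a quantile, equal‐width EV bins give groups of roughly equal size; whether the study binned this way is an assumption, and all function names are hypothetical.

```python
from collections import defaultdict


def neighborhood_ev(edges, ev):
    """Average EV of each protein's interaction partners (its neighborhood EV)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    result = {}
    for protein, partners in neighbors.items():
        scored = [ev[q] for q in partners if q in ev]
        if protein in ev and scored:
            result[protein] = sum(scored) / len(scored)
    return result


def preference_matrix(edges, ev, n_bins=50):
    """Symmetric n_bins x n_bins matrix of interaction counts between EV bins."""
    def bin_of(protein):
        # EV lies in (0, 1]; clamp so that EV == 1 falls into the top bin
        return min(int(ev[protein] * n_bins), n_bins - 1)

    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in edges:
        if a in ev and b in ev:
            i, j = bin_of(a), bin_of(b)
            matrix[i][j] += 1
            if i != j:  # mirror off-diagonal counts; diagonal counted once
                matrix[j][i] += 1
    return matrix
```

For the randomized comparison, a degree‐preserving rewiring of the edge list (for example with NetworkX's `double_edge_swap`) is one common choice; the randomization actually used in the study is not specified in this section.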
Functional specialization of static and dynamic neighborhoods
To understand what these distinct protein neighborhoods represent, we visualized the sub‐network formed by hubs, defined as proteins with more than six interactions. A plot of interactions between hubs reveals that the static and dynamic neighborhoods form distinct, densely connected large clusters of static and dynamic proteins, respectively (Figure 2A). A densely connected cluster of proteins in the network is likely to represent a functional module (Pereira‐Leal et al, 2004), that is, a set of interacting proteins dedicated to a specific cellular process, such as the mRNA splicing machinery or the proteasome. Dynamically expressed proteins within a functional module are expected to be coexpressed with each other (Ge et al, 2001; Segal et al, 2003b). To check this, we defined the neighborhood Pearson correlation coefficient (nPCC) as a measure of how well proteins in a neighborhood are coexpressed with each other (see Materials and methods). In agreement with previous studies (Ge et al, 2001; Segal et al, 2003b), the neighborhoods of dynamic proteins in clusters are highly coexpressed (Figure 2B). Consistent with their nonvariant expression pattern, static proteins within clusters have lower nPCC (Figure 2B). These observations suggest the existence of two distinct types of large modules in the cell: those composed of static proteins (static modules), which are presumably always present in the cell because the expression of their members does not seem to be regulated, and those composed of co‐regulated dynamic proteins (dynamic modules), which are expressed in a condition‐dependent manner.
The current notion of functional modules predicts that a set of interacting proteins that are highly coexpressed is likely to be specialized for a specific process (Ihmels et al, 2002; Segal et al, 2003b; Kharchenko et al, 2005), and some studies suggest that the protein interaction network is enriched for interactions between co‐regulated proteins (Ge et al, 2001; Ihmels et al, 2004). We hypothesized that sets of interacting static proteins, which are presumably constitutively present in the cell but show no strong statistical correlation in their expression, may also represent specialized functional modules, and that they are at least as abundant in the cell as sets of interacting, highly coexpressed dynamic proteins. To test this hypothesis, we derived a simple function that compares a protein's Gene Ontology (GO) (Ashburner et al, 2000) annotations with those of its neighbors, to quantitate the functional specialization of the protein's neighborhood. This function, ‘neighborhood function homology’ (see Materials and methods), ranges from 0 (no GO terms shared between a protein and its neighbors) to 1 (all GO terms assigned to the protein are shared with its neighbors).
Neighborhood function homology of static proteins (EV<0.25, i.e. the lower quartile of the genomic distribution) correlates negatively and highly significantly with their neighborhood EV (Spearman's ρ=−0.41, P=1.8 × 10⁻¹⁸), suggesting that static proteins interacting with other static proteins are found in functionally specialized neighborhoods. In contrast, neighborhood function homology of dynamic proteins (EV>0.75, the upper quartile of the EV distribution) correlates positively with their neighborhood EV (Spearman's ρ=0.40, P=1.5 × 10⁻¹²). Thus, dynamic proteins, unlike static proteins, are more functionally homologous to their neighbors when they are in dynamic neighborhoods. Neighborhood function homology of dynamic proteins correlates even more significantly with their average interactor PCC (avPCC) (Spearman's ρ=0.57, P=8 × 10⁻²⁷), a measure of how well a protein is coexpressed with its neighbors (Han et al, 2004; see Materials and methods). Together, these observations suggest that network neighborhoods composed of constitutively expressed proteins (static neighborhoods) are highly specialized modules, much like the neighborhoods of highly coexpressed proteins (dynamic neighborhoods).
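One simple way to obtain a score with the stated range is to take the fraction of a protein's GO terms that also appear among its neighbors' annotations. This is an illustrative assumption rather than the study's formula (see its Materials and methods), and `neighborhood_function_homology` is a hypothetical name.

```python
def neighborhood_function_homology(protein, neighbors, go_terms):
    """Fraction of a protein's GO terms assigned to at least one neighbor.

    go_terms: dict mapping protein -> set of GO term identifiers.
    Returns 0.0 when no term is shared, 1.0 when every term is shared,
    and None when the protein itself has no annotation.
    """
    own = go_terms.get(protein, set())
    if not own:
        return None
    neighbor_terms = set()
    for partner in neighbors:
        neighbor_terms |= go_terms.get(partner, set())
    return len(own & neighbor_terms) / len(own)
```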
Identification of static and dynamic modules and their functions
Past studies have used statistical correlation of gene expression to assign proteins to specific modules and to assign new functions to previously uncharacterized proteins (Ihmels et al, 2002; Segal et al, 2003b, 2003c). As static neighborhoods also seem to be functionally coherent, it should be possible to assign proteins to specific modules by virtue of their associations with static neighborhoods. To this end, we identified all static neighborhoods in our network by compiling all interactions between static proteins (the static network: 491 proteins connected by 897 interactions). The static network consists of 82 disconnected sub‐networks ranging in size from 2 to 86 proteins (Supplementary Table 1). These static sub‐networks appear functionally coherent, representing various functions including mRNA transcription and splicing, vesicle transport and cell‐cycle regulation (see Supplementary Table 1), which suggests that the static network is enriched for functional modules. To test the significance of this modular composition, and to see whether a similar level of functional coherence can be achieved in a network generated by random draws of interactions, we defined a network modularity metric that measures the functional specialization of the interactions in a network (see Materials and methods). The static network shows significantly higher network modularity than expected from random draws of interactions from the full network (Figure 3), suggesting that the association of functionally coherent sets of proteins within static neighborhoods reflects a biological phenomenon.
We compared the modularity of the static network with that of the network formed by highly coexpressed proteins, which, in agreement with previous studies showing modularity of coexpressed proteins (Segal et al, 2003b; Han et al, 2004), is expected to be enriched for functional modules. We identified the dynamic network by taking all interacting pairs of proteins that also have pairwise PCCs of at least 0.65 (383 proteins connected by 777 interactions). This dynamic network therefore contains interactions between proteins that are also highly co‐regulated at the transcriptional level. The dynamic network consists of 77 sub‐networks mainly composed of dynamic proteins (data not shown) and, as expected from previous publications, the dynamic sub‐networks are highly functionally coherent (see Supplementary Table 2). The dynamic network also shows significantly high network modularity, comparable to that of the static network (Figure 3), indicating that both networks are enriched for functional modules. Only 15 proteins and six interactions are common to both networks, indicating that the modules in the two networks are distinct and that the high modularity of the static network is not a consequence of overlap with the dynamic network. These observations argue that static protein neighborhoods represent functional modules, and that it should be possible to assign proteins to functional modules by virtue of their association with static proteins.
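The static network construction, and a random‐draw test in the spirit of the comparison above, might look like this. The EV<0.25 static cutoff is borrowed from the quartile definition used earlier, and the modularity proxy (`shared_term_fraction`, the share of interactions whose partners share a GO term) is a hypothetical stand‐in, since the study's network modularity metric is defined in Materials and methods and is not reproduced here. All function names are hypothetical.

```python
import random
from collections import defaultdict


def static_subnetworks(edges, ev, cutoff=0.25):
    """Connected components of the network of static-static interactions.

    Proteins without an EV are treated as non-static (assumption).
    """
    adj = defaultdict(set)
    for a, b in edges:
        if ev.get(a, 1.0) < cutoff and ev.get(b, 1.0) < cutoff:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def shared_term_fraction(edges, go_terms):
    """Hypothetical modularity proxy: share of edges whose ends share a GO term."""
    hits = sum(1 for a, b in edges if go_terms.get(a, set()) & go_terms.get(b, set()))
    return hits / len(edges)


def random_draw_pvalue(sub_edges, all_edges, go_terms, n_draws=1000, seed=0):
    """Empirical P value of the proxy versus random draws of equally many edges."""
    rng = random.Random(seed)
    observed = shared_term_fraction(sub_edges, go_terms)
    exceed = sum(
        shared_term_fraction(rng.sample(all_edges, len(sub_edges)), go_terms) >= observed
        for _ in range(n_draws)
    )
    return observed, (exceed + 1) / (n_draws + 1)
```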
In order to see if either network is specifically enriched for certain cellular functions, we performed enrichment analyses of the two networks for overrepresentations of MIPS functional categories (Mewes et al, 2004). Interestingly, the most significant relative enrichment is seen in the functional categories related to mRNA transcription and processing (static network) and rRNA transcription and processing as well as translation (dynamic network) (see Supplementary Table 3). The static network is enriched for general mRNA transcription (RNA polymerase II holoenzyme complexes), splicing (the pre‐mRNA splicing complex) and processing (CPF and CCR4‐NOT) as well as co‐regulator complexes like the chromatin remodeling complexes (SWI/SNF and INO80), histone acetyl‐transferase complexes (SAGA and NuA4), histone methylase (COMPASS) as well as mRNA nuclear export (TREX) (see Supplementary Table 1). The dynamic network, in addition to RNA polymerase I and III components, contains modules like the SSU processome, involved in rRNA processing, and translation initiation factor complexes (see Supplementary Table 2), which is consistent with studies reporting extensive regulation of these modules under various stress conditions (Warner, 1999; Gasch et al, 2000). In addition, the dynamic network contains most of the proteasomal proteins, whereas the static network also contains many of the mitochondrial ribosomal proteins.
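This section does not state which statistical test was used for the MIPS category enrichment. A hypergeometric upper‐tail test is a common choice; the sketch below assumes it and uses only the standard library (the function name is hypothetical).

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: k annotated proteins among n drawn.

    k: proteins in the network annotated to the category
    n: proteins in the network
    K: proteins in the background annotated to the category
    N: proteins in the background
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms drop out automatically.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total
```

With SciPy available, the same tail probability is `scipy.stats.hypergeom.sf(k - 1, N, K, n)` in SciPy's (total, successes, draws) parameterization.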
Many modules in the two networks also seem to perform similar functions. For example, components of the mitotic cohesin complex, which holds sister chromatids together, and the septin ring complex, which is required for cytokinesis, are in the dynamic network (sub‐networks 63 and 55; Supplementary Table 2), whereas the DASH complex, which plays a role in chromosome segregation, and the COMA complex, which is involved in kinetochore assembly, are in the static network (sub‐networks 51 and 58; Supplementary Table 1). Components of the anaphase‐promoting complex (APC) are also static (sub‐network 4), as reported previously (de Lichtenberg et al, 2005). These complexes are all involved in the final stages of cell division, yet their regulation is markedly different. Another potentially interesting contrast relates to vesicle trafficking: proteins associated with clathrin‐coated vesicles (AP‐1 and AP‐3 complex proteins) seem to be static (sub‐networks 9 and 69), whereas those associated with coat protein complex (COP)‐coated vesicles, which mediate transport between the Golgi and the ER (COPI and COPII complex proteins), are dynamic (sub‐networks 10, 45 and 76). The dynamic expression of the latter may stem from the involvement of the early secretory pathway in various stress responses, such as the unfolded protein response or osmotic stress (Lee and Linstedt, 1999; Higashio and Kohno, 2002; Sato et al, 2002), whereas clathrin‐coated vesicles may play a role in constitutive transport. These examples suggest that although some functions in the cell can be classified as static or dynamic (like mRNA and rRNA synthesis, respectively), many others are carried out through the interplay of distinct static and dynamic modules. A closer analysis of the expression dynamics of functional modules under various conditions may provide in‐depth insight into the regulation of cellular behavior by transcriptional programs.
Expression properties of centrally positioned proteins
A recent study classified hub proteins into two classes based on their coexpression with their neighbors. It reported that hubs whose expression does not statistically correlate with that of their neighbors are positioned centrally in the network, meaning that they function between modules as organizers of cellular processes rather than having a specialized function inside a module (Han et al, 2004). However, as shown above, hubs found within static neighborhoods also do not correlate with their neighbors in expression even though they are within modules, and therefore need not be central. The bona fide central hubs could therefore be proteins that have low avPCC (i.e. those that do not belong to dynamic modules) and relatively high neighborhood EV (i.e. those that do not belong to static modules). To test this hypothesis, we examined the ‘betweenness’ centralities of hub proteins with different avPCC and neighborhood EV values. Betweenness centrality is a graph‐theoretic measure of how frequently a node lies ‘on the path’ between other nodes in the network, and therefore scores how ‘important’ a node is for communication between other nodes (Wasserman and Faust, 1994) (see Materials and methods). As expected, proteins with low avPCC and relatively high neighborhood EV have high betweenness centralities and low neighborhood densities, strongly suggesting that these hubs are positioned centrally in the network (Figure 4A). Their low neighborhood function homologies, in turn, suggest that they take part in multi‐functional interactions and may act as integrators of multiple processes in the cell (Figure 4A). Unlike hubs in modules, central hubs and their neighbors have a broad distribution of EV values (Figure 4B), suggesting that true central hubs interact with proteins of diverse expression patterns.
In their study, Han et al (2004) defined hubs that are highly coexpressed with their neighbors as ‘party’ hubs, which are modular, and those that are not coexpressed with their neighbors as ‘date’ hubs, which they reported to be central. However, the set of date hubs also contains hubs found within static modules (where there is likewise no statistical correlation of expression among neighbors). Based on our findings, we therefore propose that static hubs interacting with static proteins within static modules be excluded from date hubs and, by analogy with the party–date hub terminology, be named ‘family’ hubs, as they are always present in the network and interact with their neighbors constitutively. Family and party hubs thus form static and dynamic modules, respectively, whereas date hubs (family hubs excluded) organize the network. In concordance with their central functions, our date hubs are enriched for signal‐transducing and signal‐regulating proteins such as protein kinases, phosphatases, small G‐proteins and molecular chaperones (Table I). Date hubs contain 20 of the 22 hub protein kinases (the two excluded being YAK1, a kinase in the glucose‐sensing pathway, and SSN3, an RNA polymerase II C‐terminal domain (CTD) kinase associated with the holoenzyme complex), five of the six hub phosphatases and all hub small GTPases in the network, indicating that these proteins constitute the true central coordinators of the cellular network.
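Betweenness centrality for an unweighted, undirected network can be computed with Brandes' algorithm, sketched below in plain Python. ‘Neighborhood density’ is assumed here to mean the local clustering coefficient; the study's exact definitions are in Materials and methods, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected graph (raw, unnormalized).

    adj: dict mapping each node to the set of its neighbors.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair of endpoints was counted once from each side
    return {v: c / 2 for v, c in bc.items()}


def neighborhood_density(adj, v):
    """Fraction of possible links among v's neighbors that are present."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))
```

NetworkX's `betweenness_centrality` offers a tested implementation; note that it normalizes scores by default, whereas this sketch returns raw pair counts.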
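The revised hub classification can be summarized as a small decision rule. The degree cutoff follows the hub definition used above (more than six interactions), but the avPCC and EV thresholds, and the order in which the rules are checked, are illustrative placeholders rather than the study's values; `classify_hub` is a hypothetical name.

```python
def classify_hub(degree, ev, neigh_ev, av_pcc,
                 min_degree=7, party_pcc=0.5, static_cutoff=0.25):
    """Label a protein 'party', 'family' or 'date', or None if it is not a hub.

    Hypothetical thresholds: party hubs are well coexpressed with their
    partners; family hubs are static proteins in static neighborhoods;
    all remaining hubs are date hubs.
    """
    if degree < min_degree:  # hubs have more than six interactions
        return None
    if av_pcc >= party_pcc:
        return "party"
    if ev < static_cutoff and neigh_ev < static_cutoff:
        return "family"
    return "date"
```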
Protein‐expression noise and evolutionary rate in the static and dynamic networks
Based on the classification of hubs by Han et al (2004), it was suggested that centrally positioned hubs evolve faster than hubs in modules and that modularity constrains the evolvability of proteins, implying an evolutionary scenario in which protein networks evolve mainly by modifying their central coordinators (Fraser, 2005). We examined this hypothesis in the context of our modified hub classification and also found that party hubs evolve at a significantly slower rate than other hubs (Figure 5A). Surprisingly, however, family hubs do not evolve more slowly than date hubs (Figure 5A). By extrapolation, hubs in dynamic modules appear to be evolutionarily constrained, whereas those in static modules are not. Accordingly, proteins in the dynamic network have significantly lower evolutionary rates than proteins in the static network (P<1 × 10⁻¹⁶, Wilcoxon's test), and EV values of proteins correlate significantly and negatively with their rates of evolution (Spearman's ρ=−0.21, P=4.5 × 10⁻¹⁵). These results suggest that proteins in static modules have more freedom of variation than proteins within dynamic modules.
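Group comparisons in this section rely on Wilcoxon's rank‐sum test. The sketch below computes the Mann–Whitney U statistic directly and a two‐sided P value from the normal approximation, without tie or continuity correction, so it only approximates what a statistics package would report (for example `scipy.stats.mannwhitneyu`); `rank_sum_test` is a hypothetical name.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    U counts the pairs in which a value from x exceeds one from y
    (ties count 1/2). No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return u, erfc(abs(z) / sqrt(2))
```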
Less evolutionary constraint of static modules may indicate that proteins in these modules are largely dispensable for the module function due to compensation in the network. A prediction of this hypothesis is that proteins in static modules are less likely to be essential for cell survival, perhaps owing to their functional redundancy. Indeed, proteins in dynamic modules are almost twice as likely to be essential as proteins in static modules (Figure 5B), which is also true for party hubs when compared to family hubs (data not shown), indicating that the cell is highly tolerant of the loss of proteins in static modules, a property that may allow them to evolve at a faster rate than proteins in dynamic modules.
The significant correlation of protein EVs with their evolutionary rates suggests that the static network may buffer evolutionary variation in the protein interaction network, granting static proteins a role as evolutionary modifiers of cell behavior. We reasoned that if the cell is more tolerant of genetic variation in the components of static modules, it may also be more tolerant of variation in the expression of these proteins within a cell population. Variation in protein expression between cells within a population, or protein expression noise, is a major contributor to cell‐to‐cell variation in behavior within a population (Blake et al, 2003; Raser and O'Shea, 2005). We therefore compared the expression noise of proteins in static modules with that of proteins in dynamic modules. Using the coefficients of variation (CV values) of protein expression levels within a clonal cell population reported by a recent study (Newman et al, 2006), we found that proteins in dynamic modules are significantly less noisy in their expression than proteins in static modules (P=3 × 10⁻¹⁵, Wilcoxon's test), indicating that static proteins show the most cell‐to‐cell variation in expression within a population. Accordingly, family hubs have significantly higher CV values than other hubs (Figure 5C). It is surprising that the proteins with the least variable mRNA expression patterns are the most variable between cells and during evolution. These observations argue that static components of the eukaryotic protein interaction network are a source of robustness in cell regulatory networks that allows for evolutionary as well as population‐level variation in cell behavior (see Discussion).
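The noise measure above is the coefficient of variation: the standard deviation of a protein's abundance across cells divided by its mean. The study used CV values already reported by Newman et al (2006); the sketch below only illustrates the definition and a per‐group summary, assuming raw single‐cell measurements and hypothetical function names.

```python
from statistics import mean, median, stdev


def cv(levels):
    """Coefficient of variation of one protein's abundance across single cells."""
    return stdev(levels) / mean(levels)


def median_cv(per_cell_levels, proteins):
    """Median CV over a protein set (e.g. members of static or dynamic modules).

    per_cell_levels: dict mapping protein -> list of per-cell abundances.
    Proteins without measurements are skipped.
    """
    return median(cv(per_cell_levels[p]) for p in proteins if p in per_cell_levels)
```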
Expression levels of static and dynamic modules
EV of genes positively correlates with their mRNA abundance (Spearman's ρ=0.21, P<1 × 10⁻¹⁶). Accordingly, proteins in static modules are expressed at significantly lower levels than those in dynamic modules (Wilcoxon's test, P<1 × 10⁻¹⁶). This raises the possibility that the correlation of EV with the organizational layout of the protein interaction network reflects the expression levels of proteins rather than their EV. Expression levels do seem to contribute to the network layout: there is a substantial positive correlation between the mRNA abundance of hub proteins and that of their network neighbors (i.e. average neighborhood mRNA abundance, Spearman's ρ=0.43). This correlation is, however, significantly weaker than that between EV and neighborhood EV (Spearman's ρ=0.61). The former correlation is not surprising, given that the expression levels of proteins participating in the same protein complex are generally similar (Papp et al, 2003). Two facts suggest that our observations with EV above are not an artifact of the underlying mRNA abundance values. First, the correlation between EV and mRNA abundance is relatively low. Second, mRNA abundance correlates less strongly than EV between neighboring proteins. To rule out this possibility more directly, we performed partial correlation analyses (see Materials and methods) between EV, neighborhood EV, neighborhood function homology and mRNA abundance values of proteins. The partial correlation between EV and neighborhood EV while controlling for mRNA abundance ($r_{\mathrm{EV},\,\mathrm{neigh.EV}\cdot\mathrm{mRNA}}=0.66$) is almost as high as their simple correlation ($r_{\mathrm{EV},\,\mathrm{neigh.EV}}=0.67$).
Similarly, the partial correlations between the neighborhood function homology of static proteins and their neighborhood EV are almost as high as the corresponding simple correlations, whether controlling for mRNA abundance or for average neighborhood mRNA abundance (data not shown). These observations argue that the observed effects of EV on the organizational layout of the protein interaction network are not an artifact of protein expression levels, and that proteins segregate into different modules according to their EVs.
A proper stoichiometry in the expression levels of the components of a module is essential, as an imbalance in the levels of module constituents can be deleterious (balance hypothesis) (Papp et al, 2003). A priori, there are two simple ways to control the stoichiometry of module components at the level of transcription: constant expression and co‐regulated expression of all components. Both mechanisms are apparently employed in the design of cellular modules. This leads to an organizational model of the network resembling a circuit board with integrated ‘built‐in’ as well as removable ‘plug‐and‐play’ components. For example, the functionally ubiquitous process of mRNA synthesis and splicing is carried out by proteins organized in modules with apparently invariant expression. The highly dynamic nature of ribosome biogenesis modules, on the other hand, has been suggested to be a mechanism of energy conservation for the cell under stress, as transcription of ribosomal genes accounts for around 80% of all RNA synthesis in the cell (Warner, 1999).
Although the expression variation of modular proteins is constrained by that of their neighbors, central proteins, which are versatile in their functions, are also more versatile in their expression patterns. Both static and dynamic central hubs exist, and these hubs are presumably the coordinators of cellular processes. This suggests that some connections between processes in the cell are ‘hard‐wired’, whereas others are adjustable depending on cellular requirements. For example, sub‐network 2 in our static network (Supplementary Table 1) indicates that the TFIID/SAGA complex is hard‐wired to the nuclear proteasomal complex (Supplementary Figure 2). This suggests an integral function of the proteasome in sequence‐specific transcription, consistent with previous reports (Lee et al, 2005; Auld et al, 2006). This sub‐network also indicates an integral connection between vesicle trafficking and general mRNA synthesis, a relationship that, to our knowledge, has not yet been explored. Therefore, in addition to revealing novel architectural characteristics of the protein network, the analysis employed in this study helps reveal how local dynamics of the network architecture may shape cell behavior.
The faster evolutionary rate and higher expression noise of static modules suggest that robustness to variation in these modules may be a trait selected during evolution. As expression noise may contribute to the population fitness of unicellular organisms (Kaern et al, 2005; Raser and O'Shea, 2005), localization of noise to static modules may reflect a specific fitness advantage for the population. An interesting observation consistent with this hypothesis concerns proteins functioning in the regulation of mRNA synthesis, which are mostly static in yeast. These proteins have been found to be phenotypic enhancers of genetic mutations in worm as well as of oncogenic mutations in human cancers (Lehner et al, 2006). This suggests that fluctuations in the levels of these modules may substantially enhance cell‐to‐cell variation within a population and consequently increase the robustness of the population to environmental fluctuations. Similarly, genetic variation in static modules during evolution may result in the phenotypic enhancement of other mutations in the cell, which may facilitate adaptation. As mRNA abundance is a major factor contributing to protein expression noise (Newman et al, 2006) and evolutionary rate (Pal et al, 2001), it is conceivable that the relatively low expression levels of static modules are an evolutionarily selected trait that maximizes variation in these modules.
A future comparison of the expression dynamics of the protein network of yeast with that of higher eukaryotes should give an insight into the evolution of expression dynamics in concordance with the evolution of protein network connectivity and robustness.
Materials and methods
Microarray data sets
The microarray gene expression data sets from various conditions (cell cycle, sporulation, stress response, unfolded protein response and diauxic shift) were obtained from the Saccharomyces Genome Database (ftp://ftp.yeastgenome.org/yeast/). The data were processed to reflect true fold differences in gene expression relative to the control (i.e. the 0′ time point). To this end, the 0′ time points were removed from the data sets, and the corresponding later time points were zero‐transformed by subtracting the expression values at these time points from those at the 0′ time point.
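The preprocessing step and the EV score can be illustrated with a minimal Python sketch. It rests on three assumptions: the values are log‐scale, the first entry of each time course is the 0′ control, and EV is the sample variance (n − 1 denominator) of a gene's concatenated zero‐transformed profile. The function names are hypothetical. The sketch takes each later value minus the 0′ value; the opposite sign convention would leave variances and Pearson correlations unchanged.

```python
from statistics import variance


def zero_transform(time_course):
    """Drop the 0' control and express later points relative to it.

    Assumes log-scale values with time_course[0] being the 0' time point.
    """
    baseline = time_course[0]
    return [value - baseline for value in time_course[1:]]


def expression_variance(time_courses):
    """EV: variance of a gene's zero-transformed values across all data sets."""
    profile = []
    for time_course in time_courses:
        profile.extend(zero_transform(time_course))
    return variance(profile)  # sample variance (n - 1 denominator), an assumption


# Two hypothetical time courses (e.g. a stress series and a cell-cycle series).
ev = expression_variance([[0.5, 1.5, -0.5], [2.0, 2.0, 3.0]])
```

With real data, each gene would contribute one list per data set, and EV would be computed over all experiments pooled.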
Protein interaction network
The protein interaction network was obtained using the confidence scores assigned to potential interactions in the study of Bader et al (2004). Following the original study (Bader et al, 2004), a high cutoff of 0.65 was used to obtain a high confidence network. The giant connected component of the network was used in the analyses.
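The network construction step can be sketched in Python as a filter on scored pairs followed by a breadth‐first search for the largest connected component. The input format `(protein_a, protein_b, score)` and the function names are hypothetical. Whether the 0.65 cutoff is inclusive is not stated, so `score >= cutoff` is an assumption.

```python
from collections import defaultdict, deque


def build_network(scored_pairs, cutoff=0.65):
    """Keep interactions whose confidence score passes the cutoff."""
    adj = defaultdict(set)
    for a, b, score in scored_pairs:
        if score >= cutoff and a != b:  # inclusive cutoff is an assumption
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def giant_component(adj):
    """Return the adjacency restricted to the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from this start node
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return {node: set(adj[node]) for node in best}


pairs = [("A", "B", 0.9), ("B", "C", 0.7), ("D", "E", 0.8), ("C", "F", 0.3)]
network = giant_component(build_network(pairs))
```

Here the C–F pair falls below the cutoff, and the A–B–C component is kept over the smaller D–E component.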
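nPCC of a protein is defined as the average of all pairwise PCCs between the proteins in its neighborhood, including itself:
$$\mathrm{nPCC}_a = \frac{2}{n(n-1)} \sum_{i<j} \mathrm{PCC}_{ij}, \qquad i, j \in \{a\} \cup N(a)$$
where $\mathrm{nPCC}_a$ is the nPCC of protein a, $N(a)$ is the set of neighbors of a, n is the node degree of protein a plus 1 (for itself) and $\mathrm{PCC}_{ij}$ is the PCC between proteins i and j.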
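A minimal Python sketch of nPCC follows. It assumes the average runs over the n(n − 1)/2 distinct pairs in the neighborhood (self‐correlations excluded). It also assumes `adj` maps each protein to a set of neighbors and `profiles` maps each protein to its expression profile; these names are illustrative. Every protein in the giant connected component has at least one neighbor, so at least one pair always exists.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def npcc(protein, adj, profiles):
    """Average PCC over all distinct pairs in the protein's neighborhood plus itself."""
    members = [protein] + sorted(adj[protein])
    pairs = list(combinations(members, 2))
    return sum(pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)


adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
profiles = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
```

For protein a, the pairs (a, b), (a, c) and (b, c) have PCCs of 1, −1 and −1, so nPCC is −1/3.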
Neighborhood function homology
Let $G_i$ be the set of GO terms assigned to protein i, which has a node degree of k. The neighborhood function homology $F_i$ of protein i is defined as
$$F_i = \frac{1}{k} \sum_{j=1}^{k} \frac{|G_i \cap G_j|}{|G_i|}$$
where $G_j$ is the set of GO terms assigned to the jth neighbor of protein i. $F_i$ ranges from 0, when protein i shares no GO terms with its neighbors, to 1, when every GO term assigned to protein i is also present in all of its neighbors.
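A short Python sketch of neighborhood function homology follows. It assumes the per‐neighbor form $F_i = \frac{1}{k}\sum_j |G_i \cap G_j| / |G_i|$, which reproduces the stated range of 0 to 1. It also assumes protein i has at least one GO term. `adj` (protein to neighbor set) and `go` (protein to GO term set) are illustrative names.

```python
def function_homology(protein, adj, go):
    """Average fraction of the protein's GO terms shared with each neighbor."""
    terms = go[protein]  # assumed non-empty
    neighbors = adj[protein]
    k = len(neighbors)
    return sum(len(terms & go[j]) / len(terms) for j in neighbors) / k


go = {"A": {1, 2}, "B": {1, 2, 3}, "C": {3}}
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
```

Protein A shares all of its terms with B and none with C, giving F = (1 + 0)/2 = 0.5.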
Average interactor PCC
Following Han et al (2004), we defined avPCC as the average of the pairwise PCC values between a protein and its neighbors. Unlike nPCC, which measures how strongly the proteins within a neighborhood are coexpressed with one another, avPCC measures how strongly a protein is coexpressed with its neighbors.
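The contrast with nPCC can be made concrete with a small Python sketch of avPCC. Only the correlations between the protein and each neighbor enter the average; neighbor–neighbor pairs are ignored. The names `adj` and `profiles` are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def av_pcc(protein, adj, profiles):
    """Average PCC between a protein and each of its neighbors."""
    neighbors = adj[protein]
    return sum(pearson(profiles[protein], profiles[j]) for j in neighbors) / len(neighbors)


adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
profiles = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
```

For protein a, avPCC is (1 + (−1))/2 = 0. In the same neighborhood, nPCC also counts the b–c pair and would be −1/3.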
Neighborhood density of a protein is its clustering coefficient. The clustering coefficient $CC_i$ of protein i is defined as
$$CC_i = \frac{2N}{k(k-1)}$$
where N is the number of interactions between the neighbors of protein i, and k is its node degree.
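A direct Python sketch of the clustering coefficient counts the links among a protein's neighbors and divides by the k(k − 1)/2 possible links. The coefficient is undefined for proteins with fewer than two neighbors. Returning 0.0 for them is a convention assumed here; the text does not say how such proteins were handled.

```python
from itertools import combinations


def clustering_coefficient(protein, adj):
    """CC = 2N / (k(k - 1)), with N the number of links among the neighbors."""
    neighbors = adj[protein]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for k < 2; 0.0 is an assumed convention
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# A triangle A-B-C with a pendant node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
```

Node A has three neighbors, of which only B and C are linked, so its coefficient is 2·1/(3·2) = 1/3.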
First, a function similarity matrix was constructed by measuring all pairwise function similarities between proteins in the network. The pairwise function similarity between proteins i and j was defined as
where $G_i$ and $G_j$ are the sets of GO terms assigned to proteins i and j, respectively. For a network of n proteins, the function similarity matrix S is an n × n matrix. The network modularity M is calculated by summing the pairwise function similarities over every interacting pair of proteins in the network and dividing by the total number of interactions in the same network:
$$M = \frac{\sum_{i<j} A_{ij} S_{ij}}{\sum_{i<j} A_{ij}}$$
where A is the adjacency matrix of the network and has the same dimensions as S. A is Boolean, with $A_{ij}$ equal to 1 if proteins i and j interact and 0 otherwise.
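Network modularity M can be sketched in Python by iterating over the edge list. This is equivalent to summing $A_{ij} S_{ij}$ over i < j and dividing by the number of edges. The exact pairwise similarity formula is not reproduced in this section. The sketch therefore plugs in the Jaccard index as a stand‐in, labeled as an assumption; any other similarity function can be passed in.

```python
def jaccard(terms_i, terms_j):
    """Stand-in pairwise function similarity (an assumption, not the study's formula)."""
    union = terms_i | terms_j
    return len(terms_i & terms_j) / len(union) if union else 0.0


def network_modularity(edges, go, similarity=jaccard):
    """Mean pairwise function similarity over all interacting protein pairs."""
    edges = list(edges)
    return sum(similarity(go[i], go[j]) for i, j in edges) / len(edges)


go = {"A": {1, 2}, "B": {1, 2}, "C": {3}}
edges = [("A", "B"), ("B", "C")]
```

With these toy annotations, the A–B edge scores 1 and the B–C edge scores 0, so M = 0.5.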
Betweenness centrality of a node i in the network, denoted $C_B(i)$, is given by
$$C_B(i) = \sum_{j<k;\; j,k \neq i} \frac{g_{jk}(i)}{g_{jk}}$$
where $g_{jk}(i)$ is the number of shortest paths between nodes j and k that pass through node i and $g_{jk}$ is the total number of shortest paths connecting nodes j and k (Wasserman and Faust, 1994).
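Betweenness can be computed efficiently with Brandes' algorithm, sketched below in Python for an unweighted, undirected network. The source does not say which algorithm was used, so this is an illustrative substitute that computes the same quantity. Because every unordered pair (j, k) is visited from both endpoints, the totals are halved at the end. NetworkX's `betweenness_centrality` with `normalized=False` should give matching values for undirected graphs.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of every node (Brandes' algorithm, unweighted)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair (j, k) was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
square = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
```

On the path A–B–C, node B lies on the only shortest path between A and C, so its betweenness is 1. On the square, each node lies on one of the two shortest paths between its two neighbors, giving 0.5.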
Protein‐expression noise values
For protein‐expression noise values, we used the values derived by a large‐scale single‐cell proteomic analysis of Newman et al (2006). They defined protein‐expression noise as CV (s.d. divided by mean expression) of protein expression between cells in a population.
Partial correlation analysis
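Linear correlation between two variables a and b is given by
$$r_{ab} = \frac{\mathrm{cov}(a,b)}{\sqrt{\mathrm{var}(a)\,\mathrm{var}(b)}}$$
where cov(a,b) is the covariance between a and b, and var(a) is the variance of a. Partial correlation between a and b while controlling for a variable c is given by
$$r_{ab\cdot c} = \frac{r_{ab} - r_{ac}\,r_{bc}}{\sqrt{(1 - r_{ac}^{2})(1 - r_{bc}^{2})}}$$
Equations were taken from de la Fuente et al (2004).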
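The first‐order partial correlation can be checked with a short Python sketch. The toy vectors are hypothetical and constructed so that a and b are correlated only through c. The simple correlation is 1/3, while the partial correlation controlling for c is 0.

```python
import math


def pearson(a, b):
    """Linear (Pearson) correlation r_ab = cov(a, b) / sqrt(var(a) var(b))."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b, controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# a and b each equal c plus a component orthogonal to c and to each other.
c = [1, 2, 3, 4]
a = [2, 1, 2, 5]
b = [2, -1, 6, 3]
```

In the analyses above, a, b and c would correspond to per‐protein vectors such as EV, neighborhood EV and mRNA abundance.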
We thank Dr Chin‐Rang Yang for helpful comments and discussions on the manuscript. This work was supported by the Robert E Welch Foundation I‐1414.
Supplementary Figure 1 [msb4100149-sup-0001.jpg]
Supplementary Figure 2 [msb4100149-sup-0002.jpg]
Legends to Supplementary Figures [msb4100149-sup-0003.doc]
Supplementary Table 1 [msb4100149-sup-0004.xls]
Supplementary Table 2 [msb4100149-sup-0005.xls]
Supplementary Table 3 [msb4100149-sup-0006.xls]
- Copyright © 2007 EMBO and Nature Publishing Group