Genome‐scale networks can now be reconstructed based on high‐throughput data sets. Mathematical analyses of these networks are used to compute their candidate functional or phenotypic states. Analysis of functional states of networks shows that the activity of biochemical reactions can be highly correlated in physiological states, forming so‐called co‐sets representing functional modules of the network. Thus, detrimental sequence defects in any one of the genes encoding members of a co‐set can result in similar phenotypic consequences. Here we show that causal single nucleotide polymorphisms in genes encoding mitochondrial components can be classified and correlated using co‐sets.
Various high‐throughput (HT) technologies simultaneously measure thousands of interdependent biological variables, and numerous methods are being used to reduce the complexity of HT data sets, to determine dependencies among variables, and to correlate them with biological functions (Lin et al, 2005). Dimensionality reduction is a central process that allows inference of functional principles from highly complex data sets. A conceptually simple way to reduce complexity is to identify patterns of correlation within the data. For example, correlation among mRNA levels in expression profiles can be used to identify sets of co‐regulated genes (Reymond et al, 2002). Similarly, the HapMap project recently illustrated the power of ‘perfect proxy sets’—defined as sets of perfectly correlated single nucleotide polymorphisms (SNPs)—as segments of the human chromosomes and suggested their utility in identifying differences in individual genomes (Altshuler et al, 2005). Systems biology aims to reconstruct networks of cellular interactions mathematically and to compute their functional (i.e. physiological) states (Price et al, 2004). Reducing the complexity of a network by identifying modules of functionally related elements represents an important step towards the understanding of its functional properties. In the context of biochemical networks, the study of their functional states has led to the definition of correlated sets of reactions (co‐sets): groups of reactions that always (Papin et al, 2004) or often (Burgard et al, 2004) function together in metabolic networks under the constraints of mass conservation, charge conservation, and thermodynamic considerations. Herein, co‐sets are defined as groups of enzymatic reactions that are perfectly correlated (correlation coefficient of 1). These co‐sets are often non‐obvious, as the reactions within a co‐set may not be adjacent on a network map. 
Mathematically, co‐sets represent functional modules of a network and identify genes whose products are collectively required to achieve physiological states. Accordingly, perturbations affecting genes belonging to the same co‐set are expected to lead to similar functional consequences.
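As a minimal sketch of this definition (not the procedure used in the cited studies), the following Python groups reactions whose fluxes are perfectly correlated across a set of candidate flux states. The function names, the tolerance `tol`, and the toy flux values are all hypothetical; in practice the flux states would come from a constraint‐based analysis of the reconstructed network.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length flux vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Constant flux: correlation is undefined; treated here as uncorrelated
        # (an assumption of this sketch).
        return 0.0
    return sxy / sqrt(sxx * syy)


def find_cosets(fluxes, tol=1e-9):
    """Group reactions whose correlation coefficient is 1 (within tol).

    fluxes: dict mapping a reaction id to its flux values across states.
    Returns a list of co-sets, each a sorted list of reaction ids.
    Anti-correlated pairs (coefficient of -1) are NOT grouped in this sketch.
    """
    names = list(fluxes)
    parent = {r: r for r in names}

    def find(r):
        # Union-find root lookup with path compression.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(fluxes[a], fluxes[b]) >= 1 - tol:
                parent[find(a)] = find(b)

    groups = {}
    for r in names:
        groups.setdefault(find(r), []).append(r)
    # Only groups with at least two reactions count as co-sets here.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical flux values for four reactions across four states.
toy_fluxes = {
    "R1": [1.0, 2.0, 3.0, 4.0],
    "R2": [2.0, 4.0, 6.0, 8.0],   # always twice R1
    "R3": [1.0, 0.0, 2.0, 1.0],   # varies independently
    "R4": [0.5, 1.0, 1.5, 2.0],   # always half of R1
}
toy_cosets = find_cosets(toy_fluxes)
```

In this toy example R1, R2, and R4 fall into one co‐set, whereas R3 remains ungrouped. Whether two reactions appear adjacent on a network map plays no role in the grouping, which is why co‐sets can be non‐obvious.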
Classifying SNPs and co‐sets
Here we use co‐sets to seek dependencies among SNPs with causal implications for metabolic function, by grouping SNPs in proteins that catalyze different reactions but belong to the same co‐set. Not all SNPs affect protein function; however, because the goal of this systems‐based analysis is to study the functional consequences of causal SNPs, all SNPs considered in this work are implicitly those with causal implications for enzymatic function. Although the SNPs may affect different proteins with different catalytic activities, if the reactions are in the same co‐set, the phenotypic consequences of such SNPs are expected to be similar. One can classify a group of genes that encode members of a co‐set into three fundamental types (Figure 1). Type A describes a multimeric enzyme, in which an SNP in any subunit of the multimer can result in the same phenotype. Type B represents a co‐set of reactions in a contiguous pathway, and Type C co‐sets are formed by non‐contiguous reactions.
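The three types can be illustrated with a short Python sketch. It rests on two assumptions that are not taken from the text: a Type A co‐set is represented as a single reaction catalyzed by a multimeric enzyme, and "contiguous" is interpreted as the reactions forming one connected group through shared metabolites. All reaction and metabolite identifiers are hypothetical.

```python
def classify_coset(reactions, metabolites):
    """Classify a co-set as Type "A", "B", or "C".

    reactions:   list of reaction ids in the co-set.
    metabolites: dict mapping each reaction id to the set of metabolite ids
                 it consumes or produces.
    Assumption: contiguity means the reactions are linked, directly or
    through other members of the co-set, by shared metabolites.
    """
    if len(reactions) == 1:
        # One reaction whose enzyme has several subunits (several genes).
        return "A"

    # Depth-first search over the "shares a metabolite" relation.
    seen = {reactions[0]}
    stack = [reactions[0]]
    while stack:
        current = stack.pop()
        for other in reactions:
            if other not in seen and metabolites[current] & metabolites[other]:
                seen.add(other)
                stack.append(other)

    # All reactions reachable: contiguous (B); otherwise non-contiguous (C).
    return "B" if len(seen) == len(reactions) else "C"


# Hypothetical reactions and the metabolites they touch.
toy_metabolites = {
    "rxn1": {"m1", "m2"},
    "rxn2": {"m2", "m3"},   # shares m2 with rxn1
    "rxn3": {"m7", "m8"},   # shares nothing with rxn1 or rxn2
}

type_single = classify_coset(["rxn1"], toy_metabolites)
type_chain = classify_coset(["rxn1", "rxn2"], toy_metabolites)
type_split = classify_coset(["rxn1", "rxn3"], toy_metabolites)
```

A practical version would likely need to exclude ubiquitous cofactors from the shared‐metabolite test, since they would otherwise link nearly every pair of reactions.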
Disease‐associated SNP co‐sets in the mitochondria
We mapped the human mitochondrial metabolic co‐sets (Thiele et al, 2005) to various diseases in the Online Mendelian Inheritance in Man (OMIM; Hamosh et al, 2005) database and then identified those cases in which SNPs have been described in the literature (Figure 2). The genes encoding succinate dehydrogenase (SDH) form a Type A co‐set. A series of SNPs in the different subunits of SDH has been found to have similar phenotypic consequences.
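The mapping step can be pictured as a lookup of each co‐set's genes in a table of gene‐to‐phenotype annotations. The Python below is only a sketch of that idea under stated assumptions: the gene names, phenotype labels, and data layout are placeholders, and it does not reproduce the OMIM data model or the curation performed for this study.

```python
def annotate_cosets(cosets, gene_to_phenotypes):
    """Attach phenotype annotations to the genes of each co-set.

    cosets:             dict mapping a co-set name to a list of gene names.
    gene_to_phenotypes: dict mapping a gene name to a set of phenotype labels
                        (genes without known annotations are simply absent).
    Returns one record per co-set listing the annotated genes and the
    phenotypes shared by all annotated genes (empty unless two or more
    genes in the co-set carry annotations).
    """
    report = []
    for name, genes in cosets.items():
        annotated = {
            g: set(gene_to_phenotypes[g]) for g in genes if g in gene_to_phenotypes
        }
        if len(annotated) > 1:
            shared = set.intersection(*annotated.values())
        else:
            shared = set()
        report.append(
            {
                "coset": name,
                "annotated_genes": sorted(annotated),
                "shared_phenotypes": sorted(shared),
            }
        )
    return report


# Placeholder co-sets and annotations (not real data).
toy_gene_cosets = {
    "coset_1": ["GENE_A", "GENE_B", "GENE_C"],
    "coset_2": ["GENE_D", "GENE_E"],
}
toy_annotations = {
    "GENE_A": {"phenotype_X"},
    "GENE_B": {"phenotype_X", "phenotype_Y"},
    "GENE_D": {"phenotype_Z"},
}
toy_report = annotate_cosets(toy_gene_cosets, toy_annotations)
```

Co‐sets in which only one gene carries an annotation, such as `coset_2` here, are kept in the report because they point to candidate genes whose disruption might produce a similar phenotype.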
The genes that encode the enzymes leading to heme biosynthesis constitute a Type B co‐set (Figure 2). Many SNPs in this set of genes result in various manifestations of porphyria. There is a range of severity and symptoms for a given enzyme and across the different enzymes in this gene set. These variations may be attributable to the specific location of particular SNPs, the presence or absence of other SNPs across the genome, differential tissue expression, the specific metabolic by‐products that accumulate or diminish based on the specific reaction, or mitochondrial heteroplasmy.
A Type C co‐set is found in the urea cycle (Figure 2). There is clinical coherence between SNPs in three of the four reactions in this set. Type C co‐sets are perhaps the most interesting of the three classifications because they are the least obvious; consequently, they may have the greatest effect on revising previous views of interactions and classifications of disease. Another particularly interesting case is the citrulline/ornithine co‐set. SNP‐related disease information exists for only one of the two reactions in the co‐set, the one carried out by the SLC25A15 transporter, whose deficiency results in the hyperornithinemia, hyperammonemia, and homocitrullinuria (HHH) syndrome. Although SNP‐related diseases have not been described for the other reaction in the co‐set, overexpression of SLC25A2 can rescue patients with HHH due to SLC25A15 deficiency (Camacho et al, 2003). This example has implications for therapeutic strategies that use enzymes capable of binding a range of substrates. If two reactions with overlapping substrate utilization are in the same co‐set, then overexpression of one enzyme can compensate for the deficiency or lack of expression of the other.
The complete set of SNP‐associated co‐sets identified in the mitochondria can be found in Supplementary Tables 1–9 online. The majority of identified co‐sets are Type B. Although there is a significant amount of overlap between these co‐sets, there is variability among them. Indeed, even for a particular disease type, there can be a remarkably broad range of resulting phenotypes. As noted above, this can be due to a range of factors, including differential expression and genomic differences in other regions of the genome. An appropriate way to resolve many of these issues and to increase the predictive power of these approaches is to incrementally increase the level of detail by accounting for further biological information such as intracellular regulation, intercellular interactions, and different tissue expression states.
Implications of SNP co‐set network analysis
There are two points worth highlighting in this new conceptual framework and the resulting analysis. First, the approach taken to network reconstruction is a ‘bottom‐up’ approach (Reed et al, 2006). In this approach, network reconstruction is based on documented physical interactions and biochemical knowledge, rather than on interactions inferred from HT data sets. Such a reconstruction is a biochemically, genomically, and genetically structured (BiGG) database that represents an integration of all of our knowledge about the network being analyzed (Reed et al, 2006). Consequently, the co‐set predictions are a direct result of a network‐wide analysis reflecting fundamental properties of the reconstructed biochemical network. The use of co‐sets to detect functionally related reactions is but one approach to analyzing reconstructed biological networks (Papin et al, 2004), and a number of others are emerging (Hatzimanikatis et al, 2004; Price et al, 2004; Sauer, 2004; Borodina and Nielsen, 2005). This type of analysis of bottom‐up networks can be used in conjunction with top‐down analysis of HT data sets to help elucidate functional biological relationships.
Second, the ability to map similarly causal SNPs to co‐sets represents a new dimension in SNP analysis that is enabled by systems biology. Like ‘perfect‐proxy sets’ in the HapMap (Altshuler et al, 2005), co‐sets take a step beyond trying to track individual components independently, towards appreciating how a number of cellular components interact to produce biological functionality, and thus how their malfunctions can correlate in producing pathophysiological states. Therefore, by adopting a systems approach, we can hopefully begin to define some of the ‘general underlying principles’ of biological functions, and in doing so, can impact the classification of diseases, the mechanistic understanding of the genotype–phenotype relationship, and the potential identification of therapeutic targets and strategies for disease treatment.
This work was supported in part by an NIH Training Grant.
Supplementary Online Material [msb4100077-sup-0001.pdf]
Copyright © 2006 EMBO and Nature Publishing Group