Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome‐scale metabolic network for this alga and devised a novel light‐modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light‐driven metabolism and quantitative systems biology.
Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The hallmark, model eukaryotic microalgae Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence (Merchant et al, 2007), and gene functional annotation, we have reconstructed and experimentally validated the genome‐scale metabolic network for this alga, iRC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. iRC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments (Figure 1A). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). The lipid metabolic pathways of iRC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl‐chain positions; acyl‐chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model‐driven metabolic engineering efforts.
We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models (Figure 1B) and providing validating evidence for the contents of iRC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts detected, reflecting incomplete characterization of compartmental transporters and supporting a hypothesis of latent pathway evolution for ceramide synthesis in C. reinhardtii. Additional lines of evidence from the reconstruction effort similarly support this hypothesis including lack of ceramide synthetase and other annotation gaps downstream in sphingolipid metabolism. A similar hypothesis of latent pathway evolution was established for very long‐chain fatty acids (VLCFAs) and their polyunsaturated analogs (VLCPUFAs) (Figure 1C), owing to the absence of this class of lipids in previous experimental measurements, lack of a candidate VLCFA elongase in the functional annotation, and additional downstream annotation gaps in arachidonic acid metabolism.
The network provides a detailed account of metabolic photon absorption by light‐driven reactions, including photosystems I and II, light‐dependent protochlorophyllide oxidoreductase, provitamin D3 photoconversion to vitamin D3, and rhodopsin photoisomerase; this network accounting permits the precise modeling of light‐dependent metabolism. iRC1080 accounts for effective light spectral ranges through analysis of biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral ranges associated with each photon‐utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, termed prism reactions. Coefficients for different photon wavelengths in a prism reaction correspond to the ratios of photon flux in the defined effective spectral ranges to the total emitted photon flux from a given light source (Figure 3B). This approach distinguishes the amount of emitted photons that drive different metabolic reactions. We created prism reactions for most light sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs. We also included regulatory effects, resulting from lighting conditions insofar as published studies enabled. Light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii on multiple levels: transcriptional regulation, chloroplast RNA degradation, translational regulation, and thioredoxin‐mediated enzyme regulation. Through application of our light model and prism reactions, we were able to closely recapitulate experimental growth measurements under solar, incandescent, and red LED lights. Through unbiased sampling, we were able to establish the tremendous statistical significance of the accuracy of growth predictions achievable through implementation of prism reactions. Finally, application of the photosynthetic model was demonstrated prospectively to evaluate light utilization efficiency under different light sources. The results suggest that, of the existing light sources, red LEDs provide the greatest efficiency, about three times as efficient as sunlight. Extending this analysis, the model was applied to design a maximally efficient LED spectrum for algal growth. The result was a 677‐nm peak LED spectrum with a total incident photon flux of 360 μE/m2/s, suggesting that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum.
In summary, the C. reinhardtii metabolic network iRC1080 that we have reconstructed offers insight into the basic biology of this species and may be employed prospectively for genetic engineering design and light source design relevant to algal biotechnology. iRC1080 was used to analyze lipid metabolism and generate novel hypotheses about the evolution of latent pathways. The predictive capacity of metabolic models developed from iRC1080 was demonstrated in simulating mutant phenotypes and in evaluation of light source efficiency. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling approach for light‐driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
The genome‐scale metabolic network of Chlamydomonas reinhardtii (iRC1080) was reconstructed, accounting for >32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways.
This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology.
Metabolic functional annotation and the largest transcript verification of a metabolic network to date was performed, at least partially verifying >90% of the transcripts accounted for in iRC1080. Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii, including very long‐chain polyunsaturated fatty acid and ceramide synthesis pathways.
A novel approach for modeling light‐driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
Algae have garnered significant interest in recent years for their potential commercial applications in biofuels (Hu et al, 2008; Hemschemeier et al, 2009) and nutritional supplements (Spolaore et al, 2006). Among eukaryotic microalgae, Chlamydomonas reinhardtii has arisen as the hallmark, model organism (Harris, 2001). C. reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001).
Commercial use and basic scientific research of photosynthetic organisms could benefit from better understanding of how light is absorbed and affects cellular systems. The quality of light sources implemented in photobioreactors largely determines the efficiency of energy usage in industrial algal farming (Fernandes et al, 2010). Light spectral quality also affects how photon absorption induces various metabolic processes: photosynthesis, pigment and vitamin synthesis, and the retinol pathway required for phototaxis.
Metabolic network reconstruction provides a framework to integrate diverse experimental data for investigation of global properties of metabolism, and as such, can provide clear advantages as a mode of studying the effects of light upon a photosynthetic biological system if light input is accounted for explicitly. The standardized reconstruction process (Thiele and Palsson, 2010) yields a biochemically and genomically structured knowledgebase and, coupled with the standard simulation approach of flux balance analysis (FBA) (Orth et al, 2010), provides a basis for predictive phenotype modeling; both contexts have been used for a variety of applications (Durot et al, 2009; Oberhardt et al, 2009; Gianchandani et al, 2010), among them the design of genetic engineering strategies for production strains (Bro et al, 2006; Park et al, 2011). To date, however, photon flux, with associated spectral constraints, has not been integrated into a metabolic network reconstruction.
Characterizing algal metabolism is key to engineering production strains and framing the study of photosynthesis. Extensive literature on C. reinhardtii metabolism, reviewed in Stern et al (2008), and multiple metabolic mutants (Harris et al, 2008) provide a solid foundation for detailed characterization of its metabolic functions. The availability of complete genome sequence data for C. reinhardtii (Merchant et al, 2007) and its functional annotation have enabled bioinformatic approaches to inform the presence of genome‐encoded enzymes (Grossman et al, 2007; Boyle and Morgan, 2009; Manichaikul et al, 2009). We have employed these resources to reconstruct and experimentally validate a genome‐scale metabolic network of C. reinhardtii, the first network to account for detailed photon absorption permitting growth simulations under different light sources. This network accounts for the activity of substantially more genes with metabolic functions than existing reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009).
Reconstruction contents and advances
The genome‐scale C. reinhardtii metabolic network (Figure 1A; Supplementary Figure S1; Supplementary Table S1; Supplementary Table S2; Supplementary Model S1) accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites, and encompasses 83 subsystems distributed across 10 compartments. As per convention (Reed et al, 2003), we call this network iRC1080 based on the primary reconstructionist and the scope of genomic content. Of the putative protein‐coding genes in the C. reinhardtii genome (http://augustus.gobics.de/predictions/chlamydomonas/augustus.u5.aa), an estimated 20% function in metabolism (Supplementary Table S3). iRC1080 accounts for the activity of >32% of the estimated genes with metabolic functions, a significant expansion over existing reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). iRC1080 is the most comprehensive metabolic network reconstruction of C. reinhardtii to date based on inclusion of pathways and a level of detail absent from previous reconstructions.
A major emergent feature of C. reinhardtii metabolism, apparent in Figure 1A, is the relative centrality of the chloroplast and its importance in light‐driven metabolism. The chloroplast, including the thylakoid and eyespot subcompartments, accounts for >30% of the total reactions in the network and 9 of the 10 photon‐utilizing reactions. The thylakoid contains essential pathways for photoautotrophic growth including photosynthesis, chlorophyll synthesis, and carotenoid synthesis, producing photoprotective pigments also valuable as fish feed additives and nutritional supplements for human consumption. The eyespot accounts for retinol metabolism, the mechanistic basis for phototaxis. Several pathways are partially duplicated across the chloroplast and other cellular compartments, in agreement with known biochemistry. A few crucial pathways are divided between the chloroplast and cytosol, including glycolysis and glycerolipid metabolism. Among the glycerolipids, triacylglycerides carrying high energy, long‐chain fatty acids relevant for biofuel production accumulate substantially in microalgae. iRC1080 provides a thorough resource for studying these and other metabolic products and a basis for strain design for genetic engineering.
iRC1080 considerably expands lipid metabolic pathways over previous reconstructions. We compared the lipid pathways of iRC1080 with several previously published metabolic reconstructions (Duarte et al, 2007; Feist et al, 2007; Boyle and Morgan, 2009; Mo et al, 2009; Montagud et al, 2010) counting the number of genes, reactions, and chemically distinct lipid molecules included in pathways for each lipid class (Table I). The extent of gene, reaction, and metabolite content of lipid pathways in iR1080 is, in general, greater than previous reconstructions. The coverage of ketoacyl lipid chemical properties represented in each network was also analyzed for all metabolites in fatty acyl, glycerolipid, glycerophospholipid, and sphingolipid classes; the fraction of lipid metabolites in the networks that account for a given applicable property was determined (Table I). Lower coverage signifies incompletely specified molecular species and often lumped lipid reactions and metabolites. iRC1080 accounts explicitly for all metabolites in these pathways, providing sufficient detail to completely specify all individual molecular species: backbone molecule and its stereochemical numbering of acyl‐chain positions; acyl‐chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. This level of detail enables a significantly higher degree of precision in lipid studies and in metabolic engineering design involving these pathways.
Experimental transcript verification
We have analyzed iRC1080 via experimental transcript verification under permissive growth conditions (Supplementary Table S4), representing the largest genome‐scale transcript validation effort to date. More than 75% of included transcripts were verified at >90% sequence coverage, and 92% of tested transcripts were at least partially validated experimentally (i.e. a portion of the sequence was recovered in the sequenced transcripts) (Figure 1B). We also analyzed the strength of transcript verification by specific metabolic subsystems (Figure 2, a representative subset; Supplementary Figure S2, the full set). The full lengths of all transcripts associated with 10 subsystems were verified, notably including biosynthesis of unsaturated fatty acids, histidine metabolism, and phenylalanine, tyrosine and tryptophan biosynthesis, with 12, 12, and 24 transcripts, respectively. Many more subsystems were also well verified, 61 out of 76 gene‐associated subsystems with >90% of associated transcripts at least partially validated. It should be noted that only sequencing reads that uniquely map to reference transcript sequences were counted toward the percentage of length validation; thus, sequencing reads unique enough to unambiguously specify the corresponding reference transcript were detected for every transcript with >0% validation. A few subsystems stood out as being more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts validated to any extent. This may reflect low expression level or imperfect structural annotation of these genes, particularly compartmental transporters. Low expression levels or complete deactivation of these genes is consistent with a hypothesized evolutionary trend (see below) in the case of sphingolipid metabolism.
Evolution of latent lipid pathways
The comprehensive reconstruction of lipid metabolism in iRC1080 revealed hypothetical latent pathways, the functions of which have likely been lost through evolution. Previous studies established that C. reinhardtii lacks the practically ubiquitous membrane lipids phosphatidylcholine (Giroud et al, 1988) and phosphatidylserine (Riekhof et al, 2005). Similarly, our reconstruction suggests that C. reinhardtii also lacks very long‐chain fatty acids (VLCFAs), their polyunsaturated analogs (VLCPUFAs) (Figure 1C), and ceramides.
Surveys of C. reinhardtii lipid species have not detected VLCFAs (Giroud et al, 1988; Giroud and Eichenberger, 1989; Tatsuzawa et al, 1996; Dubertret et al, 2002; Kajikawa et al, 2006; Lang, 2007), likely due to a lack of functional VLCFA elongase (Weers and Gulati, 1997; Guschina and Harwood, 2006; Kajikawa et al, 2006). No candidate VLCFA elongase was identified in our comprehensive functional annotation (Supplementary Table S3), and our annotation suggests several downstream gaps in arachidonic acid metabolism as well, corroborating this hypothesis. Arachidonic acid, the 20‐carbon parent fatty acid of all VLCFAs and VLCPUFAs, is synthesized by a VLCFA elongase‐catalyzed extension of γ‐linolenic acid, which is present in C. reinhardtii (Griffiths et al, 2000). Notably, C. reinhardtii does encode a fatty acid desaturase that accepts arachidonic acid as substrate (Kajikawa et al, 2006) and, based on our functional annotation, encodes several other enzymes that act upon this substrate, indicating that algal ancestors likely had a functional VLCFA elongase.
Multiple lines of evidence uncovered during the reconstruction also support the absence of ceramides in C. reinhardtii. Our functional annotation did not uncover a convincing candidate for ceramide synthetase (EC:18.104.22.168), a required enzyme for ceramide synthesis, nor, to our knowledge, has one been discovered by previous efforts, including C. reinhardtii enzyme annotations of the Kyoto Encyclopedia of Genes and Genomes. Similarly, our functional annotation suggests substantial gaps downstream in the sphingolipid metabolic pathway. As aforementioned, C. reinhardtii also lacks VLCFAs, and VLCFA‐CoA is a required substrate for the ceramide synthetase reaction (Hills and Roscoe, 2006). Finally, our experimental transcript analysis failed to verify 2 out of 8 transcripts associated with sphingolipid metabolism (Figure 2) that were included in iRC1080, 1 of 2 serine C‐palmitoyltransferases and a putative sphingosine 1‐phosphate aldolase. This result may reflect still further gene function loss in this pathway, perhaps occurring more recently in evolutionary time given that our functional annotation actually detected candidate sequences for these enzymes. Considering this evidence, we suggest that the evolutionary history of C. reinhardtii includes the loss of ceramide metabolism, although this hypothesis remains to be verified. Annotated enzymes in this pathway separated from the broader network by gaps may represent multifunctional proteins or proteins that have evolved to function in a pathway distinct from ceramide synthesis. These gaps in C. reinhardtii metabolism not only increase understanding of the evolution of algal lipid pathways but also represent potential targets for genetic engineering in an effort to expand the diversity of lipids this alga can synthesize. Such engineering efforts serve as valuable test cases for engineering industrial strains and could improve C. reinhardtii as a model alga for biofuel development.
Modeling metabolic light usage
Our reconstruction accounted for effective light spectral ranges by analyzing biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral bandwidths associated with each photon‐utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, which we term prism reactions. The coefficients for different photon wavelengths in prism reactions correspond to the ratios of photon flux in the defined effective spectral ranges to the total photon flux in the visible spectrum emitted by a given light source (Figure 3A and B). In this manner, it is possible to distinguish the amount of emitted photons that drive different metabolic reactions. We created prism reactions for 11 distinct light sources (Supplementary Figure S3), covering most sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs.
The network reconstruction provides a detailed account of metabolic photon absorption by light‐driven reactions. Photosystems I and II in iRC1080 stoichiometrically absorb photons according to the Z‐scheme (Berg et al, 2007). The light‐dependent protochlorophyllide oxidoreductases require a single photon per catalysis as demonstrated in wheat (Griffiths et al, 1996). Extrapolation of UVB energy requirements for spontaneous provitamin D3 conversion to vitamin D3 (Bjorn, 2007) based on the average photon energy in the UVB range suggests a stoichiometric ratio of approximately one. Two phototactic rhodopsins, reactants of the rhodopsin photoisomerase reaction, are encoded by C. reinhardtii, one requiring a single photon and one requiring two photons for activation; the average effective stoichiometric photon count was measured to be 1.6 (Hegemann and Marwan, 1988).
A prism reaction is the intermediate step between light input and the specific photon‐utilizing metabolic reactions mentioned above. Flux through the photon exchange reaction ‘EX_photonVis(e)’ represents the total metabolically active photon flux incident upon the cell. Flux passing through this exchange reaction then passes through a single user‐specified prism reaction, for example ‘PRISM_solar_litho,’ and is distributed across specific spectral ranges. These ranges are specified explicitly in the photon‐dependent metabolic reaction formulas (Supplementary Table S2), thereby making these reactions wavelength specific. Flux through the photon‐dependent metabolic reactions is then propagated through the network. Excess wavelength‐specific photon fluxes that are not absorbed metabolically leave the system via demand reactions, for example ‘DM_photon298(c),’ completing the pathway of light through the network.
To accurately model metabolic activity of a photosynthetic organism, it is also important to consider regulatory effects resulting from lighting conditions. Indeed, light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii at multiple levels: transcriptional regulation (Bohne and Linden, 2002), chloroplast RNA degradation (Salvador et al, 1993), translational regulation (Cahoon and Timko, 2000), and thioredoxin‐mediated enzyme regulation (Lemaire et al, 2004). As a preliminary attempt to incorporate light and dark regulatory effects, literature was reviewed to identify such regulation upon enzymes in iRC1080 (Supplementary Table S5), focusing mainly on thioredoxin regulation of chloroplast enzymes since most published data relate to this mode. In the absence of activity spectra for these effects, it is not yet possible to represent these effects via prism reactions. Therefore, we modeled regulation with Boolean reaction flux constraints following published approaches (Covert et al, 2001).
Environmental and genetic validation of iRC1080
Implementing light‐regulated constraints and basic environmental exchange constraints (Supplementary Table S6) yielded photoautotrophic, heterotrophic, and mixotrophic models from iRC1080. We simulated various growth conditions (Supplementary Table S7) and all gene knockouts for which phenotypes have been published and are assessable in our network (Supplementary Table S8) to validate the predictive ability of the models. All 30 validations involving environmental parameters displayed very close agreement with experimental results (Supplementary Table S7). Of particular note is the ability of our photosynthetic model in sunlight to accurately recapitulate O2‐PAR (photosynthetically active radiation) energy conversion efficiency, predicting an efficiency of 2% compared with the experimental result (Greenbaum, 1988) of 1.3–4.5%. Of the 14 gene knockouts simulated, 7 were partially or completely validated relative to experimental results (Supplementary Table S8). The unconfirmed gene knockout phenotypes may result from network errors or an incomplete set of constraints in the model (e.g. enzyme capacity, regulatory, thermodynamic, or other constraints). No internal model reactions were constrained in the models except indirectly via constraints on the input exchanges and the few explicitly noted Boolean regulatory constraints imposed (Supplementary Table S5). The unconfirmed knockout phenotypes were investigated through model analysis and literature search, although in most cases, current literature evidence could not completely explain these discrepancies, leaving them to be fully accounted for by future studies.
Two discrepancies may result from incomplete genome functional annotation or missing constraints. Knockout of mitochondrial NADH:ubiquinone oxidoreductase complex I (EC:22.214.171.124) in the model fails to recapitulate a reduced heterotrophic growth phenotype (Remacle et al, 2001a). The NDA2 and NDA3 genes can substitute completely for this activity in the current model. Sequence‐based localization analysis places both proteins in the mitochondria, but this may be incorrect as a recent study suggests that both may be plastid localized (Desplats et al, 2009). Two other network reactions can also substitute for the reduction of ubiquinone, succinate dehydrogenase (ubiquinone) (EC:126.96.36.199) and electron transfer flavoprotein‐ubiquinone oxidoreductase (EC:188.8.131.52). The cytochrome c oxidase complex IV (EC:184.108.40.206) knockout does not result in an obligate photoautotrophic phenotype (Remacle et al, 2001b) in the model because the cytochrome c peroxidase (EC:220.127.116.11) reaction is capable of compensating. The C. reinhardtii CCPR1 protein is homologous to mitochondrial cytochrome c peroxidases from a number of species, but no focused studies have been carried out to provide further evidence for this enzyme. In the model, the complex IV and CCPR1 double knockout is an obligate photoautotroph. These discrepancies point out important genes that should be the focus of subsequent experimentation in order to more clearly understand these metabolic phenotypes.
Another discrepancy may result from missing thermodynamic constraints. The zeaxanthin epoxidase (EC:18.104.22.168) knockout does not preclude antheraxanthin, violaxanthin, or neoxanthin production (Baroli et al, 2003) in the model because violaxanthin de‐epoxidase (EC:22.214.171.124) reactions compensate. This substitution depends on the reversibility of these de‐epoxidase reactions and may point to missing thermodynamic constraints or to undiscovered regulation under this condition.
Two discrepancies result from the lack of accounting for kinetics of the reactions of ribulose‐1,5‐bisphosphate carboxylase oxygenase (RuBisCO) from the model. Both phosphoglycolate phosphatase (EC:126.96.36.199) (Suzuki et al, 1990) and glycolate dehydrogenase (EC:188.8.131.52) (Nakamura et al, 2005) deficient mutants require high CO2 for photoautotrophic growth in vivo, not recapitulated in simulations. This phenotype results from dominance of the oxygenase over carboxylase activity of RuBisCO under lower CO2 conditions, both reactions sharing the same catalytic site. In vivo, these two mutants are deficient in the salvage of carbon from 2‐phosphoglycolate, a product of the oxygenase activity of RuBisCO. Although these two reactions are carried out by the same enzyme in the model, their fluxes are treated as independent and not competitive; due to an absence of kinetic parameters in the model, the effect of relative CO2 and O2 concentrations upon RuBisCO activity cannot be explicitly expressed. Because the carboxylase activity more efficiently promotes growth, both high and low CO2 conditions drive only this reaction and not the oxygenase reaction in the model; therefore, the salvage pathway is unnecessary in the model to achieve wild‐type growth rates.
Finally, two mutant phenotype discrepancies in the model result from complex compensatory pathways that convert an input carbon source to the mutant‐required carbon source. The high CO2 requirement for photoautotrophic growth due to knockout of the chloroplast carbonic anhydrase (EC:184.108.40.206) (Spalding et al, 1983; Funke et al, 1997) can be compensated for in the model by activity of a six‐reaction pathway of pyrimidine metabolism leading from bicarbonate incorporation via carbamoyl‐phosphate synthase (EC:220.127.116.11) to conversion to CO2 via orotidine‐5′‐phosphate decarboxylase (EC:18.104.22.168). The chloroplast ATP synthase (EC:22.214.171.124) deficient mutant (Smart and Selman, 1991; Dent et al, 2005; Drapier et al, 2007) with an acetate‐requiring phenotype can be compensated for in the model by a complex pathway consisting of >15 reactions by which CO2 is converted to acetate, which is then used in pathways similar to those supporting heterotrophic growth. Although this complex pathway has many branch points, it is notable that chloroplast malate dehydrogenase (EC:126.96.36.199) and the diffusion of pyruvate between the cytosol and chloroplast are essential to coupling the CO2 fixation reactions to pyruvate metabolism and ultimate conversion to acetate but are not essential to the wild‐type photoautotrophic or heterotrophic models. Loss of either of these conditionally essential reactions prevents the CO2‐to‐acetate conversion and recapitulates the acetate‐requiring phenotype. Given the complexity of these compensatory pathways, a number of possible missing constraints could explain their inactivity in vivo under photosynthetic conditions, and the model offers a starting point to explore possible targets of regulation under these conditions.
Gene essentiality analysis
To demonstrate the prospective use of iRC1080 in predicting phenotypic outcomes of genetic manipulations of C. reinhardtii, comprehensive essentiality analysis of all simulated single‐gene knockouts was performed in models under four basic environmental conditions: growth in sunlight with and without acetate, aerobic growth in dark on acetate, and anaerobic subsistence in dark on starch. Phenotypes were defined as growth equivalent to wild‐type, reduced growth relative to wild‐type, or lethal based on the comparative objective fluxes of the mutant and wild‐type models (Supplementary Table S9). A lethal phenotype was defined as no flux through the biomass reaction (defined as the objective function) in the mutant. Simulation results exhibited distinct metabolic system dependencies under each condition. There were 201 and 144 lethal knockouts in the model with sunlight and with and without acetate, respectively. There were 147 and only 3 lethal knockouts in the aerobic and anaerobic dark model, respectively. The metabolic processes associated with essential genes were ranked, and the three subsystems associated with the essential genes were compared under each condition. Photosynthesis, porphyrin and chlorophyll metabolism, and phenylalanine, tyrosine, and tryptophan biosynthesis were the most essential subsystems in light without acetate. Phenylalanine, tyrosine, and tryptophan biosynthesis, porphyrin and chlorophyll metabolism, and purine metabolism were the most essential subsystems in light with acetate. Expectedly, photosynthesis is most crucial for photoautotrophic growth and not required in the presence of acetate. The dark, aerobic condition had the same top ranked essential subsystems as in the mixotrophic condition, which is also expected as amino acids, chlorophyll, and nucleotides make up a high proportion of the required biomass components under both conditions. For subsistence in dark on starch, glycolysis/gluconeogenesis, starch metabolism, and starch and sucrose metabolism were the most essential subsystems, paralleling the expected core pathways for ATP maintenance with starch breakdown. While these predicted genotype–phenotype relationships demonstrate a compelling prospective use of the network, the majority of the mutant phenotypes remain to be validated experimentally; however, these predictions could be used to help define the scope and expected outcomes of such future studies.
Light‐source‐specific model validations
Next, we performed more extensive validations of light models grown under specific light sources at varying intensities. Varying sunlight intensity in our model and evaluating photosynthetic O2 evolution, we observed that the model reached photosynthetic saturation at light intensity consistent with experimental measurement (Polle et al, 2003) (Figure 4A). Our model under red LED light (653 nm) also showed fair agreement with our experimentally measured maximum growth rate across the range of unsaturated photon flux (Figure 4B), despite divergence above the experimental saturation point. The principal explanation for this divergence lies in the relative CO2 supplies of the experimental setup and the model. All reported photoautotrophic model simulations utilize the same maximum CO2 exchange constraint corresponding to the maximum‐measured cellular uptake rate under non‐CO2‐limiting conditions (Supplementary Table S6), while the CO2 supply in our bioreactor setup was clearly growth‐limiting given that the light‐saturated maximum growth rate was 0.01 gDW/h, much lower than the maximum growth rate of 0.14 gDW/h under non‐CO2‐limiting conditions (Janssen et al, 2000). It should also be noted that the linearity of the simulation trends is a property of steady‐state system modeling, which is incapable of kinetic representation of growth shifts observable in the in vivo experiments. For further validation, we present that the maximum biomass yield under incandescent white light is 5.7 × 10−5 gDW/mE (Janssen et al, 2000), in close agreement with our analogous prediction of 2.6 × 10−5 gDW/mE (Figure 4C). Similarly, our predicted biomass yield on 674 nm peak LED light of 1.1 × 10−4 gDW/mE is on the same order of magnitude as our experimental results for C. reinhardtii under 660 nm peak LED light near growth‐saturating photon flux, 4.3 × 10−4 gDW/mE. This agreement is striking given that the network explicitly accounts for the spectral photon flux of these light sources and the subsequent processing of this energy to generate all of the constituents of biomass without any parameter fitting to the experimental data. Together, these results constitute an important validation of our models using three different light sources.
To quantitatively evaluate the significance of the agreement between our reported model simulations using prism reactions derived through analysis of irradiance spectra and experimental measurements under the three light sources reported above, we compared the reported simulation results for each of these light sources with an unbiased sample of results representative of potential solutions achievable using our network. We sampled the space of possible light models by generating random prism reactions with the same total metabolically active photon flux. To obtain stoichiometric coefficients for a random prism reaction, a set of random fractions of the sum of stoichiometric coefficients of the prism reaction representing the evaluated light source was generated, contingent upon resulting in the same sum of coefficients. The simulations as reported above for sunlight, red LED, and white incandescent light were repeated using such random prism reactions. The Euclidean distance between the simulated and experimental results was compared with the distribution of distances for 10 000 randomly sampled results (Figure 5). The probability of randomly achieving experimental agreement closer than seen in our simulations was determined empirically based on these distributions of distances. Only 77 of 10 000 randomized simulations (Figure 5A) had experimental agreement better than the simulated oxygen photoevolution under sunlight (Figure 4A), yielding an empirical P‐value of 0.0077, and indicating our model had experimental agreement statistically significantly better than a random model constrained to have the same total metabolically active photon flux. Simulated growth under 665 nm peak LED (Figure 4B) had a suggestive P‐value of 0.1947 (Figure 5B), although the reported simulation was still closer to experiment than the mean of randomized simulations. Our simulated growth under white incandescent light was statistically significantly closer to experiment (Janssen et al, 2000) than random (Figure 5C) with a P‐value of 0.0285. This analysis shows that the reported model for each of these light sources is exceptionally close to recapitulating experimental results and thus serves as an excellent validation. These results indicate that the network has the capacity to broadly differentiate light‐dependent growth based on spectral properties and that the formulation of a prism reaction serves to accurately narrow the space of possible flux distributions relevant to a specific light source.
Application of iRC1080 to evaluate light source efficiency and design
Our photosynthetic model was applied prospectively to evaluate the efficiency of light utilization under different light sources. The photon energy conversion efficiency (Supplementary Equation 1) and biomass yield on light (Supplementary Equation 2) were computed for each light source given the minimum incident photon flux required to achieve maximum growth rate (Figure 4C); the minimum photon flux for maximum growth rate is the growth‐saturating photon flux value for a given light source. One clear result is that red LEDs provide the greatest efficiency in terms of both absorbed photon energy and biomass yield, about two and three times as efficient as can be optimally achieved in sunlight by these respective measures. Although experimental growth data for validation is only presented for three light sources, simulation results are presented for all 11 light sources for which irradiance spectra were obtainable (Figure 4C). This analysis demonstrates the prospective extensibility of the network and modeling approach to any possible lighting condition, natural or manmade, for which an irradiance spectrum can be measured.
Given the capability of our photosynthetic model to evaluate light source efficiency, we applied it to design an LED spectrum providing maximum photon utilization efficiency for growth (Supplementary Figure S3). The result was a 677‐nm peak LED spectrum with a total incident photon flux of 360 μE/m2/s (Figure 4C; Supplementary Figure S3), which is quite close to the 674‐nm LED with a minimum incident photon flux of 362 μE/m2/s for maximum growth. This result suggests that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum, which is further supported by experimental measurements of the spectral peak of light absorption for green algae (Akkerman et al, 2002) and the quantum action spectrum of land plants (Barta et al, 1992) (Supplementary Table S7).
We have presented a genome‐scale network reconstruction of C. reinhardtii metabolism, well validated in content and function, and its application for detailed modeling of diverse light sources. Initial model validations also highlight the need for more experimental studies to uncover regulatory mechanisms in order to expand understanding of the complexity of light regulation of algal metabolism. This open research topic presents important challenges and opportunities in enumerating such effects on a genome scale.
Given the importance of lipid metabolism in biofuel production, iRC1080 was reconstructed enumerating all lipids supported by evidence in the literature and genome functional annotation. The capacity of iRC1080 as a knowledgebase was demonstrated through analysis of lipid metabolism to generate novel hypotheses about latent metabolic pathways resulting from algal evolution. In particular, the exclusion of certain enzymatic reactions in VLCFA and sphingolipid pathways from iRC1080 suggests evolutionary recession of these pathways in C. reinhardtii, a hypothesis supported by undetected lipids in experimental measurements, gaps in genome functional annotation for these enzymes, and incomplete transcript verification for other enzymes included in these pathways. Not only do these network gaps reflect the relatively simple lipid biosynthetic capabilities of C. reinhardtii among microalgae, but their identification suggests gene insertions that could expand its lipid metabolic repertoire, relevant for industrial and scientific purposes. Of particular interest may be the potential for enabling algal synthesis of essential fatty acids for human health such as docosahexaenoic acid (Yashodhara et al, 2009). Candidate enzymes for the conversion of arachidonic acid to essential fatty acids downstream of the apparently absent VLCFA elongase reaction are present in our functional annotation.
The models developed from iRC1080 provide a platform for prediction of phenotypic outcomes of system perturbations, light source evaluation and design, and genetic engineering design for production of biofuels and other commodity chemicals. We demonstrated an approach applying iRC1080 to the design of an energetically efficient light source for growth, a novel application of metabolic networks. Other light sources may be more efficient for other metabolic objectives or under other environmental conditions or genetic backgrounds. This result could be of significant interest to the metabolic engineering and bioreactor‐design communities because it demonstrates that our network and light‐modeling approach are capable of accurately predicting light source efficiencies in terms of a metabolic objective.
The prism reactions developed and applied in this study to quantitatively integrate spectral quality with biological activity represent a significant integration of diverse data types for biological system modeling, which hopefully will encourage a new paradigm for systems biology. This modeling approach could be used for applications beyond light source design, including as a metabolic basis for studying and simulating phototaxis. Given the acquisition of appropriate biological spectral activity data, this approach could be extended to other biological light‐response phenomena and other organisms. The importance of understanding how light parameters affect biological systems may also extend beyond natural phenomena with recent progress in protein engineering leading to chimeric light‐inducible proteins (Shimizu‐Sato et al, 2002; Levskaya et al, 2005).
The iRC1080 network and presented metabolic modeling represent a milestone in systems biology. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling of light‐driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
Materials and methods
Metabolic network reconstruction
Building from our previously published reconstruction of C. reinhardtii central metabolism (Manichaikul et al, 2009), iAM303, the iRC1080 network was reconstructed in a bottom–up manner according to current standards (Thiele and Palsson, 2010) on a pathway‐by‐pathway basis, drawing biochemical, genomic, and physiological evidence from >250 publications (Supplementary Table S2). The genomic evidence was derived from our own functional annotation (Supplementary Table S3) of metabolic enzymes, coenzymes, and transport proteins. Network gap‐filling was performed to make pathways functional and account for dead‐end metabolites. Global quality control checks were then performed, including elemental balancing and elimination of as many internal thermodynamically infeasible loops and new photon‐driven, input‐only pathways as possible (Supplementary Figure S4; Supplementary information). We also accounted for subcellular compartment pH in the protonation states of metabolites as much as possible.
Functional annotation of transcripts
Functional annotation for iRC1080 was performed using a consensus of two separate approaches. In the first approach, gene models (http://augustus.gobics.de/predictions/chlamydomonas/augustus.u5.aa) from the Augustus update 5 (Au5) of C. reinhardtii genome assembly version JGI v4.0 were functionally annotated by assigning enzyme classification (EC) terms using BLASTP results against UniProt (http://www.uniprot.org/) and AraCyc (http://www.arabidopsis.org/biocyc/) enzyme protein sequences and their EC annotations as the basis. The second approach followed from mapping of Au5 gene models to annotated JGI v3.1 gene models, for which EC terms and Gene Ontology annotation were assigned using a combination of BLASTP, AutoFACT, InterProScan, and PRIAM. The comprehensive annotation is presented in Supplementary Table S3. See Supplementary information for full details.
Simulation procedures consisted of FBA (Orth et al, 2010) and flux variability analysis (FVA) (Mahadevan and Schilling, 2003) as implemented in the COBRA toolbox (Becker et al, 2007), testing model capabilities while optimizing biomass functions to simulate growth (Supplementary Table S10) or subsistence on starch by optimizing ATP maintenance. FBA is a widely used simulation approach for large‐scale, constraint‐based metabolic models and has become a standard method in the systems biology field with a long history of success (Gianchandani et al, 2010). Different environmental conditions were modeled by appropriately setting reaction flux constraints in iRC1080 (Supplementary Table S6) including environmental exchanges, non‐growth associated ATP maintenance, O2 photoevolution, starch degradation, and light‐ or dark‐regulated enzymatic reactions (Supplementary Table S5).
C. reinhardtii strains and growth conditions
For transcript verification experiments, C. reinhardtii strain CC‐503 was grown in tris‐acetate‐phosphate medium containing 100 mg/l carbamicillin without agitation, at room temperature (22–25°C) and under continuous illumination with cool white light at a photosynthetic photon flux of 60 μE/m2/s.
For growth experiments under 660 nm peak LED light (Supplementary Figure S5), C. reinhardtii strain UTEX2243 was grown in a bubble column photobioreactor at 23–27°C with P49 medium. The total volume of algal culture was 300 ml, and the gas supply was 180 ml/min air with 2.5% CO2. The 660‐nm peak LED light supply was set at 10 kHz frequency and different duty cycles to get varied average photon fluxes.
Transcript verification by sequencing
ORF amplicons were generated from C. reinhardtii cells by RT–PCR from RNA or PCR from Gateway clones. The Roche 454FLX Titanium sequencing system was used for sequencing of the generated ORF amplicons according to the manufacturer's instructions. The generated data were processed using the GS FLX data analysis software v2.3. Minimum overlap length of 40 nucleotides and minimum overlap identity of 90% were used to align the sequencing reads against the Au5 reference sequences. ORFs encoding transporter proteins were verified by capillary Sanger sequencing.
Prism reaction derivation
Spectral bandwidths that effectively drive each photon‐utilizing reaction in iRC1080 were determined from published experimental activity spectral data or absorbance data. Effective spectral bandwidths were defined as the full width half maximum of activity, denoted by color‐paired dashed lines in Figure 3A. The effective spectral bandwidths were used to derive stoichiometric coefficients of the prism reactions used to quantitatively represent different light sources from the composition of their published irradiance spectra, converted to photon flux units according to Supplementary Equations 3 and 4. Coefficients for each of the effective spectral bandwidths were computed based on Equation 1.
Each coefficient represents the ratio of photon flux in the defined effective bandwidth to total visible photon flux. Definite integrals in Equation 1 were approximated using the trapezoidal rule. For each light source, all effective bandwidth coefficients were compiled into a single reaction in the form of Equation 2.
Constraints on prism reaction fluxes (Supplementary Table S6) were derived from the total visible photon flux, the definite integral of the spectrum from 380 to 750 nm. The total experimentally measured emitted visible photon flux was converted to model units of incident photon flux using the values in Supplementary Table S11 and Supplementary Equations 5 and 6. Prism reactions for 11 different light sources (Supplementary Figure S3) were generated.
Random sampling of prism reaction space and significance test
For a given prism reaction, first the sum of the stoichiometric coefficients was calculated, representing the total quantity of metabolically active photons per incident photon from the specified light source. Next, to sample the space of prism reactions, 10 000 random prism reactions with the same sum of stoichiometric coefficients were generated and used in growth simulations. In these simulations, input photon flux was constrained to the reported experimental values, generating a set of simulated results (biomass or photosynthetically evolved O2 flux, depending on the experimental parameter) with one value corresponding to each experimental data point. The Euclidean distance between the sampled and experimental results was calculated for each of the 10 000 randomized prism reactions (Figure 5). The significance of the experimental agreement with simulations reported for a given prism reaction derived directly from analysis of irradiance spectra was established by comparison between the corresponding Euclidean distance and the distribution of distances from the randomly sampled prism reactions. Probability of achieving equal or closer results to experiments by chance was computed as the proportion of smaller values in the randomly sampled distribution of 10 000 distances.
Procedure for efficient LED design
Multiple iterations of FVA were used to maximize growth while minimizing the energy of the sum of individual wavelengths of model photon flux. The ratios of these individual wavelength photon fluxes to total photon flux were set as stoichiometric coefficients for a theoretical maximum‐efficiency prism reaction. The Euclidean vector distance was computed (Supplementary Figure S6) between this set of coefficients and prism reaction coefficients calculated for an LED spectrum of the same shape as the experimentally measured 674 nm peak LED but centered at varying wavelengths across the visible spectrum, with a total photon flux equal to the total theoretical maximum‐efficiency photon flux. The spectrum corresponding to the minimum distance was taken as the solution and subsequently tested through growth simulation.
We thank Harish Nagarajan for his critical assessment of this work. This research was supported by the Office of Science (Biological and Environmental Research), US Department of Energy, Grant DE‐FG02‐07ER64496 (to JP and KS‐A), New York University Abu Dhabi Research funds (to KS‐A), and Institute Sponsored Research funds from the Dana‐Farber Cancer Institute Strategic Initiative (to CCSB). RLC was supported by the National Science Foundation IGERT training Grant DGE0504645, and EFYH was supported by the Jane Coffin Childs Memorial Fund for Medical Research.
Author contributions: RLC wrote the manuscript and developed the light‐modeling approach. RLC and AM reconstructed the metabolic network and performed model simulations and analysis. LG, YS and TH performed transcript verification experiments and associated computational analyses. EFYH and SB performed genome functional annotation. WF performed bioreactor growth experiments. B∅P, JAP, and KS‐A designed the study.
Supplementary Methods, Supplementary Figures S4–6, Supplementary Table S1 [msb201152-sup-0001.doc]
Supplementary Figure S1
High‐resolution network diagram. This network diagram displays all metabolites (nodes) and reactions (complex edges) included in iRC1080. Metabolites are color‐coded based on compartment localization. Reversibility and irreversibility are represented by the placement of arrows at the ends of edges connecting metabolites (i.e. substrates of irreversible reactions do not have arrows on the edges connecting to them). Labels follow the abbreviation conventions used in iRC1080 (Supplementary Table S2). Visually resolving node and edge labels requires zooming to at least 6400% on most displays. [msb201152-sup-0002.pdf]
Supplementary Figure S2
Complete transcript verification by subsystem. The transcript verification status for all gene‐associated subsystems of iRC1080 is displayed. The graph follows the same format as Figure 2. [msb201152-sup-0003.jpg]
Supplementary Figure S3
Complete activity and irradiance spectra. The irradiance spectra are shown for all light sources used in this study. The graphs follow the same format as Figure 3A. [msb201152-sup-0004.jpg]
Supplementary Table S2
Complete iRC1080 network data. All data included in the iRC1080 network are presented here including but not limited to lower and upper flux bounds (LB and UB, respectively) and all literature references used during the reconstruction. The flux bounds highlighted in yellow represent model input and output parameters that were varied when performing the various simulations presented in this study, and the values in these columns only represent defaults and not the actual values used in any specific simulation. [msb201152-sup-0005.xls]
Supplementary Table S3
Transcript functional annotation. The full set of metabolic functional annotation of transcripts is presented, including but not limited to those functions and transcripts represented in iRC1080. The rightmost column denotes the method by which the annotation was obtained: 1 corresponds to the first method we implemented for annotation, 2 corresponds to the second method we implemented for annotation, 3 corresponds to both methods, TCDB signifies annotation resulting from a bidirectional BLAST search comparing TCDB to the C. reinhardtii genome (JGI v4.0), and literature references signify those annotations taken directly from published literature. [msb201152-sup-0006.xls]
Supplementary Table S4
Transcript verification status. The percentage of sequence length verification is given for all experimentally tested transcripts included in iRC1080 (see Transcript verification experiments in the Supplementary Methods for more details). Those transcripts verified by capillary Sanger sequencing are denoted by an asterisk. [msb201152-sup-0007.xls]
Supplementary Table S5
Light and dark‐regulated reaction constraints. Reaction regulation under light and dark conditions, used in this study for constraint‐based modeling, are summarized. [msb201152-sup-0008.xls]
Supplementary Table S6
Basic modeling constraints. Precise values of parameters used to constrain models in simulations are presented. [msb201152-sup-0009.xls]
Supplementary Table S7
Environmental validations. Outcomes of simulations are compared to experimentally measured results to validate model function with respect to environmental parameters. [msb201152-sup-0010.xls]
Supplementary Table S8
Genetic validations. Outcomes of knock‐out simulations are compared to the analogous experimentally measured mutant phenotypes to validate model function with respect to genetic parameters. [msb201152-sup-0011.xls]
Supplementary Table S9
Gene‐knockout lethality. Growth or subsistence, in the case of anaerobic dark growth, phenotypes of all single‐gene knockout simulations are presented. Phenotypes are classified in relation to the objective flux achieved relative to the wild type simulation: wild type phenotype (WT), reduced relative to wild type phenotype (R), and lethal (L). [msb201152-sup-0012.xls]
Supplementary Table S10
Biomass functions. Stoichiometric coefficients for all biomass components accounted for to simulate growth in this study are presented under three growth conditions: photoautotrophic, heterotrophic, and mixotrophic. [msb201152-sup-0013.xls]
Supplementary Table S11
Constants for calculations. The basic cellular parameters assumed for calculations in this study and the derived conversion factors (see Dimensional and effective photon flux conversion factor derivation in the Supplementary Methods) are presented. [msb201152-sup-0014.xls]
SBML Model [msb201152-sup-0015.zip]
↵† RL Chang led the network reconstruction and computational modeling; L Ghamsari led the transcript verification
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2011 EMBO and Macmillan Publishers Limited