The initial genome‐scale reconstruction of the metabolic network of Escherichia coli K‐12 MG1655 was assembled in 2000. It has been updated and periodically released since then based on new and curated genomic and biochemical knowledge. An update has now been built, named iJO1366, which accounts for 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. iJO1366 was (1) updated in part using a new experimental screen of 1075 gene knockout strains, illuminating cases where alternative pathways and isozymes are yet to be discovered, (2) compared with its predecessor and to experimental data sets to confirm that it continues to make accurate phenotypic predictions of growth on different substrates and for gene knockout strains, and (3) mapped to the genomes of all available sequenced E. coli strains, including pathogens, leading to the identification of hundreds of unannotated genes in these organisms. Like its predecessors, the iJO1366 reconstruction is expected to be widely deployed for studying the systems biology of E. coli and for metabolic engineering applications.
A common denominator for systems biology studies of a target organism is a high‐quality genome‐scale metabolic network reconstruction. A network reconstruction represents a biochemically, genetically, and genomically structured knowledgebase that contains detailed information about an organism in a structured format (Thiele and Palsson, 2010). Metabolic network reconstructions contain information such as exact stoichiometry of metabolic reactions, chemical formulas and charges of metabolites, and the associations between genes, proteins, and reactions. These reconstructions form a basis for the formulation of mechanistic, and thus computable, genome‐scale genotype–phenotype relationships (Palsson, 2009).
The most detailed and complete metabolic reconstruction of any organism to date is for the common laboratory strain Escherichia coli K‐12 MG1655. The first genome‐scale reconstruction of E. coli was iJE660 (Edwards and Palsson, 2000). This network was constructed through extensive searches of literature and databases to ensure correct stoichiometry and cofactor usage, and was the most extensive metabolic network reconstruction in existence at that time. An updated version of this reconstruction, iJR904 (Reed et al, 2003), had an expanded scope, including pathways for the consumption of alternate carbon sources and more specific quinone usage in the electron transport system. Hundreds of new genes and reactions were added, gene–protein–reaction associations (GPRs) were included for the first time to connect reactions with genes, and all reactions were elementally and charged balanced through the inclusion of protons. In the next update, iAF1260 (Feist et al, 2007), the scope of the network was expanded again, now including many reactions for the synthesis of cell wall components, and all metabolites were assigned to the cytoplasm, periplasm, or extracellular space. The thermodynamic properties of each reaction were calculated, and this was used to set lower bounds on predicted irreversible reactions. iAF1260 contained 2077 reactions, 1039 metabolites, and 1260 genes. A core model version of iAF1260, useful for testing and debugging new constraint‐based algorithms and for educational use, has also been published (Orth et al, 2010a).
Here, we present an updated version of the E. coli metabolic network reconstruction. This new version, titled iJO1366, includes many newly characterized genes and reactions. Since the iAF1260 model was a very complete representation of the known metabolism of E. coli, only minor expansions in the scope of the network were made. Still, new discoveries since 2007 have made this model update necessary. Several genes were added based on the results of an experimental screen of E. coli knockout strains in four different media conditions. The gaps in the iAF1260 network were identified and characterized, and new reactions and genes were added to reduce the total number of gaps. The iJO1366 reconstruction can serve as a basis for metabolic network reconstructions of other E. coli strains and closely related organisms. We assembled preliminary reconstructions for many other E. coli strains, and analysis of their completeness provides insights into the metabolic networks of these organisms. iJO1366 is the most complete E. coli metabolic reconstruction to date, and like its predecessors, it will likely aid in many new discoveries (Feist and Palsson, 2008).
Results and discussion
Process for updating the reconstruction and its content
The updated network reconstruction of E. coli K‐12 MG1655 began with the iAF1260b network (Feist et al, 2010), a slightly updated version of the iAF1260 network. In order to identify incorrect model predictions in order to improve the E. coli reconstruction, we experimentally determined conditional essentiality for most of the genes in the iAF1260b model. By comparing model predicted growth phenotypes to the measurements, errors in the reconstruction were found and several updates were made (Supplementary Table 1). For a discussion of the updates made to iAF1260b based on this screen, see Experimental phenotypic screens in the Supplementary Information. Next, literature and database searches were used to add newly characterized genes and reactions since 2007. The EcoCyc (Keseler et al, 2009) and KEGG (Kanehisa et al, 2010) databases were used extensively for this purpose. Results from the experimental screen described above also led to several model updates. After this first round of updates, the reconstruction contained 1274 genes. The network gaps (Orth and Palsson, 2010) in this version of the reconstruction were then investigated using a modified version of the GapFind algorithm (Satish Kumar et al, 2007). All orphan reactions (reactions without known associated genes) in the reconstruction were also identified from the model GPRs. Gaps were manually sorted into scope and knowledge gaps. Scope gaps are metabolites that are blocked in a model due to the limited scope of the network reconstruction, but have actual known producing and consuming reactions. Knowledge gaps exist because our knowledge of any metabolic network is incomplete. Targeted literature and database searches were performed for each knowledge gap to try to identify any known metabolic reactions missing from the reconstruction. We continued to add newly published metabolic information to the reconstruction during this gap‐filling process, and the reconstruction was ultimately updated to iJO1366. All manual curation followed an established protocol (Thiele and Palsson, 2010).
iJO1366 represents a significant expansion of the E. coli reconstruction, as it contains 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. A comparison of the content of iJO1366 and its predecessor, iAF1260, is presented in Table I. Like iAF1260, iJO1366 contains a wide range of metabolic functions (Figure 1). The complete lists of reactions and metabolites in iJO1366 can be found in Supplementary Tables 2 and 3, with a list of all references used in Supplementary Table 4. iJO1366 accounts for three cellular compartments: the cytoplasm, periplasm, and extracellular space. In total, 107 new genes were added to the reconstruction, while one gene, prpE (b0335), was removed. Most new genes added have been characterized since iAF1260 was published in 2007 (Figure 1D). The fact that some references predate previous versions of the E. coli reconstruction does not necessarily mean that they were previously missed. Rather, as genes and reactions are often added on a pathway basis, complete functional pathways are typically fully elucidated over time from multiple sources. The new genes mostly add new pathways and systems to the network, but a significant number of them fill gaps and orphan reactions in existing systems (Figure 1E). A complete list of all new and removed genes, reactions, and metabolites can be found in Supplementary Table 5. The ‘core’ and ‘wild‐type’ biomass reactions of iAF1260 have also been updated in iJO1366. These are reactions that drain biomass precursor compounds in experimentally determined ratios to simulate growth (Feist and Palsson, 2010). For the complete core and wild‐type biomass reactions see Updating the biomass composition and growth requirements in the Supplementary Information and Supplementary Table 6. The knowledge index (number of abstracts in Medline) of the 1366 genes in the network was computed, and indicates that iJO1366 contains most of the best‐characterized genes in E. coli (Knowledge index of iJO1366 genes in the Supplementary Information and Supplementary Table 7). iJO1366 was also compared with an automatically generated E. coli reconstruction from the Model SEED (Henry et al, 2010) (Comparison of iJO1366 to the Model SEED E. coli reconstruction in the Supplementary Information and Supplementary Table 8) and to the protein localization database EchoLocation (Horler et al, 2009) (Comparison of iJO1366 to the EchoLocation database in the Supplementary Information and Supplementary Table 9).
A significant number of the gaps in the iAF1260 network were filled during the update to iJO1366, and several blocked pathways were unblocked. Several different types of gaps in metabolic networks are possible. Root no‐production gaps are metabolites with consuming reactions but no producing reactions. Root no‐consumption gaps are metabolites with producing reactions but no consuming reactions. Downstream gaps are metabolites with producing and consuming reactions but which are unable to be produced at steady state because they are downstream of a root no‐production gap. Similarly, upstream gaps are upstream of root no‐consumption gaps. The final reconstruction contains 48 root no‐production gaps, 63 root no‐consumption gaps, 52 downstream gaps, and 69 upstream gaps. In total, 11.5% of the metabolites in iJO1366 are blocked under all conditions due to gaps (Gaps and orphan reactions in the iJO1366 reconstruction in the Supplementary Information and Supplementary Table 10). The orphan reactions in models such as iJO1366 can also help to identify the functions of metabolic genes. In the original iJR904 study (Reed et al, 2003), gene homology was used to predict the likely E. coli genes that encode the enzymes for 56 orphan reactions. Since then, 14 of these predictions have been independently confirmed to be correct (Supplementary Table 11), and these genes are now included in iJO1366.
Prediction of metabolic phenotypes
Flux balance analysis (FBA) (Orth et al, 2010b) can be used with a constraint‐based model to predict metabolic flux distributions, growth rates, substrate uptake rates, and product secretion rates. The iAF1260 model and its predecessors were already very accurate at making phenotypic predictions such as growth rates and central metabolic flux distributions, so improved predictive capabilities in these areas were not expected with iJO1366. Instead, the value of the updated model is in its ability to predict phenotypes under a wider range of conditions than its predecessors.
To demonstrate the utility of the iJO1366 model in making these phenotypic predictions, we generated two large‐scale sets of model phenotype predictions. First, the growth phenotypes of E. coli on all possible carbon, nitrogen, phosphorus, and sulfur sources were predicted. The numbers of growth‐supporting substrates are summarized in Table II, and the full results of this screen given in Supplementary Table 12 and discussed in Prediction of all growth‐supporting carbon, nitrogen, phosphorus, and sulfur sources in the Supplementary Information. We also performed a screen of model predicted growth phenotypes for all possible single gene knockout strains. Growth phenotypes were predicted on both glucose and glycerol minimal media, and the results were compared with experimental data sets (Table III; Supplementary Table 13). Not unexpectedly, iJO1366 is slightly less accurate at predicting overall gene essentiality than iAF1260. This difference is due to the fact that the 107 new genes added to this model version are from less well‐studied systems and pathways than the existing genes in iAF1260, as discussed more thoroughly in Prediction of gene essentiality in the Supplementary Information.
Mapping iJO1366 to closely related strains
Although iJO1366 is a model of E. coli K‐12 MG1655, gene homology mapping can be used to create models of other E. coli and Shigella strains. To date, the metabolic reconstruction of E. coli W (Archer et al, 2011) is the only published reconstruction for E. coli strains other than K‐12. The iJO1366 reconstruction should prove useful for the study of other recently sequenced E. coli strains. A previous analysis of multiple E. coli genomes showed that there is a moderate level of variability with respect to metabolic gene content within the species (Vieira et al, 2011). This analysis went one step beyond genetic conservation, also investigating the conservation of network topology, but it stopped short of computing the capacity to carry flux through specific growth‐supporting pathways.
While it is known that equivalent function is not guaranteed by gene homology, it is still one of the most commonly used and effective methods of genome annotation (Frazer et al, 2003). In addition, the combination of sequence homology with constraint‐based analysis of metabolic networks can be used to determine the most likely metabolic gene content of an organism by generating models that match known biology. Using this approach, we predicted with FBA whether metabolic models based on iJO1366 for 38 E. coli and Shigella strains are capable of producing biomass on glucose minimal media at various conservation thresholds (Figure 2A). At each percent identity (PID) cutoff, the set of genes that are absent from iJO1366 was determined using Smith–Waterman alignments. The set of missing genes was then used to impose constraints on metabolic reactions utilizing the iJO1366 GPRs. As expected, the results show that all strains were capable of producing biomass with no conservation requirement, but many strains lost this ability as the requirement was made more stringent. At a PID of 40%, only four strains were incapable of producing biomass. This PID was used for further analysis, and was justified through network‐based analysis of auxotrophies (Figure 2B and Mapping iJO1366 to closely related strains in the Supplementary Information). Hundreds of unannotated genes in various E. coli and Shigella strains were identified through comparisons to iJO1366 genes, and are listed in Supplementary Table 14.
By analyzing the genetic content of 38 E. coli and Shigella genomes, 1006 metabolic genes in iJO1366 were found to be common to all genomes. The average genome in this analysis contains ∼97% of the genes in iJO1366 (Figure 2C). The mapping procedure described here led to the creation of 38 strain‐specific models. It is important to note that these models are not yet on the same level as typical ‘draft’ metabolic models, as new metabolic functions (functions beyond those occurring in E. coli K‐12 MG1655) have not yet been considered. Because only the 1366 metabolic genes of iJO1366 were considered for this mapping, the common set of genes (74%) appears larger than in the previous study by Vieira et al, which considered a set of 1545 reactions. This difference is also partly due to the fact that Vieira et al added orphan reactions to their E. coli networks and compared metabolic content on a reaction basis, rather than on a gene basis.
Although the metabolism of E. coli has been the subject of active research for decades, the new content added to the iJO1366 reconstruction is a result of the new discoveries that continue to be made. In fact, most of the newly characterized genes in iJO1366 were actually characterized in 2010 (Figure 1D). Although this is a small sample size, it appears that the pace of new discoveries is not slowing as more of the metabolic network is uncovered. There are still numerous uncharacterized E. coli K‐12 MG1655 genes, many of which are predicted to have metabolic functions (Riley et al, 2006). As more discoveries are made in the future, additional updates to the E. coli network reconstruction will need to be made. Future increases in the scope of the E. coli metabolic network reconstruction will likely include the integration of this network with reconstructions of other cellular systems such as transcription and translation (Thiele et al, 2009) or transcriptional regulation (Covert et al, 2004). iJO1366 is the most advanced and comprehensive metabolic reconstruction of any microorganism to date, and can thus continue to serve as a basis for the metabolic reconstruction of other bacteria. Based on the success of its predecessors, we expect that iJO1366 will be an important tool in microbial systems biology for years to come.
Materials and methods
Experimental phenotypic screens
The iAF1260 E. coli K‐12 MG1655 metabolic reconstruction was used to make computational predictions. The parent strain of the Keio Collection, BW25113, is derived from K‐12 MG1655 and is missing several genes that are present in K‐12 MG1655: araBAD, rhaBAD, and lacZ. Therefore, flux through the associated reactions without isozymes (ARAI, RBK_L1, RMPA, LYXI, RMI, RMK, and LACZ) was constrained by setting the upper and lower flux bounds of these reactions to zero. For aerobic growth, oxygen uptake was allowed by setting the lower bound of the oxygen exchange reaction to −18.5 mmol gDW−1 h−1. Anaerobic growth was modeled by setting the lower bound of this reaction to zero. For growth on glucose, the lower bound of the glucose exchange reaction was set to −8 mmol gDW−1 h−1. For growth on l‐lactate and succinate, the lower bounds were set to −16 mmol gDW−1 h−1. These are arbitrary bounds based on typical substrate uptake rates. After setting the bounds for each condition, the predicted effect of the deletion of each gene in iAF1260 for each condition was computed using the singleGeneDeletion COBRA Toolbox function (Becker et al, 2007), which uses GPRs to constrain the appropriate reactions to carry zero flux and then predicts maximum growth using FBA. A gene was considered essential for the simulated condition if deletion of the gene reduced optimal growth rate to <5% of the wild‐type strain as computed by FBA. When a strict threshold of zero growth was used to determine essentiality instead, only one additional essential gene was predicted: lldD (b3605) on lactate minimal medium.
Knockout strains were taken from the Keio Collection (Baba et al, 2006; Yamamoto et al, 2009) (supplied by Open Biosystems), a genome‐scale collection of E. coli K‐12 gene knockouts. Stocks of knockout mutants were streaked onto LB agar with kanamycin (50 μg ml–1) from odd‐numbered Keio Collection plates only. Two single colonies of the Keio knockout strain were inoculated into 96‐well plates containing 200 μl of LB media with kanamycin (50 μg ml–1) and incubated overnight at 37 °C without shaking. Plates were then centrifuged and pelleted cells were washed twice with 200 μl of 1 × M9 salts per well. Disposable replicator pins were used to transfer cells from the preculture plate to four new plates, two containing glucose M9 minimal medium, one containing lactate M9 minimal medium, and one containing succinate M9 minimal medium. There was 200 μl of minimal medium per well, and the minimal media also contained 50 μg ml–1 kanamycin. The 96‐well plates were covered with Aeraseal breathable film (Sigma) to minimize cross‐well contamination. For aerobic conditions, the plates were incubated at 37 °C without shaking in a sterile cabinet. For anaerobic conditions, the plates were incubated at 37 °C in an anaerobic chamber ([O2] <50 p.p.m.). After 48 h, absorbance at 600 nm of each well was determined using a VERSAmax microplate reader (Molecular Devices, Sunnyvale, CA).
A well was considered to have no growth or slow growth if its absorbance was less than a cutoff value specific to each of the four conditions. The cutoff value was determined by visual inspection of a histogram of absorbance values for all wells measured from a given condition. ‘Normal’ growers are supposed to result in measurements lying in the roughly Gaussian‐shaped distribution that makes up most of the data, while slow‐ and non‐growers are supposed to result in measurements falling outside the upper 0.95 area of the Gaussian distribution. The cutoffs were OD600=0.26 for glucose aerobic plates, OD600=0.21 for glucose anaerobic plates, OD600=0.21 for lactate aerobic plates, and OD600=0.20 for succinate aerobic plates. If both colonies of a gene knockout were determined to have normal growth, then the gene was considered a true positive (TP) if the model had predicted growth for the knockout, or a tentative false negative (TFN) if the model indicated the knockout should not have grown or previous experiments in glucose MOPS minimal medium had indicated the gene was essential (Baba et al, 2006). Similarly, if both colonies of a gene knockout were determined to have slow/no growth, then the gene was considered a true negative (TN) if the model predicted the same outcome, or a tentative false positive (TFP) otherwise. A gene was considered inconclusive (INC) if for any condition only one colony showed slow/no growth.
A second round of screening was performed for TFN, TFP, and INC gene knockouts. TFP and INC gene knockouts were rescreened with two colonies each; TFN gene knockouts were rescreened with four colonies and their pellets were washed four times before transfer to minimal media instead of twice to ensure that growth was not due to contamination from trace amounts of rich media from the precultures. For TFP and INC gene knockouts, a gene was considered essential for a condition if at least one colony showed slow/no growth in the condition in both the first and second round screens. If an essential gene was predicted by the model to be non‐essential in the experimental condition, then the gene was concluded to be a genuine false positive (FP) for that condition. For TFN gene knockouts, a gene was considered a genuine false negative (FN) if two or more colonies demonstrated normal growth in the secondary screen.
Metabolic network reconstruction procedure
The iJO1366 reconstruction was assembled by updating the iAF1260b E. coli metabolic reconstruction (Feist et al, 2010), an updated version of the iAF1260 reconstruction (Feist et al, 2007). iAF1260b contains six additional reactions that were not in iAF1260 (ALAt2rpp, ASPt2rpp, CITt3pp, DHORDfum, GLYt2rpp, and MALt3pp). A 96‐step procedure for metabolic network reconstruction was recently published (Thiele and Palsson, 2010), and the appropriate steps were followed when adding new genes, reactions, and metabolites to form iJO1366. The reconstruction was assembled using the SimPheny (Genomatica Inc., San Diego, CA) software platform. All new metabolites were checked against public databases (KEGG, PubChem) for correct structure and charge at a pH of 7.2. New reactions were mass and charge balanced and reversibility was assigned based on experimental studies, thermodynamic information, or the heuristic rules in the standard reconstruction protocol (Thiele and Palsson, 2010). Reactions were associated with genes and functional proteins to form GPRs. The iJO1366 model was exported from SimPheny as an SBML file and the COBRA Toolbox (Becker et al, 2007), a Matlab (The MathWorks Inc., Natick, MA) Toolbox, was used for additional model testing. The Tomlab (Tomlab Optimization Inc., Seattle, WA) CPLEX linear programming solver was used for all optimization procedures.
The GapFind MILP algorithm (Satish Kumar et al, 2007) was encoded in the COBRA Toolbox and used to identify all blocked metabolites in the iAF1260 and iJO1366 models. This algorithm was modified from the published version. Specifically, an option was included to change the mass balance constraint to , allowing for metabolites without consuming reactions to be identified as gaps. For each GapFind run, the lower bounds of all exchange reactions were set to −1000 mmol gDW−1 h−1 and the upper bounds of all model reactions were set to 109 mmol gDW−1 h−1. The GapFind algorithm was then run twice, once with each mass balance constraint option. Root no‐production and no‐consumption metabolites were identified from the model stoichiometric matrix (S) by searching for rows containing only negative or positive coefficients, respectively. Downstream no‐production and upstream no‐consumption gaps were identified by removing the root gaps from the GapFind outputs. The root gaps of each downstream gap were identified through targeted computational experiments in which metabolite source reactions were added to the network to restore connectivity. Orphan reactions were identified as all reactions without associated GPRs.
The core and wild‐type biomass reactions were modified from the iAF1260 biomass reactions. Biotin was added with a coefficient based on a published biotin concentration of 250 molecules per cell (Delli‐Bovi et al, 2010), while the related cofactor lipoate was added with a similar number of molecules per cell assumed. Iron–sulfur clusters were added with coefficients based on the predictions that 5% of all E. coli proteins contain these clusters (Fontecave, 2006), and that the majority (90%) of these clusters are of the [4Fe–4S] type. The coefficient of Fe2+ was decreased to account for Fe used in iron–sulfur clusters. Molybdenum cofactors were added with coefficients based on the measurement that the inorganic ion content of E. coli is 0.80% Mo (Cvetkovic et al, 2010), and the fact that the majority of this Mo is in the bis‐molybdopterin guanine dinucleotide form (85%). The coefficients of Cu, Mn, Zn, Ni, and Co were also adjusted based on recent in vivo measurements (Cvetkovic et al, 2010). All other biomass components remain the same as in iAF1260. Growth‐associated and non‐growth‐associated maintenances values were recalculated based on recent E. coli K‐12 MG1655 chemostat data for growth on glucose minimal media (Taymaz‐Nikerel et al, 2010). The slope and intercept of this experimental data were identified by linear regression. For these calculations, the electron transport system NADH dehydrogenase reactions NADH16pp (nuo) and NADH5 (ndh) were constrained to carry identical fluxes by replacing these reactions with an equivalent ‘flux split’ reaction, in order to constrain the model to a realistic P/O ratio of 1.375 (Noguchi et al, 2004). The NGAM of 3.15 mmol ATP gDW−1 h−1 was identified by FBA as the maximum amount of ATP produced at a glucose uptake rate of 0.17 mmol gDW−1 h−1, the intercept of the experimental data. The GAM of 53.95 mmol ATP gDW−1 was identified by FBA as the value that would give the correct experimentally determined slope of 10.83 mmol glucose gDW−1 h−1/(μ) h−1 when using the core biomass reaction.
The iJO1366 model is available in SBML format at BioModels (accession: MODEL1108160000) and as Supplementary Model 1.
Comparison of iJO1366 to the Model SEED E. coli reconstruction
The E. coli K‐12 MG1655 model Seed83333.1 V20.21 was downloaded from the Model SEED database (Henry et al, 2010) in SBML format. The set of 1139 genes in this model was compared by gene ID (b‐number) to the 1366 genes in iJO1366 to identify common genes and the unique genes in each model. The genes in the Model SEED model that were not in iJO1366 were then investigated one at a time. EcoCyc was used to identify gene functions, and several genes with verified metabolic functions were then added to iJO1366.
Comparison of iJO1366 to the EchoLocation database
Predicted and experimentally determined protein location data were obtained for all E. coli K‐12 genes from the EchoLocation database (Horler et al, 2009). The cellular locations of the proteins associated with the 1366 model genes from this database were then compared, one at a time, to the compartments of the metabolites in the reactions associated with each gene in iJO1366. For each location in the EchoLocation database, a Boolean rule was written to determine if the location is consistent with the associated model metabolites. For example, ‘Cytoplasmic’ proteins are consistent with genes only associated with cytoplasmic metabolites, and are inconsistent with genes associated with any periplasmic or extracellular metabolites. See Supplementary Table 9 for the full list of Boolean rules. The genes whose locations were inconsistent with model metabolites were then investigated individually, except for ‘Cytoplasmic’ and ‘Periplasmic’ genes with both cytoplasmic and periplasmic metabolites in the model, because there were too many of these inconsistencies to investigate manually. Literature and database information was used to determine whether the EchoLocation database, the iJO1366 reconstruction, or both were correct.
The iJO1366 model, constructed in SimPheny, was exported as an SBML file and used to perform simulations and constraint‐based analyses using the COBRA Toolbox and Tomlab CPLEX linear programming solver. The constraint‐based model consists of an S matrix with 1805 rows and 2583 columns, where 1805 is the number of distinct metabolites (in all three compartments) and 2583 is the number of reactions including exchange and biomass reactions. Each of the reactions has an upper and lower bound on the flux it can carry. Reversible reactions have an upper bound of 1000 mmol gDW−1 h−1 and a lower bound of −1000 mmol gDW−1 h−1, making them practically unconstrained, while irreversible reactions have a lower bound of zero.
By default, the core biomass reaction is set as the objective to be maximized. Certain reactions are by default constrained to carry zero flux to avoid unrealistic behaviors. These reactions are CAT, DHPTDNR, DHPTDNRN, FHL (formate hydrogen lyase), SPODM, SPODMpp, SUCASPtpp, SUCFUMtpp, SUCMALtpp, and SUCTARTtpp. CAT, SPODM, and SPODMpp are hydrogen peroxide producing and consuming reactions that can carry flux in unrealistic energy generating loops. DHPTDNR and DHPTDNRN form a closed loop that can carry an arbitrarily high flux. The succinate antiporters SUCASPtpp, SUCFUMtpp, SUCMALtpp, and SUCTARTtpp can form unrealistic flux loops with other transporters for aspartate, fumarate, malate, and tartrate. The genes encoding FHL are known to be active under anaerobic conditions, but this reaction is constrained to zero to avoid unrealistic aerobic hydrogen production. The NGAM constraint is imposed by a lower bound of 3.15 mmol gDW−1 h−1 on the reaction ATPM. The exchange reactions that allow for extracellular metabolites to pass in and out of the system are defined such that a positive flux indicates flow out. All exchange reactions have a lower bound of zero except for glucose (−10 mmol gDW−1 h−1), the vitamin B12 precursor cob(I)alamin (−0.01 mmol gDW−1 h−1), and oxygen and all inorganic ions required by the biomass reaction (−1000 mmol gDW−1 h−1). The default lower bound on glucose uptake is based on typical glucose uptake rates. Because only a very small amount of B12 is required for growth, the lower bound on cob(I)alamin uptake is arbitrary and never actually constraining in practice. The iJO1366 computational model also includes drain reactions for six cytoplasmic metabolites without known consuming reactions that must be drained from the system to allow simulation of steady‐state cell growth. These metabolites are p‐cresol, 5′‐deoxyribose, aminoacetaldehyde, s‐adenosyl‐4‐methylthio‐2‐oxobutanoate, (2r,4s)‐2‐methyl‐2,3,3,4‐tetrahydroxytetrahydrofuran, and oxamate.
Prediction of all growth‐supporting carbon, nitrogen, phosphorus, and sulfur sources
The possible growth‐supporting carbon, nitrogen, phosphorus, and sulfur sources of E. coli were identified using FBA. First, all exchange reactions for extracellular metabolites containing the four elements were identified from the metabolite formulas. Every extracellular compound containing carbon was considered a potential carbon source, for example. Next, to determine possible growth‐supporting carbon sources, the lower bound of the glucose exchange reaction was constrained to zero. Then the lower bound of each carbon exchange reaction was set, one at a time, to −10 mmol gDW−1 h−1 (a typical uptake rate for growth‐supporting substrates), and growth was maximized by FBA using the core biomass reaction. The target substrate was considered growth supporting if the predicted growth rate was above zero. While identifying carbon sources, the default nitrogen, phosphorus, and sulfur sources were ammonium (nh4), inorganic phosphate (pi), and inorganic sulfate (so4), respectively. Prediction of growth‐supporting sources of these other three elements was performed in the same manner as growth on carbon, with glucose as the default carbon source.
Prediction of gene essentiality
To simulate the effects of gene knockouts, the iJO1366 model with its default constraints and core biomass reaction objective was modified to match the genotype of E. coli BW25113 (see Experimental phenotypic screens). For growth on glucose, the lower bound of the glucose exchange reaction was set to −10 mmol gDW−1 h−1. For growth on glycerol, the lower bound of the glucose exchange reaction was set to zero while the lower bound of the glycerol exchange reaction was set to −10 mmol gDW−1 h−1. All 1366 genes in the model were knocked out one a time and growth was simulated by FBA using the singleGeneDeletion COBRA Toolbox function. Gene knockout strains with a growth rate above zero were considered non‐essential. Experimental gene essentiality data for growth on glucose (Baba et al, 2006) and glycerol (Joyce et al, 2006) was then obtained and adjusted based on an updated analysis of the Keio Collection strains (Yamamoto et al, 2009). The newly identified essential genes were added to the lists of essential genes under both conditions, while the genes whose essentiality was identified as uncertain were not changed from their original designations.
Mapping iJO1366 to closely related strains
The protein sequences of all available E. coli and Shigella strains (38 strains total) were downloaded from KEGG (http://www.genome.jp/kegg/download), and SSEARCH35 of the FASTA suite (Pearson and Lipman, 1988), an open‐source implementation of the Smith–Waterman algorithm, was used to determine a PID conservation for each iJO1366 gene. The flags used in SSEARCH35 were ‘–m9 –E 1 –q –H’. Next, the deleteModelGenes COBRA Toolbox function was used to delete all genes in a strain that failed to be conserved at a particular protein sequence identity threshold. The entire cutoff range was scanned (from 0 to 100% identity in increments of 1%) and FBA was performed with an objective function of maximum growth rate using the core biomass reaction, to produce Figure 2A. Subsequently, the situation occurring at 40% identity was investigated to determine why four of the strains were unable to produce biomass at this conservation threshold. A demand reaction for each biomass component was added (excluding the components involved in the ATP hydrolysis reaction) and flux was separately maximized through each of these reactions to determine the biomass components that the strain was unable to produce. To determine the subset of genes responsible for a particular loss of function, the effects of single knockouts were computed and literature was consulted.
The preliminary network‐level analysis of conservation revealed that many ORFs are not properly automatically annotated in non‐model organism genomes. For this reason, this analysis was supplemented with a nucleotide‐level search for conservation of each model gene in each of the 38 strains. The analysis described above was repeated, this time with the additional conservation qualification of a genomic hit displaying at least 70% identity and aligned length to the model gene. These cutoffs were determined by manual inspection of a scatter plot of all nucleotide‐level results. There was a clear region of conservation in the upper right quadrant defined by these cutoffs. Genes found to be conserved only at the nucleotide level are considered to have 100% protein sequence identity (see Supplementary Table 14 for a complete list of such cases).
We thank Harish Nagarajan, Vasiliy Portnoy, Jennie Reed, and Ines Thiele for their helpful suggestions. This work was funded by NIH R01 GM057089.
Author contributions: JDO and BØP designed the study. JDO, JAL, and AMF performed the model updates. TMC and JN performed the phenotypic screening experiments. JAL performed the mapping to other strains. HN analyzed the knowledge index of model genes. All authors helped draft and edit the final manuscript.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Text and Figures [msb201165-sup-0001.pdf]
Supplementary Tables 1–14 [msb201165-sup-0002.xls]
Supplementary Model 1
iJO1366 model in SBML format [msb201165-sup-0003.zip]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2011 EMBO and Macmillan Publishers Limited