There is growing interest in elucidating the minimal number of genes needed for life. This challenge is important not just for fundamental but also practical considerations arising from the need to design microorganisms exquisitely tuned for particular applications. With a genome size of ~580 kb and approximately 480 protein coding regions Mycoplasma genitalium is one of the smallest known self-replicating organisms and, additionally, has extremely fastidious nutrient requirements. The reduced genomic content of M. genitalium has led researchers to suggest that the molecular assembly contained in this organism may be a close approximation to the minimal set of genes required for bacterial growth. We have introduced a systematic approach for the construction and curation of a genome-scale in silico metabolic model for M. genitalium. The model accounts for 189 of the 482 genes listed in the latest genome annotation and is named iPS189. We used computation tools during the process to bridge network gaps in the model (i.e., GapFind and GapFill) and restore consistency with experimental data that determined which gene deletions led to cell death using GrowMatch. We achieved 87% correct model predictions for essential genes and 89% for non-essential genes. We subsequently have used the metabolic model to determine components that must be part of the growth medium. The general aproaches we used during the construction of the M. genitalium model are being used to develop models of other organisms.
Over 700 genomes have been fully sequenced whereas only about 25 organism-specific genome-scale metabolic models have been constructed. The time required during the task of model-construction is becoming increasingly critical as genome sequencing for new organisms is proceeding at an ever-accelerating pace. The approaches and tools used during the M. genitalium model generation provide a roadmap for the automated metabolic reconstruction of other organisms. The application of the automated methodologies GapFind/GapFill and Growmatch can be effectively used during the construction process (as opposed to an a posteriori mode of deployment). Additionally, we have explored how GrowMatch is useful at improving model predictions when using growth phenotypes from various nutrient sources (i.e., carbon, nitrogen, phosphorus, sulfur) instead of single gene deletions.
Currently, there exists tens of different microbial and eukaryotic metabolic reconstructions (e.g.,Escherichia coli, Saccharomyces cerevisiae, Bacillus subtilis, Homo Sapiens) with many more under development. There are inaccuracies in these reconstructions due to the presence/absence of metabolic functionalities that the organism does not/ does actually possess. Our research is focused on developing optimization tools to pinpoint these inaccuracies and subsequently generate hypotheses to correct them. To this end, we have been involved in projects to develop the following tools:
Using GapFind , we identify the metabolites in the metabolic network reconstruction which cannot be produced under any uptake conditions. Subsequently, using GapFill, we identify the reactions from a customized multi-organism database that restores the connectivity of these metabolites to the parent network using four mechanisms. We demonstrate this procedure for the genome- scale reconstruction of Escherichia coli and also Saccharomyces cerevisiae wherein compartmentalization of intra-cellular reactions results in a more complex topology of the metabolic network. We determine that about 10% of metabolites in E. coli and 30% of metabolites in S. cerevisiae cannot carry any flux. Interestingly, the dominant flow restoration mechanism is directionality reversals of existing reactions in the respective models.
Genome-scale metabolic reconstructions are typically validated by comparing in silico growth predictions across different mutants utilizing different carbon sources with in vivo growth data. This comparison results in two types of model- prediction inconsistencies; either the model predicts growth when no growth is observed in the experiment (GNG inconsistencies) or the model predicts no growth when the experiment reveals growth (NGG inconsistencies). In this project, we developed an optimization-based framework, GrowMatch, to automatically reconcile GNG predictions (by suppressing functionalities in the model) and NGG predictions (by adding functionalities to the model). We use GrowMatch to resolve inconsistencies between the predictions of the latest in silico E. coli (iAF1260) model and the in vivo data available in the Keio collection and improved the consistency of in silico with in vivo predictions from 90.6% to 96.7%.
For many cases, GrowMatch identified fairly non-intuitive model modification hypotheses that would have been difficult to pinpoint through inspection alone. In addition, GrowMatch can be used during the construction phase of new, as opposed to existing, genome-scale metabolic models, leading to more expedient and accurate reconstructions.
In this research, we have introduced the Flux Coupling Finder (FCF) framework for elucidating the topological and flux connectivity features of genome-scale metabolic networks. The framework is demonstrated on genome-scale metabolic reconstructions of Helicobacter pylori, Escherichia coli, and Saccharomyces cerevisiae. The analysis allows one to determine if any two metabolic fluxes, v1 and v2, are (i) directionally coupled, if a non-zero flux for v1 implies a non-zero flux for v2 but not necessarily the reverse; (ii) partially coupled, if a non-zero flux for v1 implies a non-zero, though variable, flux for v2 and vice-versa; or (iii) fully coupled, if a non-zero flux for v1 implies not only a non-zero but also a fixed flux for v2 and vice-versa. Flux coupling analysis also enables the global identification of blocked reactions, which are all reactions incapable of carrying flux under a certain condition, equivalent knockouts, defined as the set of all possible reactions whose deletion forces the flux through a particular reaction to zero, and sets of affected reactions denoting all reactions whose fluxes are forced to zero if a particular reaction is deleted. The FCF approach thus provides a novel and versatile tool for aiding metabolic reconstructions and guiding genetic manipulations.
In analogy with the FCF method for fluxes we have developed an analogous method for metabolite pools. Specifically, conservation relationships for metabolite concentrations are important biophysical barriers, selected through evolution, to protect cellular organisms from stresses (e.g., osmotic) and provide global metabolic regulation.. The conservation relationships are linear combinations of metabolite concentrations that do not change over time and are solely determined by the organism's stoichiometry and uptake/secretion transport conditions. To this end, we have introduced an optimization-based framework to elucidate and analyze conservation relationships for metabolite concentrations in the context of genome-scale reaction networks. The framework is comprised of Metabolite Concentration Coupling Analysis (MCCA) and the Minimal Conserved Pool Identification (MCPI) procedure.
Synthetic lethals refer to pairs of non-essential genes whose simultaneous deletion prohibits growth. One can extend the concept of synthetic lethality by considering gene groups of increasing size where only the simultaneous elimination of all genes is lethal whereas individual gene deletions are not. We developed optimization-based procedures for the exhaustive and targeted enumeration of multi-gene (and by extension multi-reaction) lethals for genome-scale metabolic models. Specifically, we are applying these approaches to the iAF1260 model of E. coli. This analysis for the iAF1260 model led to the complete identification of all double and triple synthetic lethals as well as the targeted identification of higher-order synthetic lethals. Graph representations of these synthetic lethals reveal a variety of motifs ranging from hub-like to highly connected sub-graphs providing a birds-eye view of the avenues available for redirecting metabolism and uncovering complex patterns of gene utilization and interdependence. The procedure also enables the use of falsely predicted synthetic lethals for metabolic model curation. By analyzing the functional classifications of the genes involved in synthetic lethals we reveal surprising connections within and across COG functional classifications.
Clostridia are anaerobic, gram positive bacteria with the ability to consume a broad range of substrates starting from cellulose to glycerol and produce a spectrum of useful metabolites such as propanediol, butanol. The industrial application of these non-standard organisms is hindered by the lack of understanding of their metabolic repertoire which can resolved by the aid of stoichiometric and kinetic models of metabolism. The major challenges in such modeling are redox balance systems (transhydrogenase, hydrogenase etc.), cellulosome complex (whose composition varies based on substrate utilized), biphasic growth (shift between growth phases based on media conditions) and the variation in TCA cycle and PP pathway across organism. We have previously constructed and used a stoichiometric model for understanding C. acetobutylicum's response to stressors (Dash et al, 2014). We are currently working on C. thermocellum stoichiometric and kinetic models to identify ethanol over-producing genetic perturbations.
A key consideration in metabolic engineering is the determination of fluxes of the metabolites within the cell. The gold standard in elucidating metabolic flows is the use of metabolic flux analysis (MFA). Flux elucidation requires both a detailed mapping of all possible paths different isotope forms can take through metabolic networks and experimental data in the form of labeled isotopes (usually 13C) sampled by NMR or GC-MS. When cells are fed a growth substrate with certain carbon positions labeled with 13C, the distribution of this label in the intracellular metabolites can be calculated based on the known biochemistry of the participating pathways. Most labeling studies have focused on skeletal representations of central metabolism that ignore many flux routes that could contribute to the observed isotopic labeling patterns. In contrast, our approach investigates the importance of carrying out isotopic labeling studies using a more complete reaction network. Our group has contributed and made available a large-scale isotope mapping model for E. coli spanning 238 reactions and 184 metabolites yielding 17,346 distinct isotopes by manually tracking the fate of labeled carbons. This network includes glycolysis, the TCA cycle, the pentose phosphate pathway, anaplerotic reactions, amino acid biosynthesis/ degradation, and oxidative phosphorylation. Many of these reactions are reversible and the model also includes a reaction to account for biomass production. Furthermore, the model includes global metabolite balances on cofactors such as ATP, NADH, and NADPH.
The experimental system was a strain of E. coli that produces amorphadiene, a precursor to the antimalarial drug artemisinin, grown on aerobic glucose conditions. We found that hundreds of distinct flux distributions were identified with sum of squared error (SSE) values that met the statistical cutoff for goodness-of-fit (i.e., chi-square). This lack of unique elucidation prompted us to deduce flux ranges spanned by equivalent solutions given the experimental error in the measurements. First, we calculated the variability in all fluxes subject to only network stoichiometry. As expected, nearly all of the central metabolic fluxes span wide ranges of values, motivating the need for 13C-based MFA. We subsequently recalculated flux variability after including the isotopomer balancing constraints to determine (i) which fluxes were fully resolved; (ii) how many of them were only partially resolved; and (iii) how many remaining fluxes in the model were completely unaffected by the isotopomer data.
One of the important considerations for analysis of the isotope model is how isotope measurements impact the elucidation of fluxes in large-scale metabolic reconstructions. This identifiability problem in MFA with isotopic considerations is very difficult, as isotopic balances yield nonlinear constraints. We have developed an integer programming framework, OptMeas, for the mathematical analysis of metabolic pathways to answer this question. By using a structural analysis-based optimization method, it is possible to exhaustively identify all combinations of isotope labeling experiments and flux measurement that completely resolve all flux values in the network. This approach results in an integer linear programming formulation while accounting for the case of partial measurements (e.g., when only some fragments are measured). These measurements, consisting of both partial or full isotope state determination, were assigned relative costs that allow the experimentalist to select the measurements that will be both sufficient and economical. The proposed framework has been tested on well-studied small demonstration examples. We have also benchmarked the proposed framework by applying it to medium-scale metabolic networks of E. coli and by revisiting our large-scale E. coli model to exhaustively identify all measurements options.
Our group introduced the computational framework termed OptKnock for
suggesting gene deletion strategies that are likely to lead to the
overproduction of specific chemical products. Specifically, a nested
optimization framework (see Figure) was proposed to identify multiple
gene deletion combinations that force the coupling of a cellular
objective, here assumed to be a drain of biosynthetic precursors in the
ratios required for biomass formation, with imposed chemical production
targets. This coupling is accomplished by ensuring that the production
of the desired product becomes an obligatory byproduct of growth by
"shaping" the connectivity of the metabolic network. The computational
procedure was designed to identify not just straightforward but also
non-intuitive knockout strategies by considering the entire genome-scale
metabolic networks as abstracted in recently developed in silico
Most computational strain design methods rely on surrogate biological objectives (e.g., maximize growth rate or minimize metabolic adjustments) and do not make use of flux measurements often available for the wild-type strain. We have also developed the OptForce framework that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target. With increasing availability of fluxomics data and improved strain engineering and molecular biology techniques, OptForce predictions can be implemented for increasing various products including malonyl-coA (Xu et al, 2011) and fatty acids (Ranganathan et al, 2012).
Computational strain-design prediction accuracy has been the focus for many recent efforts through the selective integration of kinetic information into metabolic models. Existing approaches relying solely on stoichiometry and rudimentary constraint-based regulation overlook the effects of metabolite concentrations and substrate-level enzyme regulation while identifying metabolic interventions. The k-OptForce protocol integrates the available kinetic descriptions of metabolic steps into a genome-scale stoichiometric-based modeling formalism to sharpen the prediction of intervention strategies. k-OptForce uses kinetic information to (re)apportion reaction fluxes in the network by identifying interventions comprised of both direct enzymatic parameter changes (for reactions with available kinetics) and reaction flux changes (for reactions with only stoichiometric information). The introduction of kinetic expressions can significantly alter the identified interventions compared to those identified with stoichiometry-alone analysis. In particular, additional modifications are required in some cases to avoid the violation of metabolite concentration bounds, while in other cases, the kinetic constraints yield metabolic flux distributions that favor the overproduction of the desired product thereby requiring fewer direct interventions.
We have also developed the OptStrain framework which aims to guide pathways modifications, through reaction additions and deletions, of microbial networks for the overproduction of targeted compounds. These compounds may range from electrons or hydrogen in bio-fuel cell and environmental applications to complex drug precursor molecules. A comprehensive database of biotransformations, referred to as the Universal database (with over 5,000 reactions), is compiled and regularly updated by downloading and curating reactions from multiple biopathway database sources. Combinatorial optimization is then employed to elucidate the set(s) of non-native functionalities, extracted from this Universal database, to add to the examined production host for enabling the desired product formation. Subsequently, competing functionalities that divert flux away from the targeted product are identified and removed to ensure higher product yields coupled with growth. This work represents an advancement over earlier efforts by establishing an integrated computational framework capable of constructing stoichiometrically balanced pathways, imposing maximum product yield requirements, pinpointing the optimal substrate(s), and evaluating different microbial hosts.
Moving from stoichiometric to kinetic models of metabolism, in this research, we proposed a general computational procedure to determine which genes/enzymes should be eliminated, repressed or overexpressed to maximize the flux through a product of interest for general kinetic models. The procedure relies on the generalized linearization of a kinetic description of the investigated metabolic system and the iterative application of mixed-integer linear (MILP) optimization implemented using the MATLAB environment. The proposed computational procedure is a general approach that can be applied to any metabolic system for which a kinetic description is provided. All MATLAB input files are made available as supplementary material.
In contrast to stoichiometric-based models, the development of large-scale kinetic models of metabolism has been hindered by the challenge of identifying kinetic parameter values and kinetic rate laws applicable to a wide range of environmental and/or genetic perturbations. The recently introduced ensemble modeling (EM) procedure provides a promising remedy to address these challenges by decomposing metabolic reactions into elementary reaction steps and incorporating all phenotypic observations, upon perturbation, in its model parameterization scheme. Here, we present a stepwise procedure to develop kinetic models of metabolism that satisfies the fluxomic data for many mutant strains by making use of the EM concepts. Parameterization is performed using a formal optimization approach that minimizes the discrepancies between model predictions and flux measurements. With an ever increasing advances in omics measurements, the constructed models and the parameterization procedure presented in this study pave the way for the construction of larger-scale kinetic models with more narrowly distributed parameter values.