Integrated Modeling Framework

This document outlines the mathematical framework for the dynamic simulation of engineered pathways for wax ester and melatonin co-production in Bacillus subtilis . The framework integrates two distinct models: The wax ester model is a comprehensive simulation based on literature-validated enzyme kinetics, realistic metabolite concentrations, and key physiological processes like enzyme expression lag, metabolic burden, and cofactor regeneration. In parallel, the melatonin model focuses on predictive analysis of its core pathway and key enzymes. Additionally, supporting models for plasmid copy number and enzyme thermal stability provide a robust foundation for our kinetic parameters and biological assumptions.

Dynamic Model of Wax Ester Biosynthesis

Pathway and State Variables

Pathway Reactions

Prior to modeling, we conducted extensive literature reviews and in-depth research (some key recent papers are followings) on the metabolism of Bacillus subtilis.

The engineered pathway consists of two primary enzymatic steps:

  1. Fatty Acid Reductase (FAR):

  2. Wax Synthase (WS):

State Variables

The system is described by the following state variables, with concentrations in millimolar (mM):

  • : Concentration of Palmitoyl-CoA
  • : Concentration of 1-Hexadecanol
  • : Concentration of Wax Ester (Hexadecyl Palmitate)
  • : Concentration of NADPH
  • : Concentration of NADP
  • : Concentration of free Coenzyme A
  • : Concentration of ATP

Enzyme and Cellular Dynamics

Enzyme Expression Dynamics

To simulate a realistic induction scenario, enzyme expression follows a sigmoidal curve with an initial lag phase. The expression factor, , modulates the maximum enzyme concentration over time. And this simulation also based on the related papers

where , is the expression lag time, is the time to reach maximum expression, and is the steepness of the induction curve.

The active enzyme concentrations are then given by:

Kinetic Rate Laws

A general metabolic burden factor, , is applied to both enzymatic reactions, representing the reduction in cellular efficiency as the product accumulates.

where is a burden coefficient and is a reference concentration.

  • Enzyme E1: Fatty Acid Reductase (FAR)

    P.S. FAR enzyme here is a protein with two active sites. Aldehyde produced in the first reduction step immediately enters the second catalytic site where it is further reduced to an alcohol. Since the aldehyde intermediate is not released during the reaction, it is omitted in modeling calculations.

    The FAR reaction rate is modeled using Michaelis-Menten kinetics with terms for cofactor dependence, competitive product inhibition by 1-Hexadecanol, and non-competitive feedback inhibition by the final Wax Ester product. And for this modeling, we were inspired by some coressponding papers .

    The rate equation, , is constructed as follows:

    The final rate law for FAR is:

  • Enzyme E3: Wax Synthase (WS)

    The WS reaction follows a Ping-Pong Bi-Bi mechanism. The model incorporates competitive inhibition by free CoA against Palmitoyl-CoA and non-competitive product inhibition by the Wax Ester.

    The inhibition terms are:

    The full rate law for WS is:

The effective, burden-adjusted rates used in the final ODEs are and .

Cellular and Cofactor Dynamics

  • Cofactor Regeneration: Regarding B. Subtilis cofactor metabolism, we analyzed several papers.

    • NADPH Regeneration: Modeled as a process dependent on the available pool and limited by the maximum cellular NADPH capacity, with a stress factor accounting for high demand.
    • CoA Regeneration: Modeled as a first-order process to restore the total CoA pool.
  • Substrate Supply and Product Sequestration:

    • Palmitoyl-CoA Supply: Maintained around a target concentration via a relaxation model.
    • ATP Maintenance: The cellular ATP pool is maintained at a target level, with efficiency decreasing under high metabolic flux.
    • Wax Ester Sequestration: Removal of soluble wax ester into lipid droplets is modeled with Michaelis-Menten kinetics.

System of Equations and Parameters

System of Ordinary Differential Equations (ODEs)

The complete dynamics of the system are described by the following set of coupled ordinary differential equations:


Model Parameters

ParameterValueUnitDescription
50.0MMax concentration of FAR
50.0MMax concentration of WS
0.26FAR turnover number
11.2MFAR Michaelis constant for P-CoA
25.0MFAR Michaelis constant for NADPH
80.0MFAR competitive inhibition by Hexadecanol
120.0MFAR non-competitive feedback inhibition by WE
17.1WS turnover number
15.4MWS Michaelis constant for P-CoA
150.0MWS Michaelis constant for Hexadecanol
300.0MWS competitive inhibition by CoA
150.0MWS non-competitive inhibition by WE
0.2mMTotal cellular NADPH capacity
0.55mMTotal cellular CoA pool
0.05mMTarget Palmitoyl-CoA concentration
5.0mMTarget ATP concentration
0.8Max NADPH regeneration rate
0.2CoA regeneration rate constant
0.15P-CoA supply rate constant
1.0ATP maintenance rate constant
0.01Max wax sequestration rate
10.0MSequestration half-saturation constant

Simulation Results and Analysis

The following simulation results demonstrate the dynamic behavior of the engineered wax ester biosynthesis pathway under realistic cellular conditions. These visualizations illustrate key aspects of pathway performance, metabolic efficiency, and cellular constraints.

Pathway Dynamics and Substrate-Product Relationships

B. subtilis Wax Ester Biosynthesis - Engineered Pathway Dynamics
B. subtilis Wax Ester Biosynthesis - Engineered Pathway Dynamics

This simulation shows the temporal evolution of key metabolites throughout the wax ester biosynthesis pathway. The graph illustrates the consumption of Palmitoyl-CoA substrate (blue line), the transient accumulation of 1-Hexadecanol intermediate (orange line), and the progressive formation of the final wax ester product (red line). The background color phases represent distinct cellular states: the lag phase (pink) during initial protein expression, the growth phase (yellow) with increasing enzymatic activity, and the production phase (green) with steady-state wax ester synthesis. This visualization demonstrates the characteristic behavior of the two-step enzymatic pathway, where substrate depletion drives intermediate formation, which subsequently converts to the desired wax ester product.

Carbon Utilization Efficiency

Carbon Conversion Efficiency Over Time
Carbon Conversion Efficiency Over Time

This plot tracks the carbon conversion efficiency throughout the fermentation process, showing how effectively the engineered pathway converts substrate carbon into the target wax ester product. The efficiency curve demonstrates an initial baseline efficiency during the lag phase, followed by a rapid increase during the growth phase as enzyme expression reaches optimal levels. The vertical reference lines mark critical time points: expression start (red dashed line) and full expression achievement (orange dashed line). The steady-state efficiency of approximately 85% in the production phase indicates highly effective carbon utilization, representing successful metabolic engineering of the pathway. This high efficiency is crucial for commercial viability and demonstrates the pathway's robust performance under cellular constraints.

Cofactor Balance and Metabolic State

Redox Balance - NADPH/NADP+ Ratio
Redox Balance - NADPH/NADP+ Ratio

This graph monitors the cellular redox state by tracking the NADPH/NADP+ ratio, a critical parameter for fatty acid reduction reactions. The simulation shows the ratio maintaining values between 8-10, which falls within the optimal range (>10 = optimal, >5 = acceptable, <2 = critical) for efficient FAR enzyme function. The slight decline from ~9.8 to ~8.5 over time reflects the increasing NADPH demand as wax ester production scales up, while the cellular NADPH regeneration systems work to maintain cofactor availability. This stable redox balance indicates that the engineered pathway does not severely disrupt cellular metabolism and that cofactor regeneration mechanisms are adequate to support sustained production.

Cumulative Mass Balance

Engineered Pathway Performance - Mass Balance Analysis
Engineered Pathway Performance - Mass Balance Analysis

Multi-Variable Pathway Optimization: 3D Surface Predictions for Wax Biosynthesis

Supply vs Kcat Surface
Supply vs Kcat
Wax Production WS-NADPH Surface
Wax Production: WS vs NADPH
Wax Production 3D Surface
Wax Production 3D Surface
  • Figure 5 (Supply vs Kcat): The triangular/pyramid surface correctly shows multiplicative effects with peak production when both supply and catalytic efficiency are optimized.
  • Figure 6 (WS vs NADPH): The saddle-shaped surface demonstrates the cofactor dependency I described.
  • Figure 7 (Wax Production 3D Surface): Shows the integrated multi-variable optimization landscape.

The surface topologies, color gradients (blue=low production, yellow=high production), and axis relationships all support our original interpretations. The descriptions accurately capture the key biochemical insights these 3D models reveal about pathway optimization and variable interdependencies.

This comprehensive mass balance plot demonstrates the overall pathway performance by tracking the cumulative consumption of substrate and formation of products. The graph shows steady substrate consumption (red dashed line) reaching approximately 27 μM, with nearly complete conversion to wax ester product (green line) achieving ~128 μM final concentration. The minimal accumulation of the hexadecanol intermediate (orange line) indicates efficient flux through both enzymatic steps without significant bottlenecks. The background color phases again represent the three distinct operational periods, with the production phase showing sustained, linear product formation. This mass balance confirms the pathway's high conversion efficiency and validates the mathematical model's predictions for substrate utilization and product yield.

Predictive Model of Melatonin Biosynthesis

This section outlines the mathematical framework for the dynamic simulation of the engineered melatonin biosynthesis pathway in Bacillus subtilis. The model focuses on the predictive analysis of the core pathway, branching routes, and the kinetics of key enzymes. It is built upon a system of ordinary differential equations (ODEs) describing the concentrations of all relevant metabolites and cofactors.Here are some papers on melatonin producion of the Bacillus subtilis .

Pathway Design and Reactions

Our engineered melatonin pathway initiates from L-tryptophan, which is sequentially converted to 5-hydroxytryptophan (5-HTP) and then to serotonin via the enzymes TPH and TDC, respectively. At this point, the pathway bifurcates. In the canonical route, serotonin is acetylated by SNAT to N-acetylserotonin, followed by methylation by COMT to yield melatonin. An alternative route involves the initial methylation of serotonin by COMT to produce 5-methoxytryptamine, which is subsequently converted to melatonin through acetylation by SNAT.

Common Upstream Pathway

  1. Tryptophan Hydroxylase (TPH):

  2. Aromatic-L-amino-acid decarboxylase (TDC):

Primary Route to Melatonin

  1. Serotonin N-acetyltransferase (SNAT):

  2. Catechol-O-methyltransferase (COMT):

Alternative Route to Melatonin

  1. Catechol-O-methyltransferase (COMT):

  2. Serotonin N-acetyltransferase (SNAT):

Model Formulation

State Variables

The model uses a system of 11 ordinary differential equations (ODEs) representing mass balance. The state variables represent the concentration of each molecular species:

  • : L-Tryptophan (substrate)
  • : 5-Hydroxytryptophan (intermediate)
  • : Serotonin (branching point)
  • : N-acetylserotonin (primary route intermediate)
  • : 5-Methoxytryptamine (alternative route intermediate)
  • : Melatonin produced via primary route
  • : Melatonin produced via alternative route
  • : S-adenosyl methionine (methylation cofactor)
  • : S-adenosyl homocysteine (inhibitor product)
  • : Acetyl-CoA (acetylation cofactor)
  • : Free Coenzyme A (product)

E. coli Growth Phase Model

The model incorporates time-dependent enzyme expression based on typical E. coli growth phases (Lag, Exponential, and Stationary). Assuming constitutive promoters, a single growth factor, , modulates the concentration of all pathway enzymes simultaneously.

The expression factor is modeled as a piecewise function of time ( in hours):

The active concentration for any enzyme at time is given by:

Kinetic Rate Laws

The reaction velocities are modeled using Michaelis-Menten kinetics tailored to the specific mechanism of each enzyme, incorporating relevant product and cofactor inhibition terms.

  • 1. Tryptophan Hydroxylase (TPH)

    TPH follows Michaelis-Menten kinetics with competitive product inhibition by 5-HTP. The inhibition term is defined as:

    The reaction rate is:

  • 2. Aromatic-L-amino-acid decarboxylase (TDC)

    TDC is modeled similarly, with product inhibition by Serotonin.

    The reaction rate is:

  • 3. Serotonin N-Acetyltransferase (SNAT) - Primary Route

    SNAT follows an Ordered Ternary Complex (Bi-Bi) mechanism. Free CoA acts as a competitive inhibitor against Acetyl-CoA.

    The rate for the acetylation of Serotonin is:

  • 4. Catechol-O-Methyltransferase (COMT) - Primary Route

    COMT also follows a Bi-Bi mechanism. The product SAH acts as a competitive inhibitor against the cofactor SAM.

    The rate for the methylation of N-acetylserotonin (NAS) is:

  • 5. COMT - Alternative Route

    In the alternative route, COMT methylates Serotonin. This is modeled using the same mechanism and inhibition term () but with Serotonin-specific parameters. Serotonin is characterized by a much higher compared to NAS.

    The rate is:

  • 6. SNAT - Alternative Route

    In the alternative route, SNAT acetylates 5-Methoxytryptamine (5-MT). This uses the same mechanism and inhibition term () as the primary route.

    The rate is:

Cofactor Cycles and System Equations

SAM/SAH Methylation Cycle

The methylation cycle regenerates SAM and removes SAH, maintaining methylation capacity for COMT reactions.

The SAM regeneration rate is:

The SAH removal rate is:

Acetyl-CoA / CoA Cycle

Acetyl-CoA supply and CoA recycling sustain acetylation capacity for SNAT reactions.

The Acetyl-CoA supply rate is:

The CoA recycling rate is:

System of Ordinary Differential Equations (ODEs)

The dynamics of substrates, intermediates, and cofactors are described by the following set of coupled ODEs:

Simulation and Analysis

Pathway Competition Analysis

  • Kinetic Preferences: The model predicts pathway preference based on enzyme kinetics:

    • Primary Route Advantage: SNAT has high affinity for serotonin () and COMT efficiently methylates N-acetylserotonin ().
    • Alternative Route Limitation: COMT has poor affinity for serotonin () and competes poorly at low serotonin concentrations.
  • Regulatory Mechanisms:

    • Product Inhibition: 5-HTP inhibits TPH, Serotonin inhibits TDC, SAH inhibits both COMT reactions, CoA inhibits both SNAT reactions.
    • Cofactor Competition: SAM depletion affects both methylation steps, Acetyl-CoA availability limits both acetylation steps.

Simulation Parameters

  • Literature-Validated Kinetics: Representative kinetic parameters are derived from literature and user inputs:

    • ,
    • ,
  • Initial Conditions (mM): The initial metabolite concentrations are:

MetaboliteInitial (mM)
L-Tryptophan0.05
5-HTP0.001
Serotonin0.005
N-acetylserotonin0.0
5-Methoxytryptamine0.0
Melatonin (primary)0.0
Melatonin (alternative)0.0
SAM0.25
SAH0.05
Acetyl-CoA0.12
CoA (free)0.30

Model Outputs

  • Pathway Overview — Complete dynamics of all intermediates
Complete dynamics of all intermediates
Complete dynamics of all intermediates

This graph shows the complete dynamics of all intermediates in the melatonin biosynthesis pathway:

  • L-Tryptophan (brown): Starting substrate, shows rapid depletion from ~50μM to ~13μM
  • 5-HTP (red): First intermediate, peaks early then decreases
  • Serotonin (blue): Key branch point, reaches steady state around 12μM
  • N-acetylserotonin (green dashed): Primary pathway intermediate
  • 5-Methoxytryptamine (pink dotted): Alternative pathway intermediate, remains near zero

The three colored phases (lag, exponential, stationary) show the temporal progression of the entire pathway.

  • Competition Analysis — Primary vs alternative route comparison
Primary vs alternative route comparison
Primary vs alternative route comparison

This graph demonstrates route comparison:

  • Primary Route (red): SNAT→COMT pathway, produces most melatonin (~310mM)
  • Alternative Route (purple): COMT→SNAT pathway, contributes minimally (~10mM)
  • Total Melatonin (dark red): Sum of both routes
  • Shows the primary route dominates melatonin production almost entirely
  • Route Preference — Percentage contribution over time
Percentage contribution over time
Percentage contribution over time

This graph quantifies pathway preference over time:

  • Primary Route: Starts at ~65%, rapidly increases to ~97% and maintains dominance
  • Alternative Route: Starts at ~33%, quickly drops to ~3%
  • The 50% threshold line shows when pathway preference switches
  • Demonstrates the system's strong bias toward the primary pathway after initial transients
  • Cofactor Dynamics — SAM/SAH and Acetyl-CoA/CoA cycles
SAM/SAH and Acetyl-CoA/CoA cycles
SAM/SAH and Acetyl-CoA/CoA cycles

This shows cofactor cycle dynamics:

  • SAM/SAH cycle (top): SAM (green) maintains 290μM, SAH (red) stays low (5μM) - healthy methylation capacity
  • Acetyl-CoA/CoA cycle (bottom): Acetyl-CoA (blue) ~140μM, Free CoA (orange) ~400μM - sufficient acetylation capacity
  • Both cycles maintain favorable ratios throughout the simulation, ensuring sustained pathway flux

Quantitative Analysis and Biological Insights

  • Quantitative Analysis:

    • Production Metrics: Final melatonin concentrations, Pathway contribution percentages, Conversion efficiency.
    • Kinetic Analysis: Maximum production rate, Peak production timing, Growth phase correlation.
    • Bottleneck Identification: Intermediate accumulation patterns, Rate-limiting step analysis, Cofactor limitation assessment.
  • Biological Insights:

    • E. coli-Specific Considerations: No circadian rhythm — expression follows growth phases, Constitutive expression — enzyme levels scale with growth, Cofactor pools reflect Bacillus subtilis metabolism.
    • Pathway Engineering Implications: Balance TPH/TDC vs SNAT/COMT enzyme expression ratios, SAM and Acetyl-CoA availability is crucial, Feedback regulation must be considered.
  • Model Validation:

    • Kinetic parameters match literature values.
    • Pathway competition reflects known biochemistry.
    • Cofactor cycles are mechanistically complete.
    • Growth-dependent enzyme expression reflects Bacillus subtilis behavior.

Predictive Modeling of Metabolic Pathway Efficiency Through 3D Parameter Space Analysis

SAM Regeneration vs COMT Surface
SAM Regeneration vs COMT
SNAT vs COMT Surface
SNAT vs COMT
TPH vs TDC Surface
TPH vs TDC
  • Figure 12 (SAM Regeneration vs COMT): Shows the expected curved surface where high SAM regeneration rates and optimal COMT levels produce maximum melatonin (yellow regions), with production dropping off at extreme values (blue regions).
  • Figure 13 (SNAT vs COMT): Displays a plateau-like surface indicating balanced enzyme ratios are needed for optimal production.
  • Figure 14 (TPH vs TDC): Shows the upstream pathway competition with similar curved topology.

The geometric patterns of these surfaces, combined with the color mapping scheme (blue indicating minimal output, yellow representing peak production) and variable correlations along each axis, validate my initial analysis. My characterizations effectively capture the fundamental biochemical principles these 3D visualizations demonstrate regarding metabolic pathway enhancement and parameter interdependence relationships.

Supporting Models for System Characterization

Plasmid Copy Number Prediction

Plasmid and Expression System

The model is based on an expression system utilizing the pBluescript II SK(+) (pBSSK2(+)) plasmid. This is a high-copy-number vector, typically maintained at 300-500 copies per cell. According to some papers . The selection of a high-copy plasmid is a strategic choice to ensure a high gene dosage, leading to robust expression of the heterologous enzymes. This biological premise directly justifies the high maximum enzyme concentrations ( and ) assumed in the model, which are essential for driving significant metabolic flux towards the final product.

Model A: Total Plasmid Mass Conservation

To predict how the insertion of our synthetic pathways would affect plasmid copy number, we first implemented a foundational model based on the principle of Total Plasmid Mass Conservation. This model provides a robust first-order approximation for estimating copy numbers in a given host system.

  • 1. Core Principle

    Model A is built on the hypothesis that a host cell (such as E. coli) has a finite capacity to support foreign DNA. This capacity is limited by the metabolic burden associated with replicating plasmids, including the consumption of nucleotides, energy (ATP), and the cellular replication machinery. Consequently, the cell maintains a relatively constant total mass of plasmid DNA. This implies an inverse relationship between a plasmid's size and its copy number: if the size of the plasmid increases, its copy number must decrease proportionally to keep the total DNA mass constant.

  • 2. Mathematical Formulation

    The relationship is described by the following simple and elegant formula:

    Where:

    • is the predicted copy number of the new plasmid.
    • is the size (in base pairs) of the new plasmid.
    • is the size of a known reference plasmid.
    • is the known copy number of the reference plasmid.

    For our simulation, we calibrated the model using pBluescript II SK(+) as our reference. It has a size of 2,961 bp and a well-documented high copy number of approximately 400 copies. This establishes a constant total plasmid mass of 1,184,400 bp that this particular host system can maintain.

  • 3. Analysis of Simulation Results

    The provided four-panel figure visualizes the predictions and validations of Model A:

    • Copy Number vs Plasmid Size: This plot clearly illustrates the inverse relationship between plasmid size and copy number. As the plasmid size increases, the predicted copy number follows a perfect hyperbolic decay curve. This is the fundamental prediction of the model, showing how drastically the copy number can fall as larger genetic circuits are introduced.
    Copy numbers and the size
    Copy numbers and the size
    • Predicted Copy Numbers by Plasmid: This bar chart provides a practical comparison for several common lab plasmids and hypothetical constructs. It effectively demonstrates, for instance, that while a small plasmid like pUC19 (2,686 bp) can maintain a high copy number (~441), our large construct (13,500 bp) is predicted to have a much lower copy number of approximately 88. This has significant implications for protein expression levels.
    Plasmid comparison
    Plasmid comparison
    • Mass Conservation Verification: This plot is a direct and powerful validation of the model's core assumption. It shows that the calculated total plasmid mass (Size × Copy Number) for every tested plasmid is constant, lying exactly on the reference mass line of 1,184,400 bp. This confirms the model's internal consistency and mathematical correctness.
    Mass conservation
    Mass conservation
    • Copy Number vs Size Factor: This plot recasts the inverse relationship into a linear one by plotting the predicted copy number against a "Size Factor" (Reference Size / Plasmid Size). The perfect linear correlation demonstrates that the copy number is directly proportional to the inverse of the plasmid size, providing an intuitive and mathematically elegant confirmation of the model's foundation.
    Size factor analysis
    Size factor analysis
  • 4. Plasmid Copy Number Integration

Our plasmid copy number prediction model (described in the Supporting Models section) provides crucial validation for the enzyme concentration parameters used in this wax ester biosynthesis model. Using the Total Plasmid Mass Conservation principle, we predicted that our engineered construct (13,500 bp) containing the FAR and WS genes will maintain approximately 88 copies per cell in Bacillus subtilis. And here are predictions of our model, you can see the following 4 images.

Time Course Analysis
Time Course Analysis
Plasmid Comparison
Plasmid Comparison
Efficiency Analysis
Efficiency Analysis
Optimization Results
Optimization Results

This predicted copy number of 88 falls within the high expression interval (50-100 copies), which directly supports our choice of maximum enzyme concentrations. The relationship between plasmid copy number and enzyme expression can be quantified as:

where:

  • copies per cell (from plasmid model)
  • represents the protein production per plasmid copy
  • and are enzyme-specific expression efficiency factors

With 88 copies providing high gene dosage, our model parameters of and are well-justified and represent achievable expression levels. This plasmid-informed parameterization ensures our kinetic model predictions are grounded in realistic molecular constraints.

Impact on Wax Ester Production: The high copy number (88) enables sufficient enzyme expression to drive meaningful metabolic flux toward wax ester production, validating our pathway design for efficient biotechnological production in B. subtilis.

  • 4. Conclusion

    In summary, Model A (Total Plasmid Mass Conservation) serves as an essential baseline for our project. It provides a simple, robust, and validated method for estimating how plasmid copy number will change as we alter its size. These predictions are critical for anticipating gene dosage effects and managing metabolic burden in our engineered system.

Thermal Stability and Enzyme Performance (MD Simulation)

We evaluated thermal effects on enzyme performance with six short (17 ns) GROMACS MD runs (trajectory videos rendered in VMD) across four enzymes. Two enzymes (FAR, WS/DGAT) were simulated at 25 °C and 36.85 °C, enabling a temperature-dependent mapping from MD observables to kinetic parameters in the ODE model.

What’s here
  • How temperature enters the ODEs (Arrhenius / Q10)
  • How MD metrics map to parameters (kcat, Km, inhibition factors)
  • Compact per-simulation dashboards (RMSD, Rg, RMSF, SASA, H-bonds, Temp, Pressure, Energy Δ)
  • Trajectory previews

How Temperature Affects the Kinetic Model

We expose temperature T (Kelvin) to the existing rate laws via either Arrhenius or Q10 forms (choose one globally or per enzyme):

  • Arrhenius (recommended when Ea is known):

    where is a reference temperature (e.g., 298.15 K), is activation energy, an empirical sensitivity for , and the gas constant.

  • Q10 (robust with limited data):

We then use these temperature-aware parameters inside the existing and rate laws (everything else in the ODEs stays the same).

  • Stability-Aware Efficiency Modifier (from MD) Short MD runs provide fast proxies for structural stability. We derive a mild, unitless efficiency factor to nudge (and optionally inhibition terms), based on relative shifts versus the 25 °C baseline:

    • and : global deformation & compactness
    • H-bonds and SASA: core packing & exposure
    • : flexibility (active-site window, if annotated)

    One practical linear form:

    (clamped to ; set small, e.g. ranges). Final: . When catalytic residues are annotated later, global metrics can be replaced with active-site-region RMSF for sharper sensitivity.

MD length
17 ns per run (videos match)
Thermostat
V-rescale (equil.), PR barostat (prod.)
Water / FF
TIP3P / AMBER99SB-ILDN
Output set
RMSD, Rg, RMSF, SASA, H-bonds, T, P, Energy Δ
Note: Short simulations are for comparative stability trends, not absolute folding thermodynamics.

Dashboards by Simulation

Figures are auto-generated from the analysis pipeline following a standardized config (pipeline detail could be found at the end of this section).

AcrB — 25 °C vs 36.85 °C
AcrB-25 25 °C (298.15 K)
AcrB-25 RMSD
RMSD (backbone)
AcrB-25 Rg
Radius of gyration
AcrB-25 RMSF
RMSF per residue
AcrB-25 SASA
SASA (total)
AcrB-25 Hbonds
Intra-protein H-bonds
AcrB-25 Temp
Temperature trace
AcrB-25 Pressure
Pressure trace
AcrB-25 Energy Δ
Energy fluctuations (Δ)
AcrB-3685 36.85 °C (≈310 K)
AcrB-36.85 RMSD
RMSD (backbone)
AcrB-36.85 Rg
Radius of gyration
AcrB-36.85 RMSF
RMSF per residue
AcrB-36.85 SASA
SASA (total)
AcrB-36.85 Hbonds
Intra-protein H-bonds
AcrB-36.85 Temp
Temperature trace
ArcB-36.85 Pressure
Pressure trace
AcrB-36.85 Energy Δ
Energy fluctuations (Δ)
WS/DGAT — 25 °C vs 36.85 °C
wsdgat25 25 °C
RMSD
Rg
RMSF
SASA
H-bonds
Temp
Pressure
Energy Δ
wsdgat3685 36.85 °C
RMSD
Rg
RMSF
SASA
H-bonds
Temp
Pressure
Energy Δ
TDC
RMSD
Rg
RMSF
SASA
H-bonds
Temp
Pressure
Energy Δ
TPH
RMSD
Rg
RMSF
SASA
H-bonds
Temp
Pressure
Energy Δ

Simulation Results Summary

Across all enzyme simulations (FAR, WS/DGAT, TDC, TPH), the structural metrics consistently indicate stable and well-maintained protein folds throughout the 17 ns production runs:

  • Structural Stability — RMSD traces show rapid equilibration followed by stable plateaus, indicating all enzymes maintained their tertiary structures. Minor fluctuations observed in some runs remain numerically stable and are consistent with normal thermal motion.
  • Compactness — Radius of gyration (Rg) values remain steady across all simulations, confirming that the enzymes retained their compact folded states without significant expansion or collapse.
  • Flexibility Patterns — RMSF analysis reveals expected regional variations in flexibility. High RMSF values in specific regions correspond well to known structural features (loops, flexible linkers) and do not indicate instability. Catalytic and structural core regions exhibit appropriately low fluctuations.
  • Solvent Exposure — SASA values remain stable, with no indication of partial unfolding or excessive surface exposure that would suggest structural deterioration.
  • Internal Packing — Hydrogen bond counts remain consistent throughout all simulations, supporting maintained core integrity and proper folding.
  • Simulation Conditions — Temperature and pressure traces show smooth, stable behavior without runaway excursions, confirming proper thermostat and barostat coupling. Energy fluctuations remain zero-centered with no significant drift.

Limitations: Due to the limited 17 ns simulation time, extended conformational sampling and rare structural transitions could not be fully explored. However, the observed stability metrics provide confidence in the short-term structural integrity of these enzymes under the tested temperature conditions.


Parameterization Workflow (Reproducible)

  1. Choose (e.g., 298.15 K) and Arrhenius vs Q10.
  2. Fit or Q10 using pairs (25 °C, 36.85 °C) for FAR and WS/DGAT with available activity data or literature priors.
  3. Compute at each T from MD deltas vs 25 °C; clamp to .
  4. Update rates: ; optionally scale .
  5. Simulate ODEs at target temperature(s) and compare WE production trajectories.
  6. Sensitivity checks: vary weights ±50% to confirm qualitative robustness.

Methods (MD Reproducibility)

We used the following GROMACS pipeline (17 ns production; videos match the same length). Temperature set via V-rescale (equilibration) and Parrinello–Rahman barostat in production; AMBER99SB-ILDN + TIP3P; 0.15 M NaCl; 2 fs timestep; Verlet cutoffs; PME electrostatics; LINCS constraints on H-bonds.

Show MD scripts
bash
#!/usr/bin/env bash
set -euo pipefail

# ===================== Defaults (override via CLI) =====================
PDB_IN=""
RUN_NAME="enzyme_md"
FF="amber99sb-ildn"      # e.g., amber14sb, ff19SB (if installed)
WATER="tip3p"              # e.g., tip3p or OPC (pair with FF appropriately)
SALT_CONC="0.15"           # mol/L
TEMP_K="310"               # Kelvin
PADDING_NM="1.0"
BOXTYPE="dodecahedron"
MD_NS="50"                 # production length in ns
TRAJ_SKIP="10"             # keep every Nth frame for the viz traj
GPU_IDS="0,1"              # which GPUs to use
NTMPI="1"                  # 1 MPI rank using both GPUs is simplest
# ======================================================================

usage() {
  echo "Usage: $0 -i <enzyme.pdb> [-o RUN_NAME] [--ff FF] [--water MODEL] [--salt M] [--temp K] [--padding nm] [--boxtype TYPE] [--md-ns NS] [--skip N] [--gpus LIST] [--ntmpi N]"
  echo ""
  echo "Defaults:"
  echo "  -o ${RUN_NAME}  --ff ${FF}  --water ${WATER}  --salt ${SALT_CONC}  --temp ${TEMP_K}"
  echo "  --padding ${PADDING_NM}  --boxtype ${BOXTYPE}  --md-ns ${MD_NS}  --skip ${TRAJ_SKIP}"
  echo "  --gpus ${GPU_IDS}  --ntmpi ${NTMPI}"
  exit 1
}

# --------------------- CLI parsing (short + long) ----------------------
while [[ $# -gt 0 ]]; do
  case "$1" in
    -i|--input)           PDB_IN="$2"; shift 2 ;;
    -o|--output)          RUN_NAME="$2"; shift 2 ;;
    --ff|--forcefield)    FF="$2"; shift 2 ;;
    --water)              WATER="$2"; shift 2 ;;
    --salt)               SALT_CONC="$2"; shift 2 ;;
    --temp)               TEMP_K="$2"; shift 2 ;;
    --padding)            PADDING_NM="$2"; shift 2 ;;
    --boxtype)            BOXTYPE="$2"; shift 2 ;;
    --md-ns)              MD_NS="$2"; shift 2 ;;
    --skip)               TRAJ_SKIP="$2"; shift 2 ;;
    --gpus)               GPU_IDS="$2"; shift 2 ;;
    --ntmpi)              NTMPI="$2"; shift 2 ;;
    -h|--help)            usage ;;
    *) echo "Unknown option: $1"; usage ;;
  esac
done

[[ -z "${PDB_IN}" ]] && { echo "Error: -i/--input <enzyme.pdb> is required."; usage; }
[[ -f "${PDB_IN}" ]] || { echo "Input PDB not found: ${PDB_IN}"; exit 1; }

# Derived
STEPS_PER_NS=500000       # 2 fs timestep -> 500k steps per ns
MD_NSTEPS=$(( STEPS_PER_NS * MD_NS ))

# -------------------- Environment configuration -----------------
export CUDA_VISIBLE_DEVICES="${GPU_IDS}"
export OMP_NUM_THREADS=${OMP_NUM_THREADS:-4}
export GMX_GPU_DD_COMMS=1
export GMX_GPU_PME_PP_COMMS=1

# ----------------------------- Setup -----------------------------------
WD="${RUN_NAME}"
mkdir -p "${WD}"
cd "${WD}"
# copy PDB into run dir (works for absolute or relative input)
cp -f "${PDB_IN}" ./enzyme.pdb

echo "=== Run name: ${RUN_NAME}"
echo "FF: ${FF} | Water: ${WATER} | Salt: ${SALT_CONC} M | Temp: ${TEMP_K} K"
echo "Box: ${BOXTYPE}, pad ${PADDING_NM} nm | MD: ${MD_NS} ns (${MD_NSTEPS} steps)"
echo "GPUs: ${GPU_IDS} | ntmpi: ${NTMPI} | traj skip: ${TRAJ_SKIP}"

# -------------------------- Write MDP files ----------------------------
cat > ions.mdp <<'EOF'
integrator = steep
nsteps     = 5000
emtol      = 1000
cutoff-scheme = Verlet
coulombtype   = PME
rcoulomb      = 1.0
rvdw          = 1.0
pbc           = xyz
EOF

cat > minim.mdp <<'EOF'
integrator  = steep
nsteps      = 50000
emtol       = 500
cutoff-scheme = Verlet
coulombtype = PME
rcoulomb    = 1.0
rvdw        = 1.0
pbc         = xyz
EOF

cat > nvt.mdp <<EOF
define                  = -DPOSRES
integrator              = md
dt                      = 0.002
nsteps                  = 50000        ; 100 ps
constraints             = h-bonds
constraint_algorithm    = lincs
cutoff-scheme           = Verlet
rcoulomb                = 1.0
rvdw                    = 1.0
coulombtype             = PME
pme_order               = 4
fourierspacing          = 0.12
nstlist                 = 40
tcoupl                  = V-rescale
tc-grps                 = Protein Non-Protein
tau_t                   = 0.5 0.5
ref_t                   = ${TEMP_K} ${TEMP_K}
pcoupl                  = no
nstxout-compressed      = 5000
nstenergy               = 1000
nstlog                  = 1000
EOF

cat > npt.mdp <<EOF
define                  = -DPOSRES
integrator              = md
dt                      = 0.002
nsteps                  = 100000       ; 200 ps
constraints             = h-bonds
constraint_algorithm    = lincs
cutoff-scheme           = Verlet
rcoulomb                = 1.0
rvdw                    = 1.0
coulombtype             = PME
pme_order               = 4
fourierspacing          = 0.12
nstlist                 = 40
tcoupl                  = V-rescale
tc-grps                 = Protein Non-Protein
tau_t                   = 0.5 0.5
ref_t                   = ${TEMP_K} ${TEMP_K}
pcoupl                  = C-rescale
pcoupltype              = isotropic
tau_p                   = 5.0
ref_p                   = 1.0
compressibility         = 4.5e-5
refcoord_scaling        = com
nstxout-compressed      = 5000
nstenergy               = 1000
nstlog                  = 1000
EOF

cat > md.mdp <<EOF
integrator              = md
dt                      = 0.002
nsteps                  = ${MD_NSTEPS}
constraints             = h-bonds
constraint_algorithm    = lincs
cutoff-scheme           = Verlet
rcoulomb                = 1.0
rvdw                    = 1.0
coulombtype             = PME
pme_order               = 4
fourierspacing          = 0.12
nstlist                 = 40
tcoupl                  = V-rescale
tc-grps                 = Protein Non-Protein
tau_t                   = 0.5 0.5
ref_t                   = ${TEMP_K} ${TEMP_K}
pcoupl                  = Parrinello-Rahman
pcoupltype              = isotropic
tau_p                   = 5.0
ref_p                   = 1.0
compressibility         = 4.5e-5
nstxout-compressed      = 5000
nstenergy               = 1000
nstlog                  = 1000
DispCorr                = EnerPres
EOF

# ------------------------------ Run ------------------------------------
echo "1) pdb2gmx..."
gmx pdb2gmx -f enzyme.pdb -o processed.gro -p topol.top -i posre.itp -water "${WATER}" -ff "${FF}" -ignh <<EOF
1
EOF

echo "2) Box..."
gmx editconf -f processed.gro -o boxed.gro -c -d "${PADDING_NM}" -bt "${BOXTYPE}"

echo "3) Solvate..."
gmx solvate -cp boxed.gro -cs spc216.gro -o solvated.gro -p topol.top

echo "4) Ions (pre-grompp, allow charge warning)..."
gmx grompp -f ions.mdp -c solvated.gro -p topol.top -o ions.tpr -maxwarn 1

echo "4b) genion: neutralize + ${SALT_CONC} M..."
printf "SOL\n" | gmx genion -s ions.tpr -o ionized.gro -p topol.top \
  -pname NA -nname CL -neutral -conc "${SALT_CONC}"

echo "5) Energy minimization..."
gmx grompp -f minim.mdp -c ionized.gro -p topol.top -o em.tpr
gmx mdrun -deffnm em -ntmpi 1

echo "6) NVT (restrained @ ${TEMP_K} K)..."
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -deffnm nvt -nb gpu -pme gpu -bonded gpu -update gpu -ntmpi "${NTMPI}" -gpu_id "${GPU_IDS}"

echo "7) NPT (restrained, C-rescale)..."
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -p topol.top -o npt.tpr
gmx mdrun -deffnm npt -nb gpu -pme gpu -bonded gpu -update gpu -ntmpi "${NTMPI}" -gpu_id "${GPU_IDS}"

echo "8) Production MD (PR barostat)..."
gmx grompp -f md.mdp -c npt.gro -p topol.top -o md.tpr
# --- multi-GPU settings ---
NTMPI=2                                 # one MPI rank per GPU
LOGICAL_CORES=$(nproc)
NTOMP=$(( LOGICAL_CORES / NTMPI ))      # threads per rank (round down)
export OMP_NUM_THREADS=${NTOMP}

gmx mdrun -deffnm md \
  -nb gpu -pme gpu -bonded gpu -update gpu -dlb yes \
  -pin on \
  -ntmpi ${NTMPI} -ntomp ${NTOMP} -npme 1 \
  -gpu_id ${GPU_IDS} \
  -v

echo "9) Quick checks..."

FF=amber99sb-ildn  WATER=tip3p  SALT=0.15 M
TEMP_K=298.15 or 310  DT=2 fs  Prod=17 ns (videos truncated accordingly)
Thermostat: V-rescale (equil); Barostat: Parrinello–Rahman (prod)
bash
printf "Temperature\nPressure\nDensity\n0\n" | gmx energy -f npt.edr -o npt_TPD.xvg
gmx rms -s md.tpr -f md.xtc -o rmsd.xvg -tu ns <<EOF
Backbone
Backbone
EOF

echo "10) Trajectory for VMD (unwrap+center, skip ${TRAJ_SKIP})..."
analyze
$ cat analyze_md.sh 
et -euo pipefail

# --- Config (edit if filenames differ) ---
TPR="md.tpr"
XTC="md.xtc"
EDR="md.edr"

# --- Sanity checks ---
for f in "$TPR" "$XTC" "$EDR"; do
	  [[ -f "$f" ]] || { echo "ERROR: Missing $f"; exit 1; }
  done

  echo "==> GROMACS version:"
  gmx --version | sed -n '1,12p' || true
  echo

  # --- 0) Make clean, analysis-friendly trajectory (unwrap + center on protein) ---
  echo "==> Building selection index files..."
  gmx select -s "$TPR" -select 'group "Protein"'                       -on protein.ndx
  gmx select -s "$TPR" -select 'backbone'                              -on backbone.ndx
  gmx select -s "$TPR" -select 'name CA and group "Protein"'           -on ca.ndx

  echo "==> Removing PBC, centering on protein..."
  # Using group 0 ("selection") from protein.ndx for both questions
  echo -e "0\n0" | gmx trjconv -s "$TPR" -f "$XTC" -o md_noPBC.xtc -pbc mol -center -ur compact -n protein.ndx

  # --- 1) Energies/thermo (robust: try by-name first, then fallback to common indices) ---
  echo "==> Extracting energies (Temperature, Pressure, Potential, Kinetic, Total Energy)..."
  ENERG_OUT="energies.xvg"
  rm -f "$ENERG_OUT"

	      echo "    Name-based selection failed; trying common indices fallback 11 12 13 15 17..."
	        # Adjust these if the menu differs; run `gmx energy -f md.edr` once interactively to confirm.
		  printf "11\n12\n13\n15\n17\n" | gmx energy -f "$EDR" -o "$ENERG_OUT" >/dev/null

    echo "    Wrote $ENERG_OUT"

    # --- 2) RMSD (backbone vs first frame) ---
    echo "==> RMSD (backbone)..."
    # backbone.ndx has the selection in group 0
    echo -e "0 0" | gmx rms -s "$TPR" -f md_noPBC.xtc -n backbone.ndx -o rmsd_backbone.xvg -tu ns -fit rot+trans

# --- 3) RMSF (Cα per-residue) ---
echo "==> RMSF (Cα)..."
echo 0 | gmx rmsf -s "$TPR" -f md_noPBC.xtc -n ca.ndx -o rmsf_ca_residue.xvg -res

# --- 4) Radius of gyration (protein compactness) ---
echo "==> Radius of gyration..."
echo 0 | gmx gyrate -s "$TPR" -f md_noPBC.xtc -n protein.ndx -o gyrate_protein.xvg

# --- 5) Secondary structure (DSSP) ---
echo "==> DSSP (secondary structure)..."
DSSP_BIN=""
if command -v mkdssp >/dev/null 2>&1; then
  DSSP_BIN="mkdssp"
elif command -v dssp >/dev/null 2>&1; then
  DSSP_BIN="dssp"
fi

if [[ -n "$DSSP_BIN" ]]; then
  export DSSP="$DSSP_BIN"
  # gmx dssp (2024.x) writes per-residue SS letters to -o (.dat) and summary counts to -num
  gmx dssp -s "$TPR" -f md_noPBC.xtc -o dssp.dat -num ss_count.xvg
  echo "    Wrote dssp.dat (per-residue SS) and ss_count.xvg (helix/sheet/coil counts)."
else
  echo "    WARNING: DSSP binary not found (mkdssp/dssp). Skipping secondary-structure step."
  echo "             Install via: conda install -c conda-forge dssp    (then re-run)"
fi

# --- 6) SASA (protein total) ---
echo "==> SASA (protein total)..."
# gmx sasa asks for (1) surface group and (2) output group. Use 'selection' (index 0) from protein.ndx for both.
echo -e "0\n0" | gmx sasa -s "$TPR" -f md_noPBC.xtc -n protein.ndx -o sasa_total.xvg

# --- 7) Intra-protein hydrogen bonds ---
echo "==> Protein-protein hydrogen bonds..."
# gmx hbond asks for donor and acceptor groups; use protein for both
echo -e "0\n0" | gmx hbond -s "$TPR" -f md_noPBC.xtc -n protein.ndx -num hbnum_protein.xvg

echo
echo "==> Done."
echo "Outputs:"
ls -1 \
  md_noPBC.xtc \
  energies.xvg \
  rmsd_backbone.xvg \
  rmsf_ca_residue.xvg \
  gyrate_protein.xvg \
  2>/dev/null || true

[[ -f dssp.dat ]] && echo "  dssp.dat" || true
[[ -f ss_count.xvg ]] && echo "  ss_count.xvg" || true
[[ -f sasa_total.xvg ]] && echo "  sasa_total.xvg" || true
[[ -f hbnum_protein.xvg ]] && echo "  hbnum_protein.xvg" || true

echo
echo "Quick read: RMSD plateaus, Rg steady, SS fractions stable, SASA steady, H-bonds not collapsing."

# Outputs: energies.xvg, rmsd_backbone.xvg, rmsf_ca_residue.xvg, gyrate_protein.xvg,
#          sasa_total.xvg, hbnum_protein.xvg, md_noPBC.xtc

References

Loading references...