Model
Synthetic biology moves fast - our model helps us move faster
Mathematical models are used by scientists across the globe and disciplines to guide their work, get intuition and explore otherwise inaccessible spaces. Latest after the valuable meetings with Prof. Dr. Wolfgang Wiechert (see here), it became clear that modeling our chemostat production process could greatly enhance our understanding and control of the system.
Our Model allows us to avoid costly trial-and-error experiments by testing ideas in silico before burning through lab resources on designs that won't work. Further it accelerates the speed: we can explore hundreds of parameter combinations in minutes, finding sensible combinations without even setting foot in a lab. This is how you get to big-scale experiments. The model helps us predict inducer effects so we know exactly what concentration will give us the response we want, without guessing or pipetting marathons.
Aim
The model aims to simulate and optimize inducer input on the production of vanillin by E. coli Marionette (DH-10 beta strain) in a chemostat system. Of special interest is the activity of the enzyme FCS (feruloyl-CoA synthase), that converts the precursor ferulic acid into feruloyl-CoA, which is then further hydrolyzed into vanilic acid. The model provides a first orientation for parameters as well as minimizes wasteful and expensive chemostat runs by showing reasonable parameter combinations.
1. Chemostat Model
The core of the model are differential equations describing the change of different state variables, namely:
- w: Biomass
- c: Substrate concentration
- p: Product concentration
- I: Inducer concentration
- µ: Growth rate over time
- pr: Productivity of the pathway
Upon discussion with Prof. Dr. Wolfgang Wiechert, we decided to also include the parameter of productivity. See the equations below.
Figure 1: Equations for the chemostat model
Interactive Chemostat Simulator
Try the simulation for yourself and change parameters to see their influence!
Model Parameters
For all constant parameters refer to Table 1. The parameters are chosen based on literature for vanillin production, but can of course be altered for future simulations or other target metabolites:
Table 1: Parameters for chemostat simulation
| Variable Name | Value | Unit | Description | Reference |
|---|---|---|---|---|
| K | 5 | g biomass/L | Carrying capacity, max biomass concentration | Typical capacity |
| µmax | 1.14 | 1/h | Maximum growth rate | Selvarasu et al., 2009 |
| αind | 0.01 | dimensionless | Inhibition of growth through product formation | Model assumption |
| D | 0.1 | 1/h | Dilution rate | Typical dilution rate |
| cr | 5 | g/L | Substrate concentration in reservoir | Model assumption |
| s | 0.3 | g substrate/g biomass | Stoichiometry of conversion: substrate consumed per biomass formed | Vasilakou et al., 2020 |
| V | 0.2 | L | Volume of chemostat | Based on design |
| ksyn | 0.02 | 1/h | Enzyme FCS synthesis rate | Yang et al., 2013 |
| n | 2 | dimensionless | Hill coefficient | Typical for inducible systems |
| Ki | 0.12 | g/L | Inducer concentration at half maximal enzyme activity | Ni et al., 2015 |
| kdeg | 0.001 | 1/h | Enzyme degradation rate | Model assumption |
| kprod | 0.15 | g product/(g biomass·h) | Rate of product formation | Ni et al., 2015 |
| Ir | 0.5 | g/L | Concentration of inducer in reservoir | Model assumption |
| kcons | 0.3 | 1/h | Rate of inducer consumption | Model assumption |
| kbasal | 0.002 | g product/(g biomass·h) | Basal product formation - not inducer dependent | Ni et al., 2015 |
One exemplary simulation
A 100-hour simulation using fixed parameters (Table 1) and µmax = 1.14 [1/h], kprod = 0.15 [gproduct/gbiomass* h] (based on Ni et al., 2015) and kcons = 0.3 [1/h] (model assumption) with 5g/L inducer added every 10 hours, yielded the following results (Figure 2):
Figure 2: 100-hour simulation of chemostat model showing different parameters over time: (A) Biomass, (B) Substrate concentration, (C) Production Pathway Enzyme activity, (D) Product concentration, (E) Inducer concentration, (F) Productivity and (G) Growth rate and Product yield.
No deviations are observed before 10 hours, as the first inducer addition occurs at this timepoint. Biomass concentration reaches approximately 4.5 g/L under both conditions, indicating that inducer supplementation does not significantly affect final biomass levels. This is further supported by the similarity in growth rates between the two scenarios.
Substrate concentration follows an identical trend in both cases, decreasing from 5 g/L to approximately 3.75 g/L over the course of the 100-hour simulation.
The production pathway enzyme activity, introduced here as a measure of FCS activity, is consistently higher in the inducer condition, although it exhibits variability. Activity levels remain above those observed without inducer addition.
Product concentration is increased under inducer addition, with the profiles showing characteristic spike behavior corresponding to the dosing timepoints. However, productivity is highly variable, with large deviations compared to the no-inducer scenario, including periods of negative productivity.
Overall, this suggests that simply adding a fixed amount of inducer at arbitrary timepoints does not necessarily improve system performance. Instead, it creates fluctuations that are not present in the control, highlighting the need for more carefully tuned or dynamic induction strategies. Further, up until a certain point, adding an inducer will always increase yields, but this strategy is not economical, ignoring the often expensive chemicals and biological limitations. An too high expression of heterologous enzymes will increase metabolic burden for the cells and could thereby lead to a destabilization of the whole system (Snoeck et al., 2024). While this additional stress is represented by αind, one simple parameter does not represent the complexity of the underlying processes.
All these considerations lead us to an optimization problem: by optimizing the spiking times and regulating the dosage per spike individually, we can increase productivity, decrease costs resulting from inducer addition, and avoid unnecessary stress for the cells.
2. Optimization of Inducers - How can we maximize the yields, with minimal inducer input?
To maximise productivity while minimizing the usage of inducers, we introduced an objective function:
objective = costinducer × totalinducer amount − profitproduct × totalproductivity
The cost inducer and profit product weights are dimensionless parameters, which are weighting the inducer amount and productivity according to a mix of factors. These include economical aspects (“How much do the chemicals cost? What is the profit I get by “selling” a unit of the product”) as well as biological limitations (“How much does the additional inducer stress the cell?”, “Does that lead to (initial) lower product?” and “(What) are long-term effects?”). Most important here is the ratio of the weights, which defines the optimization trade-off between cost and productivity. A higher ratio prioritizes economical and physiological sustainability by penalizing excessive inducer use, while a lower ratio favors maximizing yield even at higher inducer expense or biological burden. In practice, this ratio reflects the strategic choice between short-term production gains and long-term process efficiency.
The optimization was done using the Bayesian optimization method. Bayesian Optimization builds a probabilistic model of the objective function by first guessing parameters and then creating a map based on these parameters. The map is explored in a balance of testing uncertain regions that might reveal new optima (exploration) and refining conditions that already appear promising (exploitation) (Akiba et al., 2019). This makes it very robust and converges quickly.
Different cost and profit values naturally affect the optimization results. To simplify, we ran the optimization using a cost value of 2 and a profit value of 1 for comparing set spikes from the simulation (spike every 10hs with dosage 5) to the optimized parameters, the results are displayed in Figure 3. We can see a noticeable increase for product concentration for higher number of spikes and also a smoother progression. To directly compare the effect of the optimizer, one should look at the blue and the solid grey lines. They both show 10 spikes with a dosage of 5 each. Especially for the first half of the simulation, the productivity using optimized parameters (blue) exceeds the set times (grey).
Figure 3: Productivity over time for different bounds, using optimized parameters
Figure 4 displays how productivity is affected by the cost of the inducer. These simulations were performed using Bayesian optimization with a maximum of 25 spikes and a dosage of 8 while keeping the profit constant at 1, allowing to make statements that not only regard the cost penalty itself, but also the ratio of cost and profit. The figure illustrates how cost parameters shape the optimized induction strategy.
Figure 4: Productivity over time for different costs, using optimized parameters
Code
If you are interested, check out the code and try the optimization for yourself!
Requirements
Store this in a requirements.txt file in the root directory.
numpy==1.26.4
scipy==1.14.1
matplotlib==3.8.0
optuna==4.5.0
Then execute the command:
pip install -r requirements.txt
Model Implementation
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import optuna
import datetime
# State variables:
# w = biomass [g/L]
# c = substrate concentration [g/L]
# Pr = Production enzyme activity [dimensionless]
# P = Product concentration [g/L]
# I = Inducer concentration [g/L]
# mu = growth rate [1/h]
def chemostat(y, t, K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_t, spike_am):
"""
Chemostat differential equation system.
Parameters:
-----------
y : array
State variables [w, c, Pr, P, I, mu]
t : float
Time point
... (other parameters as defined in table)
"""
w, c, Pr, P, I, mu = y # state variables
# Inducer spike
spike = 0
for ts in spike_t:
if abs(t - ts) < 0.1:
spike += spike_am[0]
# Differential equations
didt = D * (Ir - I) - kcons * I + spike
mu = mu_max * (1 - w / K) - (alpha_ind * Pr)
dwdt = w * (mu - D)
dcdt = D * (cr - c) - s * mu * w
dprdt = ksyn * (I**n / (Ki**n + I**n)) - kdeg * Pr - D * Pr
dpdt = kbase * w + kprod * Pr * w - D * P
return [dwdt, dcdt, dprdt, dpdt, didt, mu]
# Parameters - example values
K = 5 # carrying capacity, [g biomass/L]
mu_max = 1.14 # max growth rate, [1/hr]
alpha_ind = 0.1 # inhibition of growth through product formation/inducer addition
D = 0.1 # dilution rate, [1/h]
cr = 5 # substrate concentration in reservoir, [g/L]
s = 0.3 # stoichiometry of conversion, [g substrate/g biomass]
v = 0.2 # volume of chemostat [L]
ksyn = 0.02 # synthesis of enzyme A, [1/h]
n = 2 # hill coefficient
Ki = 0.01 # inducer conc. at half maximal enzyme activity, [g/L]
kdeg = 0.001 # degradation of enzyme A, [1/h]
kprod = 0.15 # rate of product formation, [g product/(g biomass·h)]
Ir = 0.05 # concentration of inducer in reservoir [g/L]
kcons = 0.3 # rate of inducer consumption, [1/h]
kbase = 0.002 # basal product formation, [g product/(g biomass·h)]
max_t = 100
t = np.linspace(0, max_t, 300)
xticks = np.arange(0, max_t + 1, 12)
start = [2, 5, 0, 0, 0, 0] # initial conditions [biomass, substrate, enzyme, product, inducer, growth rate]
Simulation WITH Inducer
# Simulation WITH inducer
spike_t = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(len(spike_t))
spike_am = [5] * len(spike_t)
sol_spike = odeint(chemostat, start, t, args=(
K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg,
kprod, kcons, Ir, kbase, spike_t, spike_am))
# Extract results - inducer condition
w_spike = sol_spike[:, 0] # biomass over time
Pr_spike = sol_spike[:, 2] # pathway enzyme over time
P_spike = sol_spike[:, 3] # product concentration (g/L)
c_spike = sol_spike[:, 1] # substrate concentration (g/L)
t_hours = t # time array in hours
# Calculate growth rate mu(t)
mu_spike = mu_max * (1 - w_spike / K - alpha_ind * Pr_spike)
# Calculate substrate mass balance
S_i_spike = c_spike[0] * v # initial substrate amount in vessel
S_fed_spike = cr * v * t_hours # substrate fed in
S_consumed_spike = S_i_spike + S_fed_spike - c_spike * v # substrate consumed over time
# Avoid division by zero
safe_S_consumed_spike = np.where(S_consumed_spike <= 0, 1e-12, S_consumed_spike)
# Calculate product yield
y_path_spike = P_spike / safe_S_consumed_spike
net_change_spike = np.gradient(P_spike, t)
Simulation WITHOUT Inducer
# Simulation WITHOUT inducer
no_spike_t = []
sol_no_spike = odeint(chemostat, start, t, args=(
K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg,
kprod, kcons, Ir, kbase, no_spike_t, spike_am))
# Extract results - no inducer condition
P_ni = sol_no_spike[:, 3] # product concentration
c_ni = sol_no_spike[:, 1] # substrate concentration (g/L)
w_ni = sol_no_spike[:, 0] # biomass over time
Pr_ni = sol_no_spike[:, 2] # pathway enzyme over time
# Calculate growth rate mu(t)
mu_ni = mu_max * (1 - w_ni / K - alpha_ind * Pr_ni)
# Calculate substrate mass balance
S_i_ni = c_ni[0] * v # initial substrate
S_fed_ni = cr * v * t_hours # substrate fed in
S_consumed_ni = S_i_ni + S_fed_ni - c_ni * v # substrate consumed over time
# Calculate product yield
y_path_ni = P_ni / S_consumed_ni
net_change_no_spike = np.gradient(P_ni, t)
# Calculate productivity: (kbase + kprod * Pr) * w = production/biomass
productivity_spike = kbase * w_spike + kprod * Pr_spike * w_spike
productivity_no_spike = kbase * w_ni + kprod * Pr_ni * w_ni
Bayesian Optimization
# OPTIMIZATION PARAMETERS
max_number_spikes = 10 # change this if you want!
max_dose = 5 # change this if you want!
cost_inducer = 2
profit_product = 1
def objective_bayesian(trial, max_number_spikes, max_dose, cost_inducer, profit_product):
"""
Objective function for Bayesian optimization.
Minimizes cost while maximizing productivity.
"""
# Suggest spike times and doses
spike_t = [trial.suggest_float(f'spike_time_{i}', 0, max_t) for i in range(max_number_spikes)]
spike_am = [trial.suggest_float(f'spike_dose_{i}', 0, max_dose) for i in range(max_number_spikes)]
spike_t = sorted(spike_t)
try:
# Solve ODE system
sol = odeint(chemostat, start, t, args=(
K, mu_max, alpha_ind, D, cr, s, v,
ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_t, spike_am))
# Check for numerical issues
if np.any(np.isnan(sol)) or np.any(np.isinf(sol)):
return float('inf')
# Extract results
I = sol[:, 4]
P = sol[:, 3]
c = sol[:, 1]
w = sol[:, 0]
Pr = sol[:, 2]
# Calculate productivity and cost
productivity = kbase * w + kprod * Pr * w
total_productivity = np.trapz(productivity, t)
total_inducer = np.sum(spike_am) + Ir * t[-1]
# Objective: minimize cost, maximize productivity
obj = (cost_inducer * total_inducer - profit_product * total_productivity)
return obj
except Exception:
return float('inf')
def wrapped_objective(trial):
return objective_bayesian(trial, max_number_spikes, max_dose, cost_inducer, profit_product)
# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(wrapped_objective, n_trials=50)
# Extract optimized parameters
spike_time_op_bay = [study.best_params[f'spike_time_{i}'] for i in range(max_number_spikes)]
spike_dose_op_bay = [study.best_params[f'spike_dose_{i}'] for i in range(max_number_spikes)]
print("Optimized spike times:", spike_time_op_bay)
print("Optimized spike doses:", spike_dose_op_bay)
Visualization of Results
# SIMULATE with optimized parameters
optimal_solution_bay = odeint(chemostat, start, t, args=(
K, mu_max, alpha_ind, D, cr, s, v,
ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_time_op_bay, spike_dose_op_bay))
solver = "bayesian optimization"
P_bay = optimal_solution_bay[:, 3] # product concentration
w_bay = optimal_solution_bay[:, 0] # biomass over time
Pr_bay = optimal_solution_bay[:, 2] # pathway enzyme over time
productivity_bay = kbase * w_bay + kprod * Pr_bay * w_bay
# Plot results
print(len(spike_dose_op_bay))
plt.plot(t, productivity_bay, label='Optimal spikes')
plt.plot(t, productivity_spike, label='With inducer spikes set', color='red')
plt.plot(t, productivity_no_spike, label='No inducer spikes', color='gray', linestyle='dashed')
plt.xlabel('Time (hr)')
plt.ylabel('Productivity [g/(L*h)]')
plt.suptitle("Optimized productivity", fontsize=12)
plt.title(f"Parameters: {max_number_spikes} maximal inducer spikes and {max_dose} maximal dose per spike,\n{len(spike_dose_op_bay)} actual spikes, solver: {solver}", fontsize=8.5)
plt.legend()
plt.grid(True)
plt.savefig(f"optimize_productivity_{solver}_{max_dose}_{max_number_spikes}", dpi=300, bbox_inches='tight')
plt.show()
This optimization approach provides an easily customizable method to determine the best parameters for your experiment - coupling our theoretical modeling to the hands-on approach for the hardware. We enable rapid translation from in silico predictions to experimental implementation, reducing unnecessary trial-and-error experiments.
References
Barghini, P., Di Gioia, D., Fava, F. et al. Vanillin production using metabolically engineered _Escherichia coli_under non-growing conditions. Microb Cell Fact 6, 13 (2007). https://doi.org/10.1186/1475-2859-6-13
Luziatelli, F., Brunetti, L., Ficca, A. G., & Ruzzi, M. (2019). Maximizing the Efficiency of Vanillin Production by Biocatalyst Enhancement and Process Optimization. Frontiers in bioengineering and biotechnology, 7, 279. https://doi.org/10.3389/fbioe.2019.00279
Ni, J., Tao, F., Du, H. et al. Mimicking a natural pathway for de novo biosynthesis: natural vanillin production from accessible carbon sources. Sci Rep 5, 13670 (2015). https://doi.org/10.1038/srep13670
Selvarasu S. et al., Characterizing Escherichia coli DH5alpha growth and metabolism in a complex medium using genome-scale flux analysis. Biotechnol Bioeng. 2009 Feb 15 102(3):923-34. doi: 10.1002/bit.22119. p.926
Snoeck, S., Guidi, C., & De Mey, M. (2024). "Metabolic burden" explained: stress symptoms and its related responses induced by (over)expression of (heterologous) proteins in Escherichia coli. Microbial cell factories, 23(1), 96. https://doi.org/10.1186/s12934-024-02370-9
Vasilakou E, van Loosdrecht MCM, Wahl SA. Escherichia coli metabolism under short-term repetitive substrate dynamics: adaptation and trade-offs. Microb Cell Fact. 2020;19(1):1-19. doi:10.1186/s12934-020-01379-0
Yang W, Tang H, Ni J, Wu Q, Hua D, Tao F, Xu P. Characterization of two Streptomyces enzymes that convert ferulic acid to vanillin. PLoS One. 2013 Jun 28;8(6):e67339. doi: 10.1371/journal.pone.0067339. PMID: 23840666; PMCID: PMC3696112.