Model

Synthetic biology moves fast - our model helps us move faster

Mathematical models are used by scientists across the globe and disciplines to guide their work, get intuition and explore otherwise inaccessible spaces. Latest after the valuable meetings with Prof. Dr. Wolfgang Wiechert (see here), it became clear that modeling our chemostat production process could greatly enhance our understanding and control of the system.

Our Model allows us to avoid costly trial-and-error experiments by testing ideas in silico before burning through lab resources on designs that won't work. Further it accelerates the speed: we can explore hundreds of parameter combinations in minutes, finding sensible combinations without even setting foot in a lab. This is how you get to big-scale experiments. The model helps us predict inducer effects so we know exactly what concentration will give us the response we want, without guessing or pipetting marathons.

Aim

The model aims to simulate and optimize inducer input on the production of vanillin by E. coli Marionette (DH-10 beta strain) in a chemostat system. Of special interest is the activity of the enzyme FCS (feruloyl-CoA synthase), that converts the precursor ferulic acid into feruloyl-CoA, which is then further hydrolyzed into vanilic acid. The model provides a first orientation for parameters as well as minimizes wasteful and expensive chemostat runs by showing reasonable parameter combinations.

1. Chemostat Model

The core of the model are differential equations describing the change of different state variables, namely:

w: Biomass
c: Substrate concentration
p: Product concentration
I: Inducer concentration
µ: Growth rate over time
p_r: Productivity of the pathway

Upon discussion with Prof. Dr. Wolfgang Wiechert, we decided to also include the parameter of productivity. See the equations below.

Figure 1: Equations for the chemostat model

Interactive Chemostat Simulator

Try the simulation for yourself and change parameters to see their influence!

Model Parameters

For all constant parameters refer to Table 1. The parameters are chosen based on literature for vanillin production, but can of course be altered for future simulations or other target metabolites:

Table 1: Parameters for chemostat simulation

Variable Name	Value	Unit	Description	Reference
K	5	g biomass/L	Carrying capacity, max biomass concentration	Typical capacity
µ_max	1.14	1/h	Maximum growth rate	Selvarasu et al., 2009
α_ind	0.01	dimensionless	Inhibition of growth through product formation	Model assumption
D	0.1	1/h	Dilution rate	Typical dilution rate
c_r	5	g/L	Substrate concentration in reservoir	Model assumption
s	0.3	g substrate/g biomass	Stoichiometry of conversion: substrate consumed per biomass formed	Vasilakou et al., 2020
V	0.2	L	Volume of chemostat	Based on design
k_syn	0.02	1/h	Enzyme FCS synthesis rate	Yang et al., 2013
n	2	dimensionless	Hill coefficient	Typical for inducible systems
K_i	0.12	g/L	Inducer concentration at half maximal enzyme activity	Ni et al., 2015
k_deg	0.001	1/h	Enzyme degradation rate	Model assumption
k_prod	0.15	g product/(g biomass·h)	Rate of product formation	Ni et al., 2015
I_r	0.5	g/L	Concentration of inducer in reservoir	Model assumption
k_cons	0.3	1/h	Rate of inducer consumption	Model assumption
k_basal	0.002	g product/(g biomass·h)	Basal product formation - not inducer dependent	Ni et al., 2015

One exemplary simulation

A 100-hour simulation using fixed parameters (Table 1) and µ_max = 1.14 [1/h], k_prod = 0.15 [gproduct/gbiomass* h] (based on Ni et al., 2015) and k_cons = 0.3 [1/h] (model assumption) with 5g/L inducer added every 10 hours, yielded the following results (Figure 2): Figure 2: 100-hour simulation of chemostat model showing different parameters over time: (A) Biomass, (B) Substrate concentration, (C) Production Pathway Enzyme activity, (D) Product concentration, (E) Inducer concentration, (F) Productivity and (G) Growth rate and Product yield.

No deviations are observed before 10 hours, as the first inducer addition occurs at this timepoint. Biomass concentration reaches approximately 4.5 g/L under both conditions, indicating that inducer supplementation does not significantly affect final biomass levels. This is further supported by the similarity in growth rates between the two scenarios.

Substrate concentration follows an identical trend in both cases, decreasing from 5 g/L to approximately 3.75 g/L over the course of the 100-hour simulation.

The production pathway enzyme activity, introduced here as a measure of FCS activity, is consistently higher in the inducer condition, although it exhibits variability. Activity levels remain above those observed without inducer addition.

Product concentration is increased under inducer addition, with the profiles showing characteristic spike behavior corresponding to the dosing timepoints. However, productivity is highly variable, with large deviations compared to the no-inducer scenario, including periods of negative productivity.

Overall, this suggests that simply adding a fixed amount of inducer at arbitrary timepoints does not necessarily improve system performance. Instead, it creates fluctuations that are not present in the control, highlighting the need for more carefully tuned or dynamic induction strategies. Further, up until a certain point, adding an inducer will always increase yields, but this strategy is not economical, ignoring the often expensive chemicals and biological limitations. An too high expression of heterologous enzymes will increase metabolic burden for the cells and could thereby lead to a destabilization of the whole system (Snoeck et al., 2024). While this additional stress is represented by α_ind, one simple parameter does not represent the complexity of the underlying processes.

All these considerations lead us to an optimization problem: by optimizing the spiking times and regulating the dosage per spike individually, we can increase productivity, decrease costs resulting from inducer addition, and avoid unnecessary stress for the cells.

2. Optimization of Inducers - How can we maximize the yields, with minimal inducer input?

To maximise productivity while minimizing the usage of inducers, we introduced an objective function:

objective = cost_inducer × total_{inducer amount} − profit_product × total_productivity

The cost inducer and profit product weights are dimensionless parameters, which are weighting the inducer amount and productivity according to a mix of factors. These include economical aspects (“How much do the chemicals cost? What is the profit I get by “selling” a unit of the product”) as well as biological limitations (“How much does the additional inducer stress the cell?”, “Does that lead to (initial) lower product?” and “(What) are long-term effects?”). Most important here is the ratio of the weights, which defines the optimization trade-off between cost and productivity. A higher ratio prioritizes economical and physiological sustainability by penalizing excessive inducer use, while a lower ratio favors maximizing yield even at higher inducer expense or biological burden. In practice, this ratio reflects the strategic choice between short-term production gains and long-term process efficiency.

The optimization was done using the Bayesian optimization method. Bayesian Optimization builds a probabilistic model of the objective function by first guessing parameters and then creating a map based on these parameters. The map is explored in a balance of testing uncertain regions that might reveal new optima (exploration) and refining conditions that already appear promising (exploitation) (Akiba et al., 2019). This makes it very robust and converges quickly.

Different cost and profit values naturally affect the optimization results. To simplify, we ran the optimization using a cost value of 2 and a profit value of 1 for comparing set spikes from the simulation (spike every 10hs with dosage 5) to the optimized parameters, the results are displayed in Figure 3. We can see a noticeable increase for product concentration for higher number of spikes and also a smoother progression. To directly compare the effect of the optimizer, one should look at the blue and the solid grey lines. They both show 10 spikes with a dosage of 5 each. Especially for the first half of the simulation, the productivity using optimized parameters (blue) exceeds the set times (grey). Figure 3: Productivity over time for different bounds, using optimized parameters

Figure 4 displays how productivity is affected by the cost of the inducer. These simulations were performed using Bayesian optimization with a maximum of 25 spikes and a dosage of 8 while keeping the profit constant at 1, allowing to make statements that not only regard the cost penalty itself, but also the ratio of cost and profit. The figure illustrates how cost parameters shape the optimized induction strategy.
Figure 4: Productivity over time for different costs, using optimized parameters

Code

If you are interested, check out the code and try the optimization for yourself!

Requirements

Store this in a requirements.txt file in the root directory.

numpy==1.26.4
scipy==1.14.1
matplotlib==3.8.0
optuna==4.5.0

Then execute the command:

pip install -r requirements.txt

Model Implementation

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import optuna
import datetime

# State variables:
# w = biomass [g/L]
# c = substrate concentration [g/L]
# Pr = Production enzyme activity [dimensionless]
# P = Product concentration [g/L]
# I = Inducer concentration [g/L]
# mu = growth rate [1/h]

def chemostat(y, t, K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_t, spike_am):
    """
    Chemostat differential equation system.
    
    Parameters:
    -----------
    y : array
        State variables [w, c, Pr, P, I, mu]
    t : float
        Time point
    ... (other parameters as defined in table)
    """
    w, c, Pr, P, I, mu = y  # state variables
    
    # Inducer spike
    spike = 0
    for ts in spike_t:
        if abs(t - ts) < 0.1:
            spike += spike_am[0]

    # Differential equations
    didt = D * (Ir - I) - kcons * I + spike
    mu = mu_max * (1 - w / K) - (alpha_ind * Pr)
    dwdt = w * (mu - D)
    dcdt = D * (cr - c) - s * mu * w
    dprdt = ksyn * (I**n / (Ki**n + I**n)) - kdeg * Pr - D * Pr
    dpdt = kbase * w + kprod * Pr * w - D * P

    return [dwdt, dcdt, dprdt, dpdt, didt, mu]


# Parameters - example values 
K = 5          # carrying capacity, [g biomass/L]
mu_max = 1.14  # max growth rate, [1/hr]
alpha_ind = 0.1  # inhibition of growth through product formation/inducer addition
D = 0.1        # dilution rate, [1/h]
cr = 5         # substrate concentration in reservoir, [g/L]
s = 0.3        # stoichiometry of conversion, [g substrate/g biomass]
v = 0.2        # volume of chemostat [L] 
ksyn = 0.02    # synthesis of enzyme A, [1/h]
n = 2          # hill coefficient
Ki = 0.01      # inducer conc. at half maximal enzyme activity, [g/L]
kdeg = 0.001   # degradation of enzyme A, [1/h] 
kprod = 0.15   # rate of product formation, [g product/(g biomass·h)]
Ir = 0.05      # concentration of inducer in reservoir [g/L]
kcons = 0.3    # rate of inducer consumption, [1/h]
kbase = 0.002  # basal product formation, [g product/(g biomass·h)]

max_t = 100
t = np.linspace(0, max_t, 300)
xticks = np.arange(0, max_t + 1, 12)

start = [2, 5, 0, 0, 0, 0]  # initial conditions [biomass, substrate, enzyme, product, inducer, growth rate]

Simulation WITH Inducer

# Simulation WITH inducer
spike_t = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(len(spike_t))
spike_am = [5] * len(spike_t)
sol_spike = odeint(chemostat, start, t, args=(
    K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg,
    kprod, kcons, Ir, kbase, spike_t, spike_am))

# Extract results - inducer condition
w_spike = sol_spike[:, 0]   # biomass over time
Pr_spike = sol_spike[:, 2]  # pathway enzyme over time
P_spike = sol_spike[:, 3]   # product concentration (g/L)
c_spike = sol_spike[:, 1]   # substrate concentration (g/L)
t_hours = t                 # time array in hours

# Calculate growth rate mu(t)
mu_spike = mu_max * (1 - w_spike / K - alpha_ind * Pr_spike)

# Calculate substrate mass balance
S_i_spike = c_spike[0] * v  # initial substrate amount in vessel
S_fed_spike = cr * v * t_hours  # substrate fed in 
S_consumed_spike = S_i_spike + S_fed_spike - c_spike * v  # substrate consumed over time

# Avoid division by zero
safe_S_consumed_spike = np.where(S_consumed_spike <= 0, 1e-12, S_consumed_spike)

# Calculate product yield
y_path_spike = P_spike / safe_S_consumed_spike
net_change_spike = np.gradient(P_spike, t)

Simulation WITHOUT Inducer

# Simulation WITHOUT inducer 
no_spike_t = []
sol_no_spike = odeint(chemostat, start, t, args=(
    K, mu_max, alpha_ind, D, cr, s, v, ksyn, n, Ki, kdeg,
    kprod, kcons, Ir, kbase, no_spike_t, spike_am))
 
# Extract results - no inducer condition
P_ni = sol_no_spike[:, 3]   # product concentration
c_ni = sol_no_spike[:, 1]   # substrate concentration (g/L)
w_ni = sol_no_spike[:, 0]   # biomass over time
Pr_ni = sol_no_spike[:, 2]  # pathway enzyme over time

# Calculate growth rate mu(t)
mu_ni = mu_max * (1 - w_ni / K - alpha_ind * Pr_ni)

# Calculate substrate mass balance
S_i_ni = c_ni[0] * v  # initial substrate 
S_fed_ni = cr * v * t_hours  # substrate fed in 
S_consumed_ni = S_i_ni + S_fed_ni - c_ni * v  # substrate consumed over time

# Calculate product yield
y_path_ni = P_ni / S_consumed_ni
net_change_no_spike = np.gradient(P_ni, t)

# Calculate productivity: (kbase + kprod * Pr) * w = production/biomass 
productivity_spike = kbase * w_spike + kprod * Pr_spike * w_spike
productivity_no_spike = kbase * w_ni + kprod * Pr_ni * w_ni

Bayesian Optimization

# OPTIMIZATION PARAMETERS
max_number_spikes = 10  # change this if you want!
max_dose = 5            # change this if you want!
cost_inducer = 2
profit_product = 1

def objective_bayesian(trial, max_number_spikes, max_dose, cost_inducer, profit_product):
    """
    Objective function for Bayesian optimization.
    Minimizes cost while maximizing productivity.
    """
    # Suggest spike times and doses
    spike_t = [trial.suggest_float(f'spike_time_{i}', 0, max_t) for i in range(max_number_spikes)]
    spike_am = [trial.suggest_float(f'spike_dose_{i}', 0, max_dose) for i in range(max_number_spikes)]
    spike_t = sorted(spike_t)
    
    try:
        # Solve ODE system
        sol = odeint(chemostat, start, t, args=(
            K, mu_max, alpha_ind, D, cr, s, v,
            ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_t, spike_am))
        
        # Check for numerical issues
        if np.any(np.isnan(sol)) or np.any(np.isinf(sol)):
            return float('inf')
        
        # Extract results
        I = sol[:, 4]
        P = sol[:, 3] 
        c = sol[:, 1]
        w = sol[:, 0]
        Pr = sol[:, 2]
        
        # Calculate productivity and cost
        productivity = kbase * w + kprod * Pr * w
        total_productivity = np.trapz(productivity, t)
        total_inducer = np.sum(spike_am) + Ir * t[-1]
        
        # Objective: minimize cost, maximize productivity
        obj = (cost_inducer * total_inducer - profit_product * total_productivity) 
        return obj
        
    except Exception:
        return float('inf')

def wrapped_objective(trial):
    return objective_bayesian(trial, max_number_spikes, max_dose, cost_inducer, profit_product)

# Run optimization
study = optuna.create_study(direction='minimize')
study.optimize(wrapped_objective, n_trials=50)

# Extract optimized parameters
spike_time_op_bay = [study.best_params[f'spike_time_{i}'] for i in range(max_number_spikes)]
spike_dose_op_bay = [study.best_params[f'spike_dose_{i}'] for i in range(max_number_spikes)]

print("Optimized spike times:", spike_time_op_bay)
print("Optimized spike doses:", spike_dose_op_bay)

Visualization of Results

# SIMULATE with optimized parameters
optimal_solution_bay = odeint(chemostat, start, t, args=(
    K, mu_max, alpha_ind, D, cr, s, v,
    ksyn, n, Ki, kdeg, kprod, kcons, Ir, kbase, spike_time_op_bay, spike_dose_op_bay))

solver = "bayesian optimization"

P_bay = optimal_solution_bay[:, 3]   # product concentration
w_bay = optimal_solution_bay[:, 0]   # biomass over time
Pr_bay = optimal_solution_bay[:, 2]  # pathway enzyme over time

productivity_bay = kbase * w_bay + kprod * Pr_bay * w_bay

# Plot results
print(len(spike_dose_op_bay))
plt.plot(t, productivity_bay, label='Optimal spikes')
plt.plot(t, productivity_spike, label='With inducer spikes set', color='red')
plt.plot(t, productivity_no_spike, label='No inducer spikes', color='gray', linestyle='dashed')
plt.xlabel('Time (hr)')
plt.ylabel('Productivity [g/(L*h)]')
plt.suptitle("Optimized productivity", fontsize=12)
plt.title(f"Parameters: {max_number_spikes} maximal inducer spikes and {max_dose} maximal dose per spike,\n{len(spike_dose_op_bay)} actual spikes, solver: {solver}", fontsize=8.5)
plt.legend()
plt.grid(True)
plt.savefig(f"optimize_productivity_{solver}_{max_dose}_{max_number_spikes}", dpi=300, bbox_inches='tight')
plt.show()

This optimization approach provides an easily customizable method to determine the best parameters for your experiment - coupling our theoretical modeling to the hands-on approach for the hardware. We enable rapid translation from in silico predictions to experimental implementation, reducing unnecessary trial-and-error experiments.

References

Barghini, P., Di Gioia, D., Fava, F. et al. Vanillin production using metabolically engineered _Escherichia coli_under non-growing conditions. Microb Cell Fact 6, 13 (2007). https://doi.org/10.1186/1475-2859-6-13

Luziatelli, F., Brunetti, L., Ficca, A. G., & Ruzzi, M. (2019). Maximizing the Efficiency of Vanillin Production by Biocatalyst Enhancement and Process Optimization. Frontiers in bioengineering and biotechnology, 7, 279. https://doi.org/10.3389/fbioe.2019.00279

Ni, J., Tao, F., Du, H. et al. Mimicking a natural pathway for de novo biosynthesis: natural vanillin production from accessible carbon sources. Sci Rep 5, 13670 (2015). https://doi.org/10.1038/srep13670

Selvarasu S. et al., Characterizing Escherichia coli DH5alpha growth and metabolism in a complex medium using genome-scale flux analysis. Biotechnol Bioeng. 2009 Feb 15 102(3):923-34. doi: 10.1002/bit.22119. p.926

Snoeck, S., Guidi, C., & De Mey, M. (2024). "Metabolic burden" explained: stress symptoms and its related responses induced by (over)expression of (heterologous) proteins in Escherichia coli. Microbial cell factories, 23(1), 96. https://doi.org/10.1186/s12934-024-02370-9

Vasilakou E, van Loosdrecht MCM, Wahl SA. Escherichia coli metabolism under short-term repetitive substrate dynamics: adaptation and trade-offs. Microb Cell Fact. 2020;19(1):1-19. doi:10.1186/s12934-020-01379-0

Yang W, Tang H, Ni J, Wu Q, Hua D, Tao F, Xu P. Characterization of two Streptomyces enzymes that convert ferulic acid to vanillin. PLoS One. 2013 Jun 28;8(6):e67339. doi: 10.1371/journal.pone.0067339. PMID: 23840666; PMCID: PMC3696112.

Synthetic biology moves fast - our model helps us move faster​

Aim​

1. Chemostat Model​

Interactive Chemostat Simulator​

Model Parameters​

One exemplary simulation​

2. Optimization of Inducers - How can we maximize the yields, with minimal inducer input?​

Code​

References​