
Part 1: CytoFlow Architecture Introduction

Our dry lab constructed an end-to-end antimicrobial peptide intelligent development platform, CytoFlow, covering the entire process from molecular design, activity prediction, sequence optimization to production processes.

It consists of three major models: CytoEvolve (antimicrobial peptide evolution model: takes an AMP sequence as input and outputs improved variants), CytoGuard (antimicrobial peptide activity evaluation model: takes an AMP sequence as input and outputs a predicted MIC), and CytoGrow (fermentation and yield model, detailed in Part 2). The system framework is shown below:

Part 2: Experimental Level Design and Optimization

2.1 CytoGrow

CytoGuard and CytoEvolve are models designed to improve the quality of the LL-37 antimicrobial peptide.

CytoGrow, on the other hand, is a model aimed at increasing LL-37 yield, consisting of three major models: Grow-Medium (medium composition optimization model), Grow-Yeast (Saccharomyces cerevisiae growth kinetics model), and Grow-Glucose (glucose consumption model).

2.1.1 Grow-Medium

Abstract

Grow-Medium establishes a hybrid intelligent optimization framework, combining a quadratic response surface, Gaussian process residuals, and dual-acquisition-function Bayesian optimization, to optimize the culture medium formulation for Saccharomyces cerevisiae. We used two selection criteria. The Mean criterion yielded an improved medium of glucose 54.49 g/L, peptone 9.82 g/L, KH2PO4 3 g/L, with a predicted OD of 0.408 (a 20.9% improvement over the basic medium, OD = 0.3375): a small but reliable improvement. The UCB criterion predicted that glucose 41.39 g/L, peptone 23.58 g/L, KH2PO4 3 g/L would achieve an OD of 0.424 (a 25.6% improvement over the basic medium): a larger improvement, but one requiring further experimental validation.

Problem

Dataset Description

  • Source: Actual wet lab experimental data
  • Target variable: OD value (optical density, reflecting microbial growth)
  • Independent variables:
  • Glucose concentration (G): 20-70 g/L
  • Peptone concentration (T): 6-30 g/L
  • KH2PO4 concentration (K): 0-5 g/L
  • Data scale: 228 experimental points, 3 replicates each
  • Basic Medium OD: 0.3375 (G=20, T=20, K=0)
  • Experimental Average OD: 0.3410
  • Highest Experimental OD: 0.401 (G=54, T=10, K=3)

Optimization Objective

Find the optimal concentration combination of glucose, peptone, and potassium dihydrogen phosphate to maximize OD value.

Method

Initially, we attempted a hybrid modeling approach of quadratic response surface + Gaussian process residuals.

Quadratic Response Surface (Trend Term)

f_trend(G, T, K) = 𝐗β

where the feature matrix 𝐗 contains:

𝐗 = [1, G, T, K, G², T², K², GT, GK, TK]

Parameter estimation: β = (𝐗ᵀ𝐗 + λ𝐈)⁻¹𝐗ᵀ𝐲, where λ = 10⁻⁸ is the regularization parameter.

Gaussian Process Residual Modeling (Rasmussen & Williams, 2006)

Residual definition:

r_i = y_i − f_trend(G_i, T_i, K_i)

Kernel function: Anisotropic RBF kernel

k(𝐱_i, 𝐱_j) = σ_f² exp( −½ Σ_{d=1}^{3} (x_{i,d} − x_{j,d})² / ℓ_d² )

Hyperparameter settings:

  • σ_f = max(10⁻⁶, std(r)) (signal amplitude)
  • σ_n = max(10⁻⁶, 0.02 × range(y)) (noise amplitude)
  • ℓ₁ = ℓ₂ = ℓ₃ = 1.0 (length scales)

Prediction Formulas

Mean prediction:

μ(𝐱*) = f_trend(𝐱*) + 𝐤*ᵀ(𝐊 + σ_n²𝐈)⁻¹𝐫

Variance prediction:

σ²(𝐱*) = k(𝐱*, 𝐱*) − 𝐤*ᵀ(𝐊 + σ_n²𝐈)⁻¹𝐤*
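The hybrid predictor can be sketched as follows (NumPy; a shared length scale and illustrative noise settings are used for brevity rather than the exact hyperparameters above): fit the ridge-regularized quadratic trend, fit a GP to its residuals, and predict the posterior mean.

```python
import numpy as np

def design_matrix(G, T, K):
    """Quadratic response-surface features [1, G, T, K, G^2, T^2, K^2, GT, GK, TK]."""
    return np.stack([np.ones_like(G), G, T, K,
                     G**2, T**2, K**2, G*T, G*K, T*K], axis=-1)

def rbf_kernel(A, B, sf2=1.0, ls=1.0):
    """RBF kernel; a shared length scale stands in for the anisotropic one."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 / ls**2).sum(-1)
    return sf2 * np.exp(-0.5 * d2)

def fit_hybrid(X_raw, y, lam=1e-8, sn2=1e-4):
    """Ridge-regularized trend term plus a GP fitted to the residuals."""
    Phi = design_matrix(*X_raw.T)
    beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
    r = y - Phi @ beta                              # residuals for the GP
    K = rbf_kernel(X_raw, X_raw) + sn2 * np.eye(len(y))
    alpha = np.linalg.solve(K, r)                   # (K + sigma_n^2 I)^-1 r
    return beta, alpha

def predict_mean(X_star, X_raw, beta, alpha):
    """Posterior mean: quadratic trend plus GP residual correction."""
    return design_matrix(*X_star.T) @ beta + rbf_kernel(X_star, X_raw) @ alpha
```

Because the GP only models what the quadratic surface misses, the trend carries the global shape while the residual term corrects local deviations near observed points.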

Experiments & Results

For the initial optimization strategy, we used grid search, first defining the search space:

  • G ∈ [30, 70] g/L
  • T ∈ [6, 30] g/L
  • K = 3 g/L (fixed)
  • Grid resolution: 121×121 = 14,641 points

For the second method, we set the acquisition function: Upper Confidence Bound (UCB)

UCB(𝐱) = μ(𝐱) + κ·σ(𝐱), where κ = 2.58 (99% confidence)
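The grid scan over both criteria can be sketched as follows (NumPy; the `posterior` function is a hypothetical stand-in for the hybrid model's predictive mean and standard deviation, not our fitted model):

```python
import numpy as np

def posterior(G, T):
    """Hypothetical stand-in for the hybrid model's predictive mean and std."""
    mu = 0.34 + 0.002 * (G - 20) - 2e-5 * (G - 20) ** 2 + 0.001 * (T - 6)
    sigma = 0.01 * np.ones_like(mu)
    return mu, sigma

# Search space: G in [30, 70], T in [6, 30], K fixed at 3 g/L.
G = np.linspace(30, 70, 121)
T = np.linspace(6, 30, 121)
GG, TT = np.meshgrid(G, T, indexing="ij")        # 121 x 121 = 14,641 points

mu, sigma = posterior(GG, TT)
kappa = 2.58                                      # 99% confidence multiplier
ucb = mu + kappa * sigma

i_mean = np.unravel_index(mu.argmax(), mu.shape)  # Mean argmax
i_ucb = np.unravel_index(ucb.argmax(), ucb.shape) # UCB argmax
best_mean = (G[i_mean[0]], T[i_mean[1]])
best_ucb = (G[i_ucb[0]], T[i_ucb[1]])
```

The Mean criterion exploits the current model; UCB adds κσ so that under-explored, high-uncertainty regions can also win the argmax.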

Results

Mean argmax @ (G=54.33, T=9.80, K=3): OD_mean = 0.408
UCB argmax @ (G=41.33, T=23.60, K=3): OD_mean = 0.424

Through grid search, both Mean and UCB found optimized configurations and predicted corresponding OD values.

The figures below show a visualization of the two extrema predicted by the Mean criterion and a comparison of biomass (OD value) across different medium compositions:

The following figures show model validation and analysis: sensitivity analysis of the three parameters and model fitting quality analysis:

Conclusion

Through hybrid modeling combining parametric (quadratic response surface) and non-parametric (GP) methods, and dual acquisition functions covering both optimistic exploration (UCB) and deterministic exploitation (Mean), we optimized existing medium compositions and predicted their OD values. The best formulation discovered was glucose 41.4 g/L + peptone 23.6 g/L + KH2PO4 3 g/L, with a predicted OD of 0.424, a 25.6% improvement over the basic medium (OD = 0.3375). Through modeling and computational prediction, we provided clear medium formulation recommendations for subsequent experiments, reducing trial-and-error costs. Moreover, the method is generalizable: the optimization framework can be extended to other microbial medium optimization problems.

2.1.2 Grow-Yeast

Abstract

To monitor Saccharomyces cerevisiae growth and obtain biomass at any time point, we established S. cerevisiae growth kinetics models using Logistic and Gompertz models. The Logistic model showed the best fitting performance for biomass data (R²=0.9937, RMSE=0.3462).

Problem

Data Source

The project uses standardized yeast fermentation experimental data, including:

  • Time series: 0-48 hours, 14 measurement time points
  • Biomass indicator: OD₆₀₀ optical density values (3 parallel experiments)
  • Substrate concentration: Glucose concentration (g/L, 3 parallel experiments)

Considering practical constraints, wet lab measurements of S. cerevisiae growth cannot be taken at arbitrarily fine time resolution. However, in practical applications, such as calculating S. cerevisiae efficiency ratios, we may need the biomass at arbitrary time points, so being able to obtain the biomass at any moment becomes particularly important. Therefore, our dry lab designed the Grow-Yeast model, which uses the limited measurements to turn discrete data points into a continuous growth curve from which the biomass at any time can be read off.

Method

Biomass Growth Models

Logistic Growth Model (Verhulst, 1838)

The Logistic model describes biological growth limited by environmental resistance:

X(t) = X_max / (1 + (X_max/X_0 − 1)·e^(−μt))

where:

  • X(t): biomass at time t
  • X_0: initial biomass
  • X_max: maximum biomass
  • μ: maximum specific growth rate

Characteristics: S-shaped growth curve, suitable for describing complete growth processes

Gompertz Growth Model (Gompertz, 1825)

The Gompertz model is suitable for describing processes with gradually declining growth rates:

X(t) = X_max · exp(−ln(X_max/X_0) · e^(−μt))

Characteristics: Asymmetric S-shaped curve with more gradual growth rate decline
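Both growth laws are closed-form and can be evaluated directly; a minimal sketch (NumPy, with illustrative parameter values) that can be handed to any curve-fitting routine:

```python
import numpy as np

def logistic(t, X0, Xmax, mu):
    """Logistic growth: symmetric S-curve saturating at Xmax."""
    return Xmax / (1.0 + (Xmax / X0 - 1.0) * np.exp(-mu * t))

def gompertz(t, X0, Xmax, mu):
    """Gompertz growth: asymmetric S-curve with a gentler late-phase decline."""
    return Xmax * np.exp(-np.log(Xmax / X0) * np.exp(-mu * t))
```

Either function can be passed, together with the OD₆₀₀ time series, to a least-squares fitter such as `scipy.optimize.curve_fit` to estimate (X_0, X_max, μ); both satisfy X(0) = X_0 and approach X_max as t grows.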

Experiments & Results

Data Preprocessing

  1. Statistical calculations:
  2. Mean: x¯=1ni=1nxi
  3. Sample standard deviation: s=1n1i=1n(xix¯)2

  4. Data quality assessment:

  5. Coefficient of variation: CV=sx¯×100%
  6. Data completeness check and outlier identification

  7. Phase division:

  8. Exponential growth phase: 0-15 hours
  9. Stationary phase: 15-48 hours

Visualization of Raw Experimental Data

This figure uses dual Y-axis design, simultaneously displaying biomass growth (OD₆₀₀, green) and glucose consumption (red) over time. Shaded regions identify different fermentation phases, clearly showing the transition between exponential growth and stationary phases.

Experimental Results

The figure below shows the fitting performance of Logistic and Gompertz models on biomass data, including:

  • Experimental data points (dark dots): Observed values with standard deviation error bars
  • Fitted curves: Different colors represent different model fitting results
  • Fitting quality indicators: R² values shown in legend for direct comparison

The figure below shows fitting using the best-performing Logistic model:

Model validation:

2.1.3 Grow-Glucose

Abstract

To monitor glucose consumption during S. cerevisiae growth and obtain the glucose remaining at any time, we established glucose consumption kinetics models using modified exponential decay and Logistic decay models. The modified exponential decay model showed the best fitting performance for the glucose data (R²=0.955).

Problem

Data Source

The project uses standardized yeast fermentation experimental data, including:

  • Time series: 0-48 hours, 14 measurement time points
  • Substrate concentration: Glucose concentration (g/L, 3 parallel experiments)

Problem Description

Similar to Grow-Yeast, we hope to obtain glucose remaining at any time point. Glucose remaining and when glucose is depleted are crucial for our project, as S. cerevisiae only begins producing LL-37 after glucose depletion. Therefore, our dry lab designed the Grow-Glucose model, using limited data to transform discrete points into continuous curves for obtaining glucose remaining at different time points.

Method

1. Modified Exponential Decay Model

Considering different consumption rate differences in fermentation phases:

S(t) = S_0·e^(−k₁t),                     if t ≤ t_switch
S(t) = S_switch·e^(−k₂(t − t_switch)),   if t > t_switch

where:

  • S_0: initial substrate concentration
  • k₁: early consumption rate constant
  • k₂: late consumption rate constant
  • t_switch: transition time point (with S_switch = S_0·e^(−k₁·t_switch) for continuity)

2. Logistic Decay Model

Describing S-shaped characteristics of substrate consumption:

S(t) = S_min + (S_0 − S_min) / (1 + (kt)ⁿ)

where:

  • S_min: minimum substrate concentration
  • k: consumption rate constant
  • n: shape parameter
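Both decay laws can be sketched directly (NumPy, illustrative parameters); the switch-point value S_switch is computed from continuity of the two exponential branches:

```python
import numpy as np

def two_phase_decay(t, S0, k1, k2, t_switch):
    """Piecewise exponential decay with a rate change at t_switch."""
    S_switch = S0 * np.exp(-k1 * t_switch)        # continuity at the switch
    return np.where(t <= t_switch,
                    S0 * np.exp(-k1 * t),
                    S_switch * np.exp(-k2 * (t - t_switch)))

def logistic_decay(t, S0, Smin, k, n):
    """S-shaped decline from S0 toward Smin."""
    return Smin + (S0 - Smin) / (1.0 + (k * t) ** n)
```

Fitting either curve to the glucose series yields a continuous S(t), from which a depletion time (e.g., the first t with S(t) below a threshold) can be solved numerically.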

Experiments & Results

The figure below shows fitting performance of modified exponential decay and Logistic decay models on glucose consumption:

The figure below shows fitting using the best-performing modified exponential decay model:

Part 3: Molecular Level Design and Optimization

3.1 CytoGuard Model

Abstract

CytoGuard is an innovative deep learning framework specifically designed to predict the biological activity of Antimicrobial Peptides (AMPs). This model integrates feature representations from multiple pre-trained protein language models (ESM-2, Ankh, ProtT5) and captures high-order structural dependencies in sequences through Hypergraph Neural Networks (HGNNs). The model employs dynamic k-mer selection mechanisms and attention fusion strategies, achieving excellent performance on the test set: Spearman correlation coefficient of 0.8543, Pearson correlation coefficient of 0.9105, RMSE of 0.1806, and R² of 0.8153.

Problem

Antimicrobial peptides, as essential components of the innate immune system, have tremendous potential in combating bacterial resistance (Hancock & Sahl, 2006; Mahlapuu et al., 2016). LL-37, the only cathelicidin-family antimicrobial peptide in humans, holds exceptional research potential. What properties does LL-37 possess, and how can we evaluate whether it is a "good" antimicrobial peptide? Traditional approaches involve constructing an antimicrobial peptide expression system (strain selection, cultivation, separation, and purification) or direct chemical synthesis; the obtained peptides then require antimicrobial activity determination through inhibition zone assays or dilution plating, and evaluating other physicochemical properties requires still more experiments. Such experiments are time-consuming, labor-intensive, and costly (Fjell et al., 2012). In today's era of rapid computational development, can we design a computational pipeline that learns from existing antimicrobial peptide data and predicts the activity and physicochemical properties of unseen peptides? The answer is yes, but existing machine learning methods face the following challenges:

  1. Sequence Complexity: Antimicrobial peptide sequences vary in length with complex amino acid combinations
  2. Feature Representation: Traditional feature engineering struggles to capture deep semantic information in sequences
  3. Structural Dependencies: Lack of modeling for high-order structural relationships in sequences
  4. Data Sparsity: Relatively limited high-quality annotated data

To address these challenges and efficiently and accurately predict the properties of unknown antimicrobial peptides, we designed the CytoGuard antimicrobial peptide activity prediction model.

Method

Process Flow

Multi-Model Embedding Representation

Given an antimicrobial peptide sequence S = {s₁, s₂, …, s_L}, where L is the sequence length, we extract features using three renowned pre-trained protein language models (Rives et al., 2021; Lin et al., 2023; Elnaggar et al., 2022):

ESM-2 Embedding:

𝐄_esm2 = ESM-2(S) ∈ ℝ^(L×d_esm2)

Ankh Embedding:

𝐄_ankh = Ankh(S) ∈ ℝ^(L×d_ankh)

ProtT5 Embedding:

𝐄_prott5 = ProtT5(S) ∈ ℝ^(L×d_prott5)

where d_esm2 = 1280, d_ankh = 768, d_prott5 = 1024.

Through feature extraction with protein large language models, we achieve high-dimensional feature extraction of antimicrobial peptides, with each model aligned in dimensions for subsequent data processing.

Before feature extraction, we fine-tune on a 10K deduplicated antimicrobial peptide dataset to improve model performance on antimicrobial peptides. We also tested non-fine-tuned models; see the Experiment section for comparison.

Attention Mechanism Multi-Model Fusion

We employ an attention mechanism (Bahdanau et al., 2015; Vaswani et al., 2017) to fuse multiple embedding representations:

Projection Layer:

𝐇_esm2 = 𝐄_esm2·𝐖_esm2 + 𝐛_esm2
𝐇_ankh = 𝐄_ankh·𝐖_ankh + 𝐛_ankh
𝐇_prott5 = 𝐄_prott5·𝐖_prott5 + 𝐛_prott5

Attention Weight Calculation:

𝐜 = Concat(AvgPool(𝐇_esm2), AvgPool(𝐇_ankh), AvgPool(𝐇_prott5))
α = Softmax(𝐖_att·𝐜 + 𝐛_att)

Fused Features:

𝐗 = Σ_{i=1}^{3} α_i·𝐇^(i)
𝐗 ∈ ℝ^(L×d_hidden), d_hidden = 640
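The fusion step can be sketched as follows (NumPy; random matrices stand in for the three projected embedding streams and the learned attention layer):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 12, 640

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Random stand-ins for the projected streams H_esm2, H_ankh, H_prott5.
H = [rng.normal(size=(L, d)) for _ in range(3)]

# Pool each stream, concatenate, and score with a (stand-in) linear layer.
c = np.concatenate([h.mean(axis=0) for h in H])     # shape (3*d,)
W_att = rng.normal(size=(3, 3 * d)) * 0.01
b_att = np.zeros(3)
alpha = softmax(W_att @ c + b_att)                   # three fusion weights

# Weighted sum of the streams gives the fused representation X.
X = sum(a * h for a, h in zip(alpha, H))             # shape (L, d)
```

Because the weights come from a softmax, the fusion is a convex combination: each model's contribution stays interpretable as a single scalar per sequence.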

Hypergraph Construction and TF-IDF Weighting

For a given k value, we construct a hypergraph 𝒢 = (𝒱, ℰ, 𝐖_e) (Feng et al., 2021):

Node Set: 𝒱 = {v₁, v₂, …, v_L}, corresponding to each position in the sequence.

Hyperedge Set: ℰ = {e₁, e₂, …, e_{L−k+1}}, where each hyperedge eᵢ connects positions {i, i+1, …, i+k−1}.

Edge Weights (based on TF-IDF) (Salton & Buckley, 1988):

wᵢ = TF-IDF(k-merᵢ) = TF(k-merᵢ) × IDF(k-merᵢ) + 1

where:

TF(k-mer) = count(k-mer) / (L − k + 1),
IDF(k-mer) = log( |𝒟| / |{d ∈ 𝒟 : k-mer ∈ d}| ).

Hypergraph Laplacian Matrix:

𝐋_HGNN = 𝐃_v^(−1/2) 𝐇 𝐖_e 𝐃_e^(−1) 𝐇ᵀ 𝐃_v^(−1/2)
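The per-sequence hyperedge and weight construction can be sketched as follows (plain Python/NumPy; `corpus` plays the role of the document set 𝒟 and is assumed to contain the sequence itself, so every k-mer has document frequency at least 1):

```python
import numpy as np

def kmer_hypergraph(seq, k, corpus):
    """Incidence matrix H and TF-IDF hyperedge weights for one sequence."""
    L = len(seq)
    n_edges = L - k + 1
    H = np.zeros((L, n_edges))
    w = np.zeros(n_edges)
    kmers = [seq[i:i + k] for i in range(n_edges)]
    for i, km in enumerate(kmers):
        H[i:i + k, i] = 1.0                  # hyperedge i covers positions i..i+k-1
        tf = kmers.count(km) / n_edges       # term frequency within the sequence
        df = sum(km in d for d in corpus)    # document frequency over the corpus
        w[i] = tf * np.log(len(corpus) / df) + 1.0
    return H, w
```

Each column of H marks the k consecutive positions a hyperedge spans; the "+1" keeps every hyperedge weight strictly positive even for k-mers present in all corpus sequences.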

Algorithm pseudocode:

Hypergraph Attention Mechanism

CytoGuard employs multi-head hypergraph attention mechanism:

Query, Key, Value Transformation:

𝐐 = 𝐗𝐖_Q, 𝐊 = 𝐗𝐖_K, 𝐕 = 𝐗𝐖_V

Attention Score Calculation:

𝐀 = Softmax( 𝐐𝐊ᵀ/√d_k + 𝐋_HGNN )

Output:

𝐙 = 𝐀𝐕𝐖_O

The hypergraph Laplacian matrix serves as a structural bias term, guiding the attention mechanism to focus on important connections in the hypergraph structure.

Algorithm pseudocode:

Dynamic k-mer Selection

To adaptively select optimal k-mer combinations, we designed a dynamic k-mer selection mechanism:

Global Feature Extraction:

𝐠 = AvgPool(𝐗) ∈ ℝ^(d_hidden)

k-mer Weight Calculation:

β = Softmax(𝐖_k·𝐠 / τ)

where τ is the temperature parameter, β ∈ ℝ^(|K|), K = {2, 3, 4, 5}.
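The temperature-scaled selection weights can be sketched as follows (NumPy; 𝐖_k and 𝐠 are random stand-ins for the learned projection and the pooled feature):

```python
import numpy as np

def kmer_weights(g, W_k, tau=1.0):
    """Temperature-scaled softmax over the candidate k values K = {2, 3, 4, 5}."""
    logits = W_k @ g / tau
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()
```

Lower τ sharpens the distribution toward a single k; higher τ mixes the k-mer scales more evenly.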

Loss Function Design

CytoGuard employs a combined loss function with three components:

Mean Squared Error Loss:

ℒ_MSE = (1/N) Σ_{i=1}^{N} (yᵢ − ŷᵢ)²

Mean Absolute Error Loss:

ℒ_MAE = (1/N) Σ_{i=1}^{N} |yᵢ − ŷᵢ|

Ranking Loss:

ℒ_rank = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} 1[|yᵢ − yⱼ| > ϵ] · ReLU( −(ŷᵢ − ŷⱼ)·sign(yᵢ − yⱼ) + ϵ )

Total Loss:

ℒ_total = λ₁ℒ_MSE + λ₂ℒ_MAE + λ₃ℒ_rank

where λ₁ = 0.5, λ₂ = 0.3, λ₃ = 0.2.
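The combined loss can be sketched as follows (NumPy; the margin ϵ value is illustrative). The ranking term penalizes pairs whose predicted ordering contradicts the true ordering by at least the margin:

```python
import numpy as np

def combined_loss(y, y_hat, lam=(0.5, 0.3, 0.2), eps=0.05):
    """Weighted MSE + MAE + pairwise margin ranking loss."""
    mse = np.mean((y - y_hat) ** 2)
    mae = np.mean(np.abs(y - y_hat))
    dy = y[:, None] - y[None, :]              # true pairwise differences
    dp = y_hat[:, None] - y_hat[None, :]      # predicted pairwise differences
    mask = np.abs(dy) > eps                   # only clearly-ordered pairs count
    rank = np.mean(mask * np.maximum(0.0, -dp * np.sign(dy) + eps))
    return lam[0] * mse + lam[1] * mae + lam[2] * rank
```

MSE dominates calibration of the absolute MIC scale, while the ranking term directly optimizes the ordering that Spearman correlation measures.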

Experiments & Results

Dataset

The dataset is divided into training, validation, and test sets, all containing AMP and non-AMP sequences. AMP sequences are primarily sourced from PepVAE with their minimal inhibitory concentration (MIC) labels against E. coli, totaling 3,265 AMPs with annotated MIC values. Non-AMPs are 3,265 amino acid sequences without antimicrobial activity selected from UniProt. Additionally, we collected nearly 10K deduplicated antimicrobial peptide sequences from APD3 (Wang et al., 2016), DRAMP, DBAASP (Pirtskhalava et al., 2021), etc., for fine-tuning the pre-trained protein language models.

Comparison of Fine-tuned vs Non-fine-tuned Protein Language Models

Blue line represents fine-tuned, green line represents non-fine-tuned

Training Process Comparison

Training convergence speed comparison shows fine-tuned models converge faster than non-fine-tuned:

CytoGuard Model Results

Final performance on test set:

Metric Value Interpretation
Spearman Correlation 0.8543 Predicted rankings highly consistent with true values
Pearson Correlation 0.9105 Strong linear correlation
RMSE 0.1806 Small root mean square error
MAE 0.0786 Very small mean absolute error
R² 0.9053 Explains 90.5% of variance

Model performance visualization:

Multi-Model Fusion Effect - Ablation Study

Model Combination Spearman RMSE
ESM-2 only 0.7892 0.2134
Ankh only 0.7456 0.2301
ProtT5 only 0.7123 0.2456
ESM-2 + Ankh 0.8234 0.1934
All three 0.8543 0.1806

Selection Strategy Comparison - Comparative Experiment

Strategy Spearman RMSE
Fixed k=3 0.8201 0.2012
Fixed k=4 0.8156 0.2034
Uniform weights 0.8334 0.1887
Dynamic selection 0.8543 0.1806

Attention Visualization

Through attention weight visualization, we discovered:

  1. Multi-head attention captures sequence patterns at different levels
  2. Dynamic k-mer weights tend to select combinations of k=3 and k=4
  3. Hypergraph structure effectively models local sequence dependencies

Uncertainty Estimation

Model output includes prediction uncertainty, providing confidence assessment for practical applications:

  • High confidence predictions: uncertainty < 0.1
  • Medium confidence predictions: 0.1 ≤ uncertainty < 0.2
  • Low confidence predictions: uncertainty ≥ 0.2

Conclusion

CytoGuard significantly improves antimicrobial peptide activity prediction accuracy through innovative hypergraph attention mechanisms and multi-model fusion strategies. The excellent performance achieved on the test set demonstrates the method's effectiveness, outperforming previous deep learning approaches for AMP prediction (Veltri et al., 2018; Chung et al., 2020). This work provides new technical solutions for computational biology and drug discovery fields, with important theoretical value and practical significance.

Future work will focus on further improving the model's generalization ability and computational efficiency, and exploring applications in broader protein function prediction tasks.

3.2 CytoEvolve Model

Abstract

The CytoEvolve model is a deep reinforcement learning-based antimicrobial peptide sequence optimization framework for improving and optimizing antimicrobial peptide sequences to enhance their antimicrobial activity. The framework primarily includes: (1) an attention mechanism-based policy network (Mutator) integrated with Diffusion architecture for selecting amino acid mutation sites; (2) a fine-tuned Ankh protein language model for generating amino acid substitutions; (3) CytoGuard for evaluating antimicrobial activity. By optimizing policy network parameters through the REINFORCE algorithm, the system can iteratively improve peptide sequences to maximize predicted antimicrobial activity scores. The framework employs experience replay mechanisms and early stopping strategies, effectively balancing exploration and exploitation, achieving effective evolution from existing AMPs, signal peptides, or random sequences to highly active antimicrobial peptides.

Model Workflow

Problem

Wet lab experiments revealed that the original LL-37's antimicrobial duration is only about 8 hours, with slightly insufficient antimicrobial activity. Facing this dilemma, we hope to obtain LL-37 variants that can improve the deficiencies of the original sequence. However, traditional experimental methods often involve manual mutation induction with low success rates and high time costs (Das et al., 2021). While computational design has low costs, it also faces the following challenges:

  1. Vast sequence space: For a peptide of length L, the number of possible sequences is 20^L, with the search space growing exponentially
  2. Difficult activity prediction: The nonlinear relationship between antimicrobial activity and sequence is complex, making it difficult to establish accurate structure-activity relationships
  3. High experimental validation costs: Biological experimental validation is time-consuming and expensive, requiring computational methods to pre-screen candidate sequences
  4. Multi-objective optimization: Need to simultaneously consider multiple properties including antimicrobial activity, cytotoxicity, and stability

To address existing dilemmas and challenges, our dry lab designed the CytoEvolve model to generate more stable LL-37 variants with higher antimicrobial activity.

Method

Modeling Analysis

Let the antimicrobial peptide sequence be 𝐬 = (s₁, s₂, …, s_L), where sᵢ ∈ 𝒜 represents the amino acid at position i, and 𝒜 is the set of 20 natural amino acids. The optimization objective can be expressed as:

𝐬* = argmax_{𝐬 ∈ 𝒜^L} f(𝐬)

where f(𝐬) is the antimicrobial activity evaluation function. Due to direct optimization difficulties, we transform it into a Markov Decision Process (MDP):

  • State space 𝒮: Current peptide sequence
  • Action space 𝒜: Selection of mutation sites and amino acid substitutions
  • Reward function R(s,a): Activity score based on CytoGuard predictor
  • Policy function πθ(a|s): Probability distribution of selecting actions given state

Policy Network (Mutator)

The policy network is designed based on attention mechanisms to learn optimal mutation site selection strategies:

Mutator: 𝐬 → 𝐩

where 𝐩 ∈ [0,1]^L represents the probability of each site being selected for mutation.

Network structure includes:

  • Embedding layer: 𝐄 = Embedding(input_id), dimension (L, L)
  • Attention mechanism:
    𝐇 = Linear_in(𝐄) ∈ ℝ^(L×128)
    𝐀 = Linear_out(𝐇) ∈ ℝ^(L×L)
  • Probability output: 𝐩 = Softmax(𝐀[:, :, 0])

Sequence Generation Model

We employ a Discrete Diffusion Model (Austin et al., 2021; Sohl-Dickstein et al., 2015) for iterative sequence generation. The model gradually transforms the original sequence s0 into pure noise (e.g., fully masked sequence) sT through a predefined forward process over T time steps.

The core is a trained denoising network fθ that learns to reverse this process: given a noisy sequence st at any time step t, it predicts the most likely original sequence s0. The model's optimization objective is to minimize prediction loss:

L = −𝔼_{t, s₀, s_t}[ log P_θ(s₀ | s_t) ]

where t is uniformly sampled from {1,...,T}, and st is the sequence after t steps of noise addition from s0.

The generation process starts from a fully random or masked sequence sT and gradually recovers a structurally clear target sequence s0 through T iterative applications of the denoising network fθ.

  • Diffusion steps: T = 200
  • Noise schedule: cosine schedule
  • Maximum length: L_max = 41

Activity Evaluation (CytoGuard)

The CytoGuard predictor is based on hypergraph neural networks, constructing k-gram features as hypergraph structures:

ℋ = (𝒱, ℰ)

where:

  • Node set 𝒱: Ankh embedding representations of amino acid residues
  • Hyperedge set ℰ: k-gram subsequences (k = 2, 3, 4)

Hypergraph convolution operation:

𝐇^(l+1) = σ( 𝐃_v^(−1/2) 𝐁 𝐖₁^(l) 𝐃_e^(−1) 𝐁ᵀ 𝐃_v^(−1/2) 𝐇^(l) 𝐖₂^(l) )

where:

  • 𝐁 ∈ {0,1}^(|𝒱|×|ℰ|): incidence matrix
  • 𝐃_v, 𝐃_e: node and hyperedge degree matrices
  • 𝐖₁^(l), 𝐖₂^(l): learnable weight matrices

TF-IDF weights enhance important k-gram contributions:

w_ij = tf_ij × log( N / |{d : tⱼ ∈ d}| )

REINFORCE Algorithm (Williams, 1992)

We optimize policy network parameters θ using policy gradient methods:

∇_θ J(θ) = 𝔼_{s∼ρ^π, a∼π_θ}[ ∇_θ log π_θ(a|s) · R(s,a) ]

In implementation, the gradient estimate is:

∇_θ J(θ) ≈ (1/N) Σ_{i=1}^{N} ∇_θℓᵢ · rᵢ

where:

  • ℓᵢ: log-likelihood of the i-th sample
  • rᵢ: corresponding reward value
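The score-function update can be demonstrated on a toy two-action bandit (NumPy; this is an illustrative stand-alone example, not the actual Mutator network): the update pushes up the log-probability of each sampled action in proportion to its reward.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy policy over two actions: action 0 yields reward 1.0, action 1 yields 0.1.
theta = np.zeros(2)
rewards = np.array([1.0, 0.1])

for _ in range(300):
    p = softmax(theta)
    a = rng.choice(2, p=p)                   # sample an action from the policy
    grad_logp = -p.copy()
    grad_logp[a] += 1.0                      # grad of log pi(a) for a softmax policy
    theta += 0.1 * grad_logp * rewards[a]    # REINFORCE ascent step

p = softmax(theta)                           # policy now prefers the high-reward action
```

Because both rewards are positive, both actions get reinforced when sampled, but the higher-reward action accumulates larger updates and comes to dominate the policy.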

Experience Replay Mechanism

To improve sample efficiency, we introduce experience replay buffer 𝒟:

𝒟 = {(𝐬ⱼ, rⱼ, ℓⱼ, tⱼ)}_{j=1}^{|𝒟|}

During each training step, the loss function combines current batch and historical experience:

ℒ = −(1/N_current) Σ_{i=1}^{N_current} ℓᵢ·rᵢ − (1/N_replay) Σ_{j=1}^{N_replay} ℓⱼ^exp·rⱼ^exp

Buffer management strategy:

  1. Deduplication: remove duplicate sequences
  2. Sorting: sort by score in descending order
  3. Truncation: keep the top M_max = 200 high-scoring samples

Main Reward Function

Reward function designed based on CytoGuard predicted logMIC values:

r(𝐬) = 0,   if ŷ < 0
r(𝐬) = 1,   if ŷ > 1
r(𝐬) = ŷ,   otherwise

where ŷ is the predicted normalized antimicrobial activity.

Sequence Diversity Penalty

To avoid repeatedly generating identical sequences, we introduce history penalty mechanism:

ℒ_adjusted = 0.5·ℒ  if 𝐬 ∈ History,  otherwise ℒ
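Both the reward clipping and the history penalty are one-liners; a sketch in plain Python:

```python
def reward(y_hat):
    """Clip the normalized predicted activity into [0, 1]."""
    return min(max(y_hat, 0.0), 1.0)

def adjusted_loss(loss, seq, history):
    """Halve the loss contribution of sequences that were already generated."""
    return 0.5 * loss if seq in history else loss
```

Clipping keeps the reward scale bounded for stable policy gradients, while down-weighting repeated sequences nudges the policy toward unexplored variants.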

Experiments & Results

Hyperparameter Configuration

  • Learning rate: α = 10⁻³
  • Batch size: 16
  • Training steps: 10
  • Iterations per step: 8
  • Early stopping threshold: 20 steps without improvement

Loss Function and Optimizer

Custom negative log-likelihood loss:

ℒ_NLL(𝐩, 𝐭) = −Σ_{i=1}^{L} tᵢ·log pᵢ

Optimizer uses Adam algorithm (Kingma & Ba, 2015):

𝐦_t = β₁𝐦_{t−1} + (1−β₁)∇_θℒ_t
𝐯_t = β₂𝐯_{t−1} + (1−β₂)(∇_θℒ_t)²
θ_t = θ_{t−1} − α·𝐦_t / (√𝐯_t + ϵ)
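A minimal Adam step (NumPy; written in the standard form with bias-corrected moment estimates), demonstrated on minimizing θ²:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)               # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The per-parameter scaling by √v̂ makes the effective step size roughly α regardless of the raw gradient magnitude, which is why Adam tolerates the unnormalized policy-gradient rewards well.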

Here is the pseudocode for Reinforcement Learning and Diffusion Model algorithms.

Experimental Results

Facing the challenges of high experimental validation costs and time-consuming biological experiments, dry and wet labs collaborated. The dry lab further validated and screened generated sequences, narrowing candidates to the four best-performing variants, while the wet lab used D2P methods for synthesis and validation, further reducing experimental costs.

Finally, the dry lab's LL-37 variants labeled as Variant-1 and Variant-2 showed stronger antimicrobial activity compared to the original LL-37 sequence. However, from experimental data, Variant-2's duration was lower than Variant-1 and the original LL-37 sequence, showing strong initial antimicrobial activity but reduced activity at 3 hours.

Conclusions

CytoEvolve constructed an end-to-end antimicrobial peptide (AMP) optimization framework, with its core being the first organic combination of Discrete Diffusion Models for sequence generation with Reinforcement Learning. The framework utilizes the diffusion model's powerful generative capabilities to explore vast sequence spaces, creating diverse candidate peptides; simultaneously, a hypergraph neural network-based activity predictor serves as a reward function, efficiently guiding and optimizing sequence generation direction through reinforcement learning strategies (Schulman et al., 2017). This method not only significantly improves computational efficiency but also discovers sequence-function relationships difficult to find through traditional methods (Müller et al., 2018), providing a new computational paradigm for rational design of functional macromolecules. Although the system currently has limitations in sequence length and multi-objective optimization, the framework has shown tremendous potential in LL-37 and other antimicrobial peptide optimization, providing new methods for subsequent innovative drug discovery, protein engineering, and synthetic biology fields.

Conclusion

CytoFlow demonstrates that the future of peptide engineering lies not in isolated computational tools, but in integrated systems that unify sequence design, activity prediction, and production optimization. Our framework has not only achieved significant improvements in LL-37 engineering but also established new standards for computational methods in synthetic biology.

Through the synergistic combination of reinforcement learning, hypergraph neural networks, and fermentation modeling, we created a system that learns, adapts, and improves with each experimental cycle. The enhanced variant activity we achieved is just the beginning—CytoFlow lays the foundation for a new era of rational peptide design.

As we open-source CytoFlow to the iGEM community and beyond, we envision a future where any team can rapidly engineer peptides for therapeutic, industrial, or research applications. The cell factory (Cytopia) is no longer a distant dream but an achievable reality, powered by the computational framework we have developed.


The CytoFlow framework represents the culmination of intensive computational and experimental work by the Jiangnan-China iGEM team. We thank our advisors, collaborators, and the broader iGEM community for their support in realizing this vision.

References

  1. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410. https://doi.org/10.1016/S0022-2836(05)80360-2

  2. Austin, J., Johnson, D. D., Ho, J., Tarlow, D., & van den Berg, R. (2021). Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34, 17981-17993.

  3. Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).

  4. Chung, C. R., Kuo, T. R., Wu, L. C., Lee, T. Y., & Horng, J. T. (2020). Characterization and identification of antimicrobial peptides with different functional activities. Briefings in Bioinformatics, 21(3), 1098-1114. https://doi.org/10.1093/bib/bbz043

  5. Das, P., Sercu, T., Wadhawan, K., Padhi, I., Gehrmann, S., Cipcigan, F., Chenthamarakshan, V., Strobelt, H., dos Santos, C., Chen, P. Y., Yang, Y. Y., Tan, J. P. K., Hedrick, J., Crain, J., & Mojsilovic, A. (2021). Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nature Biomedical Engineering, 5(6), 613-623. https://doi.org/10.1038/s41551-021-00689-x

  6. Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., Bhowmik, D., & Rost, B. (2022). ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 7112-7127. https://doi.org/10.1109/TPAMI.2021.3095381

  7. Feng, Y., Wang, Y., & Liu, H. (2021). HGNN: Hypergraph neural networks. ACM Transactions on Knowledge Discovery from Data, 15(6), 1-28. https://doi.org/10.1145/3447548

  8. Fjell, C. D., Hiss, J. A., Hancock, R. E., & Schneider, G. (2012). Designing antimicrobial peptides: form follows function. Nature Reviews Drug Discovery, 11(1), 37-51. https://doi.org/10.1038/nrd3591

  9. Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 115, 513-583. https://doi.org/10.1098/rstl.1825.0026

  10. Hancock, R. E., & Sahl, H. G. (2006). Antimicrobial and host-defense peptides as new anti-infective therapeutic strategies. Nature Biotechnology, 24(12), 1551-1557. https://doi.org/10.1038/nbt1267

  11. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).

  12. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130. https://doi.org/10.1126/science.ade2574

  13. Mahlapuu, M., Håkansson, J., Ringstad, L., & Björn, C. (2016). Antimicrobial peptides: An emerging category of therapeutic agents. Frontiers in Cellular and Infection Microbiology, 6, 194. https://doi.org/10.3389/fcimb.2016.00194

  14. Mehta, D., Anand, P., Kumar, V., Joshi, A., Mathur, D., Singh, S., Tuknait, A., Chaudhary, K., Gautam, S. K., Gautam, A., Varshney, G. C., & Raghava, G. P. S. (2014). ParaPep: A web resource for experimentally validated antiparasitic peptide sequences and their structures. Database, 2014, bau051. https://doi.org/10.1093/database/bau051

  15. Monge, F. A., Jagla, J. H., Hartman, F. M., Hubert, J., Ropelewski, A. J., & Clemons, P. A. (2006). Response surface methodology as an approach to optimize medium composition for enhanced antimicrobial peptide production. Journal of Applied Microbiology, 101(5), 1062-1070.

  16. Müller, A. T., Hiss, J. A., & Schneider, G. (2018). Recurrent neural network model for constructive peptide design. Journal of Chemical Information and Modeling, 58(2), 472-479. https://doi.org/10.1021/acs.jcim.7b00414

  17. Pirtskhalava, M., Amstrong, A. A., Grigolava, M., Chubinidze, M., Alimbarashvili, E., Vishnepolsky, B., Gabrielian, A., Rosenthal, A., Hurt, D. E., & Tartakovsky, M. (2021). DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Research, 49(D1), D288-D297. https://doi.org/10.1093/nar/gkaa991

  18. Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.

  19. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118

  20. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523. https://doi.org/10.1016/0306-4573(88)90021-0

  21. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

  22. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning, 2256-2265.

  23. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.

  24. Verhulst, P. F. (1838). Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique, 10, 113-129.

  25. Veltri, D., Kamath, U., & Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 34(16), 2740-2747. https://doi.org/10.1093/bioinformatics/bty179

  26. Wang, G., Li, X., & Wang, Z. (2016). APD3: The antimicrobial peptide database as a tool for research and education. Nucleic Acids Research, 44(D1), D1087-D1093. https://doi.org/10.1093/nar/gkv1278

  27. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256. https://doi.org/10.1007/BF00992696

  28. Xiao, X., Wang, P., Lin, W. Z., Jia, J. H., & Chou, K. C. (2013). iAMP-2L: A two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical Biochemistry, 436(2), 168-177. https://doi.org/10.1016/j.ab.2013.01.019

  29. Zaslaver, A., Bren, A., Ronen, M., Itzkovitz, S., Kikoin, I., Shavit, S., Liebermeister, W., Surette, M. G., & Alon, U. (2006). A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nature Methods, 3(8), 623-628. https://doi.org/10.1038/nmeth895