Overview

To gain a deeper understanding of the specific mechanisms of each stage in this project, verify its feasibility, predict outcomes, and collaborate with the wet lab team for mutual improvement, we established a series of mathematical and physical models, including the Stability Model for a Dual-Plasmid System, the Metabolic Burden Analysis Model, the Protein-Protein Docking Model, and the Infection and Inoculation Model.

Modeling work is closely tied to the project. The Stability Model for a Dual-Plasmid System simulates the loss of the heterologous dual-plasmid system in continuous culture, providing guidance for the wet lab team’ s plasmid selection. The Metabolic Burden Analysis Model assesses the impact of the introduced plasmids on the engineered bacteria. The Protein-Protein Docking Model demonstrates that the tail fiber modification is achievable, thereby bolstering the wet lab team’ s confidence in finalizing the experimental design. The Infection and Inoculation Model offers a reference for the expected dosage and killing efficacy, which helps us evaluate the cost-effectiveness of the project upon its implementation.

Figure 1. Overview of dry lab

Stability Model for a Dual-Plasmid System

Introduction

1.1 Background

Our project aims to develop a high-efficiency bacteriophage-based pesticide. The core of this endeavor is the construction of an engineered bacterium that relies on two plasmids sharing the same origin of replication (initially proposed as pUC19 and pUC57) to function. These plasmids carry different functional gene fragments required for phage production. However, during the subculturing of the engineered bacteria, plasmids can be lost from host cells for various reasons, a phenomenon known as plasmid segregation.

The loss of plasmids leads to the deletion of functional genes, resulting in a decline in the proportion of cells capable of producing complete phages within the bacterial population. This ultimately affects the yield and efficacy of the phage pesticide. Furthermore, cells that have lost one plasmid may experience a reduced metabolic load, allowing them to grow faster and outcompete the target cells carrying both plasmids.

To assess this risk before committing experimental resources, we developed a mathematical model to simulate the population dynamics of this dual-plasmid system over multiple generations of cultivation. The primary objective of this model is to predict the stability of such a competitive system. The model’ s predictions indicated that under various parameter conditions, the competitive effects between the two plasmids were significant, leading to a rapid decrease in the proportion of effective bacterial strains. Based on this forecast, the wet lab decided to mitigate this risk by optimizing the experimental protocol, ultimately choosing to use two identical pUC19 vectors to construct the system, thereby substantially eliminating the competition issue between plasmids.

Therefore, this model successfully guided the experimental design, demonstrating the predictive role of theoretical modeling in the early stages of a project. The analytical framework of this model remains applicable to the issue of plasmid loss in the revised protocol—albeit at a much slower rate—and can serve as a reference for determining the processing cycle of the engineered bacteria in practical applications.

1.2 Model Objectives

This model aims to analyze and predict the decay rate of the effective cell population (i.e., cells carrying both plasmids) over generations through mathematical modeling and computer simulation. The specific objectives are as follows:

To Guide Experimental Protocol Selection: By analyzing the impact of different plasmid loss probabilities (p) and the competitive growth advantage (g) of single-plasmid cells on the system, this model provides a reference for the wet lab to select plasmids that can maintain a high dual-plasmid retention rate within an acceptable number of generations. The simulation results for the heterologous dual-plasmid system demonstrated the necessity of avoiding plasmid competition and supported the team’ s final decision to adopt a homologous dual-vector system using pUC19.
To Optimize Production Cycle: For a given plasmid construct, we can use the model to determine the ideal number of cultivation generations before harvesting and lysing the engineered bacteria to meet specific yield requirements. This provides a standardized processing cycle for the industrial production of the phage pesticide, ensuring the yield and efficacy of the final biopesticide product. The model’ s analytical framework remains effective for addressing yield decay due to plasmid loss in the revised homologous dual-plasmid system. Therefore, it can still be used to estimate an ideal cultivation period for the finalized system to ensure that a majority of cells retain full functionality at the time of harvest.

Model Assumption

2.1 System State Definitions

To evaluate the initial protocol, we categorize the engineered bacterial population into four states based on their plasmid content:

$P_{both}$

Cells containing both pUC19 and pUC57 plasmids (effective cells).
$P_{pUC19}$

Cells containing only the pUC19 plasmid.
$P_{pUC57}$

Cells containing only the pUC57 plasmid.
$P_{neither}$

Cells that have lost both plasmids.

At any given generation t, the proportion of cells in these four states can be represented by a state vector $\mathbf{X}_t$:$\mathbf{X}_t=[P_{both,t},P_{pUC19,t},P_{pUC57,t},P_{neither,t}]^T$
At the initial state (generation 0), we assume all cells are effective, i.e. $\mathbf{X}_0=1^T$

2.2 Parameters

The model's analysis primarily revolves around the following two core parameters:

(Basal Probability of Plasmid Loss): Represents the average probability of a single plasmid being lost during each cell division. This is a variable parameter ranging from 10^-4 to 10^-1 to simulate plasmids with different stabilities.
(Competitive Advantage Coefficient of Single-Plasmid Cells): Represents the potential growth rate advantage of cells containing only one plasmid compared to those containing both. Plasmid replication and expression impose a metabolic burden on the host cell. Thus, cells that have lost one plasmid may grow slightly faster. The value of g in the model ranges from 0 (no advantage) to 0.2.

2.3 State Transition Equations

We use a Markov chain model to describe the change in population states over generations. The transition from generation t to t + 1 is primarily determined by plasmid loss events and cell growth competition.

Step 1: Plasmid Loss

Let $p_1$ be the loss probability of pUC19 and $p_2$ be the loss probability of pUC57. In a single cell division cycle, starting from a mother cell carrying both plasmids, the state probabilities of the daughter cells are as follows:

Still carrying both plasmids

$(1-p_1)(1-p_2)$
Losing pUC19, retaining only pUC57

$p_1(1-p_2)$
Losing pUC57, retaining only pUC19

$(1-p_1)p_2$
Losing both plasmids

$p_1 p_2$

Once a plasmid is lost, we assume it cannot be reacquired. Therefore, this process can be described by a 4 × 4 transition matrix P:

$\mathbf{P}=\begin{pmatrix}(1-p_1)(1-p_2)&(1-p_1)p_2&p_1(1-p_2)&p_1 p_2\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}$

Thus, the intermediate state vector $X _{t+1}$ considering only plasmid loss can be calculated as:

$\mathbf{X}_{t+1}=\mathbf{P}^T\cdot\mathbf{X}_t$

Step 2: Growth Competition

Next, we consider the growth advantage g of single-plasmid cells. After obtaining the intermediate state vector $X _{t+1}$ , we adjust the proportions of cells containing only pUC19 and only pUC57 to reflect their growth advantage:

$\mathbf{P}_{pUC19,t+1}''=\mathbf{P}'_{pUC19,t+1}\times(1+g)$

$\mathbf{P}_{pUC57,t+1}''=\mathbf{P}'_{pUC57,t+1}\times(1+g)$

The proportions of the other two states ( $P_{both}$ and $P_{neither}$ ) remain unchanged.

Step 3: Normalization

Since growth competition alters the relative proportions of the population, we need to normalize the vector to ensure that the sum of all state proportions is 1, thereby obtaining the final state distribution $X_{t+1}$ for generation t + 1:

$\mathbf{S}=\mathbf{P}'_{both,t+1}+\mathbf{P}_{pUC19,t+1}''+\mathbf{P}_{pUC57,t+1}''+\mathbf{P}'_{neither,t+1}$

$\mathbf{X}_{t+1}=\frac{1}{S}\cdot[\mathbf{P}'_{both,t+1},\mathbf{P}_{pUC19,t+1}'',\mathbf{P}_{pUC57,t+1}'',\mathbf{P}'_{neither,t+1}]^T$

2.4 Simulation Scenarios

To analyze the problem more comprehensively, three different plasmid loss scenarios are set up in the code:

In cases where there is a tendency to lose one type of plasmid, additional loss coefficients are multiplied to the respective plasmid loss probabilities. This represents the translation of differences in replication initiation probability into final loss probability, leading to a higher overall plasmid loss rate.

Symmetric Loss: Assumes both plasmids have identical stability.
- $p_1=p\times 0.5$
- $p_2=p\times 0.5$
Favor_pUC19 (Bias towards retaining pUC19): Assumes pUC19 is more stable than pUC57, thus having a lower loss probability.
- $p_1=p\times 0.4\times 0.5$
- $p_2=p\times 0.6\times 1.5$
Favor_pUC57 (Bias towards retaining pUC57): Assumes pUC57 is more stable than pUC19, thus having a lower loss probability.
- $p_1=p\times 0.6\times 1.5$
- $p_2=p\times 0.4\times 0.5$

Model Results

The model simulates population dynamics from generation 0 to 150 (numSteps) through iterative calculations and outputs the following results for different combinations of p and g.

After running the code, a series of plots and a CSV data file are generated and stored in the outputs_with_double_loss folder.

3.1 Dynamic Curve Plots

Figure 2: Plasmid Loss Dynamics Over 150 Generations.

This plot illustrates the simulated changes in the proportions of the four cell types under three different loss scenarios. The simulation parameters are set with a single plasmid loss rate (p) of 0.003 and a competitive advantage (g) of 0.02. The blue line, representing the effective “both-plasmid” population, demonstrates a significant decay over time.

These plots illustrate the simulated changes in the proportions of the four cell types over 150 generations. As expected, the proportion of cells carrying both plasmids steadily decreases.

3.2 3D Surface Plots

delta2_in_p0.5.png

Figure 3: System Stability as a Function of Plasmid Loss Rate and Competitive Advantage.

This 3D surface plot corresponds to the symmetric loss scenario. It visualizes the number of generations required for the proportion of the “both-plasmid” population to drop below 50%. The plot shows that system stability, measured in generations, decreases rapidly with higher plasmid loss rates (log10(p)) and greater competitive advantages (g).

To understand the combined effects of plasmid loss probability (p) and competitive advantage (g), we generated 3D surface plots. These plots show the number of generations required for the target population proportion to fall below a certain threshold (e.g., 50%). This provides a macroscopic perspective, helping us to understand the interrelationships between parameters and to quickly identify favorable parameter ranges.

3.3 2D Heatmaps

deltaG_in_p0.5.png

Figure 4: Generations to Reach 50% Effective Population Threshold at p=0.01.

This heatmap presents the number of generations required for the “both-plasmid” population to fall below the 50% threshold, given a fixed loss rate of p=0.01. The values in the cells correspond to different competitive advantages (g) and loss scenarios. For instance, in the symmetric loss scenario with a 2% growth advantage (g=0.02), it takes 47 generations for the proportion of effective cells to drop below 50%.

The most direct application of our model is to provide quantitative recommn-dations for our production workflow. By analyzing the simulation data, we can determine the “processing cycle”—the maximum number of generations the culture should be grown to maintain a high proportion of functional cells. We defined two key thresholds for the “both-plasmid” population: 50% and 30%.

To make the analysis more accessible, we generated 2D heatmaps for a fixed, realistic plasmid loss rate of p = 0.01. These heatmaps show the number of generations required to reach the 50% threshold under all scenarios and competitive advantage values.

3.4 Conclusion

Our mathematical model provides a quantitative framework for understanding and predicting the stability of our dual-plasmid system. The results indicate that both the intrinsic plasmid loss rate (p) and the competitive growth advantage of single-plasmid cells (g) are key factors determining the functional lifespan of the engineered bacterial culture.

The most significant contribution of this model lies in its predictive guidance. The simulation results forecasted that in our initial experimental design (using two different plasmids, pUC19 and pUC57), the population of effective dual-plasmid strains would rapidly decline due to plasmid competition and segregation, which would severely impact the final yield of the phage pesticide.

Based on this theoretical prediction, the wet lab promptly adjusted the research strategy and adopted a more stable homologous dual-plasmid system (i.e., both functional modules are carried on pUC19 vectors). This new approach substantially eliminated the competition arising from incompatible origins of replication, greatly enhancing the genetic stability of the system.

In conclusion, by prioritizing theoretical calculations, this model effectively saved time and costs associated with experimental trial-and-error, ensuring that the project progressed along a more reliable path.

Metabolic Burden Analysis Model

Introduction

1.1 Background

In our phage-like particle delivery system project, we produce key components for killing pathogenic bacteria by introducing recombinant plasmids into a chassis strain (E. coli MG1655). This system involves the replication and recombination of foreign plasmids, as well as the expression of their desired genes, all of which impose an additional metabolic burden on the host bacterium. This significantly affects the host’ s physiological state, for instance by:

Inhibiting growth: Cellular resources are diverted to plasmid replication and protein expression, leading to a decrease in the host’ s own growth rate.
Triggering stress responses: The increased load from processes like DNA replication and protein folding can activate the SOS response (marked by RecA) and the heat shock response (marked by GrpE), respectively, further impacting cellular functions.

1.2 Model Objectives

To quantitatively evaluate and predict the impact of different gene circuit designs on the host bacterium, we have constructed a mathematical model based on a system of ordinary differential equations (ODEs). This model aims to simulate the dynamic changes of key physiological indicators and metabolic burden within the host after the introduction of recombinant plasmids. The core objectives of this model are:

To analyze the sources and dynamics of metabolic burden: To distinguish and quantify the burden originating from DNA replication & DNA recombination (DNA synthesis burden, S_d) and protein expression (protein synthesis burden, S_p), and to observe their changes over time.
To predict observable biological metrics: The model can predict the expression levels of stress-related proteins (such as RecA and GrpE), providing a theoretical reference for the wet lab to validate the model and measure stress using methods like Western Blot.
To guide experimental design and component selection: Through parameter sensitivity analysis, we can compare the magnitude of the effects of various parameters (e.g., plasmid copy number, protein expression rate) on metabolic burden. This helps the wet lab optimize plasmid construction strategies and select components whose impact on the host is within an acceptable range.

Model Equations

This model includes 10 state variables describing the dynamic changes from the macroscopic population level to the microscopic molecular level. It incorporates Hill functions and Michaelis-Menten kinetics to depict the nonlinear regulatory relationships within the system, making it more representative of actual biological processes. The main parameters used in the model, their biological meanings, and their units are shown in the table below.

Variable	Meaning	Unit
$B$	Bacterial density	cells mL⁻¹
$C$	Average plasmid copy number per cell	plasmid cell⁻¹
$D$	Amount of recombinant gene per cell	nM
$P$	Protein synthesis rate	nM h⁻¹
$P_t$	Accumulated toxin protein concentration	nM
$M$	External stress level	-
$S_p$	Protein synthesis burden	-
$S_d$	DNA replication burden	-

Core Differential Equations:

1. Bacterial Growth Equation dB/dt

The change in bacterial population density B follows a logistic growth model, where r_eff is the effective growth rate.
The effective growth rate r_eff is inhibited by the total metabolic burden (S_p + S_d) and external stress (M).

$\frac{dB}{dt}=\mu_{calc}\cdot B\quad\text{where}\quad\mu_{calc}=r_{eff}\cdot(1-\frac{B}{K})$

$r_{eff}=\max(r_0-m_1(S_p+S_d)-m_2\cdot M,0)$

2. Average Plasmid Copy Number Equation dC/dt

The change in plasmid copy number C is composed of several parts: self-replication (logistic growth, limited by a maximum copy number C_max), consumption through recombination, dilution by cell division, and external intervention (e.g., temperature-sensitive inactivation).
f_rep, f_rec, and f_att are time-dependent piecewise functions that simulate the experimental process.

$\frac{dC}{dt}=\rho\cdot f_{rep}(t)\cdot C\cdot(1-\frac{C}{C_{max}})-\gamma\cdot f_{rec}(t)\cdot C-\mu_{calc}\cdot C-k_{att}\cdot f_{att}(t)\cdot C$

3. Recombinant Gene Amount Equation dD/dt

The recombinant gene D is produced from the plasmid C at a specific rate γ.

$\frac{dD}{dt}=\gamma\cdot f_{rec}(t)\cdot C$

4. Protein Synthesis Rate Equation dP/dt

The production of the protein synthesis rate P depends on the total gene amount (C and D) and exhibits Michaelis-Menten saturation kinetics. It is also subject to saturable inhibition by DNA burden S_d and external stress M.

$\frac{dP}{dt}=\alpha\cdot\frac{(f_{exp}(t)\cdot C+D)\cdot B}{K_{m\_trans}+(f_{exp}(t)\cdot C+D)\cdot B}\cdot(1-\frac{S_d}{K_{m\_P}+S_d})\cdot(1-\frac{M}{K_{m\_P}+M})-\delta_P\cdot P$

5. Accumulated Protein Equation dPt/dt

P_t is the simple time integral of P.

$\frac{dPt}{dt}=P$

6. External Stress Equation dM/dt

The dynamics of M are governed by external pulses (simulated by a Gaussian function) and its own exponential decay.

$\frac{dM}{dt}=\sigma_M\cdot\text{pulse}(t)-\delta_M\cdot M$

7. Protein Synthesis Burden dSp/dt and DNA Synthesis Burden dSd/dt

This is the core of the model. The production of S_p and S_d is induced by different intracellular activities, and they exhibit mutual inhibition, a regulation described by Hill functions, resulting in switch-like behavior.
- The production of S_p is activated by the protein synthesis rate P and external stress M.
- The production of S_d is activated by plasmid replication C and bacterial growth μ_calc * B.

$\frac{dS_p}{dt}=(k_3\frac{P^{n_{sp}}}{K_{sp}^{n_{sp}}+P^{n_{sp}}}+k_4\frac{M^{n_{sp}}}{K_{sp}^{n_{sp}}+M^{n_{sp}}})\cdot\frac{1}{1+(\frac{S_d}{K_{mutual}})^{h_{mutual}}}-\delta_{Sp}\cdot S_p$

$\frac{dS_d}{dt}=(k_6\frac{C^{n_{sd}}}{K_{sd}^{n_{sd}}+C^{n_{sd}}}+k_7\frac{(\mu_{calc}B)^{n_{sd}}}{K_{sd}^{n_{sd}}+(\mu_{calc}B)^{n_{sd}}})\cdot\frac{1}{1+(\frac{S_p}{K_{mutual}})^{h_{mutual}}}\cdot\frac{1}{1+\frac{M}{K_{sd}}}-\delta_{Sd}\cdot S_d$

Parameters

Parameter	Meaning	Value (Baseline)	Unit
$r_0$	Maximum bacterial growth rate	0.6	h⁻¹
$K$	Carrying capacity	10⁸	cells mL⁻¹
$m_1,m_2$	Inhibition coefficients of stress on growth	0.4, 0.2	-
$\rho$	Plasmid replication rate	0.5	h⁻¹
$C_{\text{max}}$	Maximum plasmid copy number per cell	100	plasmids cell⁻¹
$\gamma$	Plasmid recombination rate	0.1	h⁻¹
$\alpha$	Protein expression rate	10^-7	h⁻¹
$f_{\text{rep}},f_{\text{rec}},f_{\text{exp}}$	Fraction of plasmids for replication, recombination and expression	0.6, 0.3, 0.1	-
$f_{\text{att}}$	Fraction of plasmids for attenuation (post-induction)	1.0	-
$\lambda_{\text{att}}$	Plasmid attenuation rate constant	1.7325	h⁻¹
$t_{\text{int}}$	Time of external stimulus application	5	h
$\delta_M,\sigma_M,\omega$	Decay rate, intensity, and width of pulse	0.5, 1.0, 0.02	h⁻¹, dimensionless, h
$k_3,k_4$	Max production rate of ( $S_p$ ) from P and M	0.2, 0.1	h⁻¹
$k_6,k_7$	Max production rate of ( $S_d$ ) from C and growth	0.15, 0.10	h⁻¹
$K_{sp},K_{sd}$	Half-saturation constants for Hill functions	10³, 10	-, plasmids
$n_{sp},n_{sd},h$	Hill coefficients	2, 2, 2	-
$K_{\text{mutual}}$	Half-saturation constant for mutual inhibition	1	-
$St_{\text{max}}$	Upper limit for total burden ( $S_p+S_d$ )	1.6	-

Results

4.1 Baseline Simulation Results

Under baseline parameters, we conducted a 15-hour dynamic simulation of the system. The results show the changes of all 10 state variables over time.

Figure 5: System Dynamic Behavior Under Baseline Parameters.

This plot illustrates the dynamic changes of key state variables over a 15-hour simulation period. As the bacteria grow, the average plasmid copy number (C) first rises rapidly and then quickly declines due to simulated temperature-sensitive inactivation, demonstrating the system’ s response to environmental cues and internal state adjustments.

4.2 Parameter Scan and Sensitivity Analysis

To identify the key factors affecting the system’ s behavior, we performed a one-at-a-time (OAT) scan on 10 core parameters (e.g., r0, rho, alpha). This involves adjusting each parameter above and below its baseline value and observing the changes in the model’ s output.

Figure 6: Parameter Sensitivity Analysis on System State Variables.

This figure displays the response trajectory families for the model’ s state variables when 10 different core parameters are varied individually. The title of each subplot indicates the specific parameter being scanned. The bundle of curves in each plot represents the range of variation in a state variable’ s trajectory, with the degree of dispersion providing a visual indication of the system’ s sensitivity to that parameter.

Conclusion

Based on this model analysis, we draw the following conclusions:

The model successfully links the macroscopic growth of cells with the dynamics of microscopic metabolic burden and provides predictable metrics (RecA, GrpE) that can be experimentally validated.
The parameter sensitivity analysis indicates directions for optimization. For example, if experiments show that a strain grows slowly and has high GrpE levels, we should prioritize reducing the protein expression strength (by tuning alpha-related components) or selecting a chassis strain with better tolerance to S_p.
This model provides an accounting framework for evaluating and comparing the physiological cost of different plasmids on the host bacterium, which facilitates better and more convenient adjustment of experimental parameters in wet lab experiments.

Protein-Protein Docking Model

Introduction

Tail fiber replacement is a critical step that determines the success of our project. Its success directly dictates whether our designed Phage-Like Particles (PLP) can effectively broaden their host range, thereby enabling the application of our “phage-like factory” to a wider array of pathogenic bacteria control scenarios.

Through a literature review, we first clarified the structural composition of the T7 bacteriophage tail fibers. In T7, three proteins—gp11, gp12, and gp17—are closely associated. Among them, gp17 is the key protein determining host recognition and infection range, while gp11 and gp12 serve as a structural scaffold to anchor gp17 to the phage head, completing the overall assembly of the tail fibers.

We further identified two mainstream strategies for engineering phage host range through tail fiber modification:

Strategy 1: Whole Protein Replacement: This involves swapping the genes encoding complete tail fiber proteins between different phages to obtain phages with new host recognition capabilities. This strategy requires high homology in the structure and assembly mechanisms of the two phage tail fibers.

Strategy 2: Chimeric Protein Construction: This involves rationally designing and fusing tail fiber proteins from different sources to create a chimeric tail fiber with combined recognition properties. The core of this strategy lies in identifying suitable protein cleavage and fusion sites.

Based on these approaches, we proposed the following two experimental plans:

Plan 1: To perform a whole-protein replacement between the gp17 protein, which determines the host range of Escherichia phage T7, and the tail fiber protein VO98_215 from Pseudomonas phage phiPsa17, to verify the potential for expanding the host range across different genera.

Plan 2: A sequence alignment revealed an 83% positives between gp17 and VO98_215 in the N-terminal 1–149 amino acid region. Based on this, we proposed constructing a chimeric tail fiber protein using the 149th amino acid as the boundary to achieve precise replacement of the host recognition domain.

To assess the feasibility of these designs, we conducted protein-protein docking simulations to structurally verify the binding compatibility of the VO98_215 protein from Pseudomonas phage phiPsa17 with the gp11/gp12 proteins of T7 phage, as well as the high structural similarity between VO98_215 and gp17. This work established a theoretical foundation for the subsequent experiments conducted by the wet lab team.

Methods and Model

2.1 Tail Fiber Protein Structure Modeling and Comparison

To investigate the structural features of the tail fiber proteins, we first obtained the amino acid sequences of T7 phage gp17 and PhiPsa17 phage VO98_215 and used AlphaFold3 to predict the three-dimensional structures of their N-terminal 1–200 amino acid regions. The overall model confidence was high, indicating that the predicted structures are reliable.

Next, for the 1–149 amino acid segment, which exhibited high structural similarity, we used a Python script to extract the pLDDT confidence scores for this region. We then employed PyMOL for structural superposition and visualization, and calculated the root-mean-square deviation (RMSD) between the two structures to quantify their similarity.

2.2 Protein-Protein Docking Analysis

To evaluate the interaction between the VO98_215 protein from PhiPsa17 and the T7 phage tail proteins gp11/gp12, we obtained the crystal structure of the T7 phage gp11/gp12 complex from the RCSB PDB database and used it as the receptor protein. The predicted structure of VO98_215 was used as the ligand, and protein-protein docking was performed using ClusPro, which yielded multiple docking results.

2.3 Binding Interface Analysis and Visualization

From the docking results, we selected the conformation with the largest number of cluster members as the representative binding model and used the PRODIGY tool to analyze the binding affinity and key residues at the complex interface. Finally, we used PyMOL to visualize the binding site, analyzing the number and spatial distribution of interacting residues to provide a structural basis for subsequent experimental design.

Results

3.1 Alphafold Prediction Results

Figure 7 Predicted N-terminal structure of PhiPsa17 tail fiber protein(aa 1-200)

Figure 8 Predicted N-terminal structure for Phage T7 tail fiber (aa 1-200)

Figure 9 Superposition of the T7 tail fiber base(purple) to ΦPSA17 tail fiber bas(blue)

The AlphaFold structure prediction results show that the predicted confidence (pLDDT) for the N-terminal region (aa 1–149 ) of T7 gp17 and PhiPsa17 VO98_215 proteins were 84.15 and 87.83, respectively, indicating high reliability for the structural modeling of this region. Further structural superposition of this segment of the two proteins using PyMOL yielded a root-mean-square deviation (RMSD) of 1.48 Å for their Cα atoms, confirming their high similarity in spatial structure.

In contrast, the predicted confidence scores for the 150–200 amino acid segment near the C-terminus dropped to 43.09 and 66.87, respectively, suggesting that this region may lack a stable three-dimensional structure and could be an intrinsically disordered region. This finding is consistent with descriptions of the C-terminal structural features of this class of tail fiber proteins in existing literature, further supporting the rationale for selecting position 149 as the boundary for our chimera.

3.2Cluspro Prediction Results

Figure10 Cluspro Results and Predicted complex structure of PhiPSA17 VO98_215 and T7gp11/12

Based on the AlphaFold Prediction Results, we used ClusPro to perform protein-protein docking between the N-terminal structure of the PhiPsa17 tail fiber protein and the T7 phage gp11/12 complex. The ClusPro docking workflow consists of three key steps: first, it samples the ligand through approximately 70,000 rotations, translating the ligand on a grid at each rotation. From an initial 10⁹ conformations, it selects the 1,000 most energetically favorable ones based on its simplified physics-based scoring function. Subsequently, ClusPro performs a greedy clustering on these 1,000 best conformations, grouping structures within a 9Å C-α RMSD radius into the same cluster. The size of the cluster is used as the primary criterion for ranking the models—a larger cluster typically represents a more stable and reliable binding mode in the conformational space.

It is important to note that although ClusPro provides a “Piper” energy score for each model, its official documentation emphasizes that the main purpose of this score is to perform an initial, coarse-grained screening from the vast number of initial samples.

Recognizing this limitation of the built-in ClusPro score, we selected the model from the most populated cluster from the ClusPro docking results and used the more specialized PRODIGY tool to perform a professional binding affinity analysis, in order to more accurately assess the binding potential of the complex.

3.3 PRODIGY Analysis and PyMOL Visualization Results

Figure11 PRODIGY Analysis Results

Figure12 Docking of PhiPSA17 VO98_215 and T7gp11/12 complex

The PRODIGY binding affinity analysis results indicate a high degree of structural complementarity and a strong binding tendency between the N-terminal domain of the PhiPsa17 tail fiber protein and the T7 phage gp11/gp12 protein complex. This binding process is thermodynamically highly spontaneous, with a binding free energy as low as –17.6 kcal/mol, indicating a significant thermodynamic driving force for complex formation. Furthermore, its estimated dissociation constant at 25°C is 1.3 × 10^–13 M, reflecting an extremely high binding affinity and complex stability.

At the structural level, a total of 192 intermolecular contacts were identified at the interface, revealing an extensive and dense interaction network. These contacts include 62 non-polar–non-polar interactions, 49 charged–non-polar interactions, 18 charged–charged interactions, and 23 charged–polar interactions, demonstrating the diversity of chemical forces at the binding interface. In addition, analysis of the residue composition at the binding interface shows that non-polar residues account for 39.67% and charged residues for 26.62%. This distribution suggests that both hydrophobic effects and electrostatic interactions play important roles in maintaining the stability of the complex, collectively promoting the tight binding of the protein-protein interface.

As shown in the figure, the N-terminal domain of the PhiPsa17 tail fiber protein (pink) fits into the natural structural interface formed by the T7 phage gp11 (blue) and gp12 (green) proteins, forming extensive contacts with both. This binding mode displays excellent spatial complementarity, ensuring the stability of the complex. This result structurally confirms that the N-terminus of the PhiPsa17 tail fiber protein can effectively mimic the assembly mode of the native T7 tail fiber gp17, providing a basis for the construction of the chimeric tail fiber.

Conclusion

This model confirms that the VO98_215 protein of PhiPsa17 is not only highly similar to the N-terminus of T7 gp17 but also binds stably with T7’ s gp11/gp12. This conclusion theoretically supports the feasibility of our two tail fiber modification strategies, providing clear guidance for wet lab experiments and reducing trial-and-error costs, while also inspiring our subsequent software development ideas.

Discussion

In our initial attempts, we tried to predict the structure of the full-length tail fiber protein, but due to its long sequence and the presence of numerous hydrophilic, intrinsically disordered regions, the overall confidence of the resulting models was not ideal. To address this issue, we conducted an in-depth literature review to better understand the structural characteristics of the tail fiber protein and to define the functional scope and multimerization form of its N-terminal domain. Based on this, we shifted our focus to modeling this key structural domain, which successfully yielded a more reliable three-dimensional structure with higher confidence.

This exploratory process not only theoretically validated the feasibility of our two experimental plans, boosting the confidence of the wet lab team, but also inspired us to develop a computational tool named Alphage. This tool is designed to provide an automated solution for designing chimeric tail fiber proteins, helping to broaden the application scenarios of our project.

Infection and Inoculation

Introduction

In the field of agricultural bacterial disease control, Pseudomonas syringae serves as a typical pathogenic strain, often causing a reduction in crop yield or even total crop failure. Our project has designed a class of non-replicative phage-based pesticides, which target and kill pathogenic bacteria through a “toxin delivery” system, aiming to contribute to the treatment of tomato bacterial spot disease.

We have constructed a kinetic model of phage-like particle (PLP)-mediated bacterial infection to achieve an accurate prediction of the product’ s bactericidal efficacy, minimum application dose and therapeutic time window, providing a quantitative basis for the design of field experiments.

Model Assumptions

2.1 Non-replicability of PLP

The genes related to PLP replication have been removed in the project design, rendering PLP incapable of self-proliferation. Changes in PLP concentration are affected by two factors: consumption during the infection and killing of susceptible bacteria, and elimination by environmental factors (e.g., ultraviolet radiation).

2.2 Bacterial States

Pseudomonas syringae is classified into three states with transitions between states:

Susceptible bacteria (S): Not adsorbed by PLP, capable of normal growth and participating in resource competition, and can be eliminated by plant immunity.
Infected bacteria (I): Adsorbed by PLP but not yet dead, carrying toxins delivered by PLP, losing proliferation ability, and eventually dying due to the action of toxins.
Dead bacteria (D): Generated by the lysis of infected bacteria or the immune elimination of susceptible bacteria, without physiological activity, and not involved in resource competition or subsequent infection processes.

2.3 Growth and Resource Limitation

The growth of Pseudomonas syringae follows the logistic model and is limited by the carrying capacity of plant tissues (B_max). Only susceptible bacteria and infected bacteria compete for environmental resources.

2.4 Plant Immune Elimination

The plant immune system exerts an immune effect on both susceptible bacteria and infected bacteria, with the same elimination rate (δᵢ) for both. This simplifies the complex regulatory process of host immunity.

2.5 Infection Kinetics Assumption

The infection rate of PLP on susceptible bacteria is proportional to the concentration of PLP and the number of susceptible bacteria, conforming to the classical law of mass action for phage infection. The adsorption rate constant (β) is determined based on experimental data of homologous phages.

Model Formulas

$\frac{dS}{dt}=\alpha S\left(1-\frac{S+I}{B_{max}}\right)-\beta VS-\delta_i S$$$$\frac{dI}{dt}=\beta VS-\gamma I-\delta_i I$$$$\frac{dD}{dt}=\gamma I+\delta_i(S+I)$$$$\frac{dV}{dt}=-\beta VS-k_{UV}V$

3.1 Parameters

Parameters	Description	Unit	Value
α	Maximum specific growth rate of Pseudomonas syringae	h⁻¹	0.546
β	Phage Adsorption Rate Constant	mL·PFU⁻¹·h⁻¹	10⁻⁹
γ	Death rate of infected bacteria	h⁻¹	0.5
δᵢ	Clearance rate of bacteria by plant immune system	h⁻¹	0.3
k_UV	Inactivation rate of PLP by solar ultraviolet radiation	-	0.632
B_max	Carrying capacity of bacteria in plant tissue	CFU·g⁻¹	10⁸
S₀	Initial susceptible bacteria concentration	CFU·g⁻¹	10⁸
V₀	Initial dosage of phages	PFU·mL⁻¹	-
η	Target bactericidal rate	-	-
t	Time	h	-

3.2 Results

Fig. 1 dynamics

Figure 13. Dynamics of Pseudomonas syringae and PLP under Hypothetical Conditions.

The intersection point in the figure indicates the time point where the concentrations of PLP and susceptible bacteria are equal. At this point, the initial concentration of PLP is 10⁹ PFU·mL⁻¹.

Core Applications of the Model

Based on the above equations, the model can derive two key application indicators to provide quantitative guidance for practical dosage.

4.1 Prediction of Dosage

MBC (Minimum Bactericidal Concentration): Defined as “the minimum initial dosage of PLP (V₀) required to kill 99.9% of bacteria”. It is derived based on the simplification of bacterial clearance kinetics:

Simplified Assumption

When the concentration of PLP is sufficient, the dominant factor for bacterial clearance is PLP-mediated infection (ignoring the effects of bacterial growth and plant immunity), i.e.
$\frac{dB}{dt}=-\beta VB$ where B is the total bacterial load, given by B = S + I.

Solving by integration

Given the initial bacterial load as B₀ and the target bacterial load as B_target, the integral calculation yields:
$V_0\geq\frac{\ln(B_0/B_{target})-\delta_i}{\beta t}$

Curve Validation

Through fitting of the dose-concentration change curve, the calculated MBC is 5.0 × 10¹¹ PFU·mL⁻¹.

Fig. 2 dosage prediction

Figure 14. Dose-Response Curves

The left panel shows the relationship curve between bacterial clearance rate and initial PLP dosage concentration. The green vertical line represents the predicted minimum bactericidal concentration (MBC) value, and the red horizontal line represents the 99.9% bactericidal rate.
The right panel shows the relationship curve between final bacterial concentration and initial PLP dosage concentration. The red horizontal line denotes the threshold (10⁴ CFU·g⁻¹ or CFU·mL⁻¹, depending on detection units).

4.2 Prediction of Therapeutic Timing

By numerically solving the ODE (Ordinary Differential Equation) system, the model can quantify the changing trends of total viable bacterial concentration (S + I) and bactericidal rate over time. This enables prediction of two key time indicators: the time required for the total viable bacteria to decrease to the target threshold, and the onset time for the bactericidal rate to reach the target value. Combined with the initial dosage (adopting the minimum bactericidal concentration, MBC), the model further predicts the time (t) needed to achieve the target bactericidal rate (η). Here, η is defined as “the proportion of killed bacteria relative to the initial bacterial load”.

$\eta=\frac{B_0-B_t}{B_0}\times 100\%$$$$t=\frac{\ln(B_0/B_t)}{\beta V_0+\delta_i}$

Results from Calculations:

When the minimum bactericidal concentration (MBC) is applied, a bactericidal efficacy of 99.9% can be achieved at T_eff = 8.65 h, and the total bacterial load reaches the threshold concentration of 10⁴CFU/mL at T_total = 11.53 h.
The half-life of PLP is approximately 1.1 hours. Phage particles are rapidly inactivated under the influence of ultraviolet radiation, so attention should be paid to the timing of field application.

Fig. 3

Figure 15. PLP Bactericidal Kinetics and Action Time Curves (Initial Dosage V₀ = 5.0 × 10¹¹ PFU·mL⁻¹)

(a) Relationship between susceptible bacteria (S) concentration and time: The orange vertical line represents the effective time (T_eff = 8.65 h) corresponding to a 99.9% bactericidal rate. This subfigure reflects the targeted infection effect of PLP on susceptible bacteria.
(b) Relationship between total viable bacterial concentration (B = S + I) and time: The red horizontal line represents the target viable bacteria threshold (1.0 × 10⁴ CFU·g⁻¹). When the time reaches T_total = 11.53 h, the total viable bacterial concentration decreases to the target threshold, indicating that the bactericidal process achieves the expected control effect at this point.
(c) Relationship between bactericidal rate (η) and time: The orange horizontal line represents the target bactericidal rate (99.9%). The results show that when the time is T_eff = 8.65 h, the bactericidal rate reaches 99.9%, meeting the expected effect.
(d) Relationship between PLP (V) concentration and time: The curve shows that the PLP concentration decreases gradually with the progress of the infection process. Meanwhile, it is inactivated under the influence of ultraviolet radiation, leading to a rapid concentration decay.
(e) The predicted results of the model output in text form.

Discussion

5.1 Utility of the Model

Through this dosage prediction model, the minimum initial dosage required to achieve a certain bactericidal efficiency is calculated. This optimizes the application scheme and provides guidance for practical agricultural production.
By predicting the bactericidal timing under different application dosages, the model can assist scientific research in selecting appropriate sampling time points.
Based on the existing model prediction results, considering the combined use of phage-based biological pesticides and plant immunity activators can improve the efficiency of plant immune clearance, reduce pesticide dosage, and enhance the therapeutic effect.

5.2 Future Work

Refinement of environmental factors: Incorporate more actual environmental factors into the model to improve the prediction accuracy under field conditions.
Simulation of multiple applications: Analyze the relationship between application dosage and application interval based on the S-V curve. Under the scenario of multiple applications, further reduce the total pesticide dosage.

References

[1]Ishii K, Hashimoto-Gotoh T, Matsubara K. Random replication and random assortment model for plasmid incompatibility in Bacteria [J]. Plasmid, 1978, 1(4): 435–445.

[2]Uhlin B E, Nordström K. Plasmid incompatibility and control of replication: Copy mutants of the R-factor R1 in Escherichia coli K-12 [J]. Journal of Bacteriology, 1975, 124(2): 641–649.

[3]Levrier, A., Karpathakis, I., Nash, B. et al. PHEIGES: all-cell-free phage synthesis and selection from engineered genomes. Nat Commun 15, 2223 (2024). https://doi.org/10.1038/s41467-024-46585-1">https://doi.org/10.1038/s41467-024-46585-1

[4] Ando H, Lemire S, Pires DP, Lu TK. Engineering Modular Viral Scaffolds for Targeted Bacterial Population Editing. Cell Syst. 2015 Sep 23;1(3):187-196. doi: 10.1016/j.cels.2015.08.013. PMID: 26973885; PMCID: PMC4785837.

[5] Chen, W., Xiao, H., Wang, L., Wang, X., Tan, Z., Han, Z., Li, X., Yang, F., Liu, Z., Song, J., Liu, H., & Cheng, L. Structural changes in bacteriophage T7 upon receptor-induced genome ejection. Proceedings of the National Academy of Sciences of the United States of America. 2021, 118(37), e2102003118.

[6] Jones, G., Jindal, A., Ghani, U., Kotelnikov, S., Egbert, M., Hashemi, N., Vajda, S., Padhorny, D., & Kozakov, D. Elucidation of protein function using computational docking and hotspot analysis by ClusPro and FTMap. Acta Crystallographica Section D: Structural Biology. 2022, 78(6), 690-697.

[7] Desta, I.T., Porter, K.A., Xia, B., Kozakov, D., & Vajda, S. Performance and Its Limits in Rigid Body Protein-Protein Docking. Structure. 2020, 28(9), 1071-1081.

[8] Vajda, S., Yueh, C., Beglov, D., Bohnuud, T., Mottarella, S.E., Xia, B., Hall, D.R., & Kozakov, D. New additions to the ClusPro server motivated by CAPRI. Proteins: Structure, Function, and Bioinformatics. 2017, 85(3), 435-444.

[9] Yueh, C., Hall, D.R., Xia, B., Padhorny, D., Kozakov, D., & Vajda, S. ClusPro-DC: Dimer Classification by the Cluspro Server for Protein–Protein Docking. Journal of Molecular Biology. 2017, 429(3), 372-381.

[10] Xue, C., Rodrigues, J.P., Kastritis, P.L., Bonvin, A.M., & Vangone, A. PRODIGY: a web server for predicting the binding affinity of protein–protein complexes. Bioinformatics. 2016, 32(23), 3676-3678.

[11] Vangone, A., & Bonvin, A.M. PRODIGY: A Contact-based Predictor of Binding Affinity in Protein–Protein Complexes. BioProtocol. 2017, 7(3), e2124.

[12] Delattre R, Seurat J, Haddad F, Nguyen TT, Gaborieau B, Kane R, Dufour N, Ricard JD, Guedj J, Debarbieux L. Combination of in vivo phage therapy data with in silico model highlights key parameters for pneumonia treatment efficacy. Cell Rep. 2022 May 17;39(7):110825. doi: 10.1016/j.celrep.2022.110825. PMID: 35584666.

[13] Young JM, Luketina RC, Marshall AM. The effects of temperature on growth in vitro of Pseudomonas syringae and Xanthomonas pruni. J Appl Bacteriol. 1977 Jun;42(3):345-54. doi: 10.1111/j.1365-2672.1977.tb00702.x. PMID: 885818.

[14] Stephen T. Abedon, Ph.D. Phage Adsorption Rate Constant. https://www.biologyaspoetry.com/terms/adsorption_rate_constant.html. 2025.7

[15] Bérces A, Fekete A, Gáspár S, et al. Biological UV dosimeters in the assessment of the biological hazard from environmental radiation. J Photochem Photobiol B. 1999 Nov-Dec;53(1-3):36-43. doi: 10.1016/s1011-1344(99)00123-2. PMID: 10672527.

[16] Mitchell R E. Coronatine production by some phytopathogenic pseudomonads. Physiological Plant Pathology. 1982;20(1):83-89. ISSN 0048-4059. https://doi.org/10.1016/0048-4059(82)90026-1.

[17] Zeng W, Brutus A, Kremer JM, Withers JC, Gao X, Jones AD, He SY. A genetic screen reveals Arabidopsis stomatal and/or apoplastic defenses against Pseudomonas syringae pv. tomato DC3000. PLoS Pathog. 2011 Oct;7(10):e1002291. doi: 10.1371/journal.ppat.1002291. Epub 2011 Oct 6. PMID: 21998587; PMCID: PMC3188540.