Model | YAU-China - iGEM 2025

Overview

Molecular Docking Model Diagram

The research project titled "Detection of Pseudomonas aeruginosa in Water Samples Using Engineered Bacteria Based on a Dual-Plasmid System and Enzyme Substrate Method" focuses on achieving "rapid" and "efficient" detection performance. This means the system needs to produce both yellow substances and blue fluorescent substances earlier, faster, and in larger quantities. Therefore, the performance of each biological component that constitutes the detection pathway determines the overall detection efficiency.

To enable the engineered bacterial detection system to promptly capture the specific signaling molecules (PQS and PYO) from Pseudomonas aeruginosa, and effectively activate and operate the downstream detection pathway, it is necessary to select high-quality genes. These genes should express proteins that can bind/complex with PQS/PYO, and their interaction with these signaling molecules should effectively contribute to the corresponding promoters upstream of the reporter protein genes. For this purpose, three candidate genes—brlR, soxR, and mexL—are available for selection.

Overview of Model Diagrams in ODE-based Research

Validation via molecular docking modeling showed that: .

Computational Biology and Molecular Docking

Abstract

This study integrates computational simulations with experimental validation to identify the optimal genetic components for a Pseudomonas aeruginosa biosensor. Molecular docking was employed to quantitatively evaluate the binding affinities of the transcription factors SoxR, BrlR, and MexL with the signaling molecule PYO. AlphaFold3 was used to predict the complex structures of these factors with their corresponding promoter DNAs. A multi-criteria decision analysis matrix was constructed to objectively assess performance. The MexL system achieved the highest comprehensive score and was recommended as the primary candidate. Wet-lab GUS enzyme activity assays confirmed this prediction, showing that the MexL system exhibited higher enzyme activity than the other groups. This research establishes a complete "computational prediction–quantitative evaluation–experimental validation" rational design paradigm, providing a systematic approach for synthetic biology component engineering.

1. Research Background

The core of synthetic biology lies in the predictable, engineering-based design and reconstruction of biological systems. This study focuses on a typical design challenge: prospectively screening the most sensitive components responsive to the Pseudomonas aeruginosa signature signal molecule PYO from multiple candidate transcription factor-promoter systems. Traditional trial-and-error methods suffer from inefficiency and lack of mechanistic insight, prompting the establishment of a "computationally guided rational design paradigm" that focuses on two decisive molecular events in the signaling pathway: 1. Ligand recognition: Determines sensor sensitivity to the target signal—the specific binding efficiency between the transcription factor and PYO. 2. Transcriptional activation: Determines sensor signal output strength—the ability of the ligand-bound transcription factor to form a complex with the promoter and drive downstream gene expression. By simulating these events at atomic and molecular levels and integrating multi-criteria decision analysis, this study achieves a qualitative-to-quantitative evaluation transition, providing high-confidence priorities for experimental validation.

2. Thermodynamic Quantitative Analysis of Ligand Binding Affinity

Specific binding between ligand and receptor is the foundation of biosensor signal perception, a process that strictly follows thermodynamic principles. This study used AutoDock4 to calculate the average binding free energy (ΔG) of candidate transcription factors with a dual-PYO molecular system through secondary docking. ΔG serves as a direct thermodynamic indicator of binding spontaneity and strength—more negative values indicate stronger binding capacity. Figure 1 shows the results of dual-PYO docking for the three transcription factors.

Figures 2 and 3 present the binding energy prediction convergence curves for the first and second 100 docking runs, respectively.

Figures 5–7 show animations of the molecular pocket search processes for SoxR, BrlR, and MexL.

3. Binding Free Energy Calculation and Dissociation Constant Estimation

Table 1 — Average binding free energy (ΔG) and ranking

RANK	Candidate Transcription Factor	ΔG(kcal/mol)
1	BrlR	-7.26
2	MexL	-7.1
3	SoxR	-5.79

Binding free energy (ΔG) and dissociation constant (Kd) are related through the thermodynamic formula: ΔG = −RT ln Kd Where: R = 1.987 × 10⁻³ kcal·mol⁻¹·K⁻¹ (ideal gas constant) T = 298.15 K (absolute temperature) Rearranging gives: Kd = exp(ΔG/RT) Table 2 — Calculated dissociation constants (Kd) in ascending order

RANK	Candidate Transcription Factor
1	BrlR	4.5
2	MexL	6.2
3	SoxR	55.2

4. Preliminary Thermodynamic Analysis and Conclusions

BrlR and MexL demonstrate significantly higher binding affinity for PYO than SoxR. Both exhibit Kd values in the low micromolar range, indicating efficient receptor occupancy and signal triggering at low environmental PYO concentrations, suggesting high detection sensitivity. SoxR's Kd value (55.2 μM) is nearly an order of magnitude higher, indicating weaker signal perception capability and prompting its preliminary exclusion based on the "sensitivity" criterion. Further structural analysis is needed to evaluate the comprehensive performance of each system.

5. Structural Biology Study of the Transcriptional Activation Complex

Ligand binding represents only the first step in signal transduction. The activated transcription factor must form a stable complex with the promoter DNA to drive downstream gene transcription. This study used AlphaFold3 to predict the complex structures of the three transcription factors with their corresponding promoters in the unbound PYO state, revealing conformational differences and their impact on transcriptional activation efficiency.

6. Preliminary Conformation–Function Relationship Analysis

The structural characteristics of each system suggest different functional mechanisms: SoxR's "open" conformation and numerous potential DNA interaction sites indicate strong DNA binding stability but potentially slower binding kinetics. BrlR's "closed" conformation suggests DNA binding strictly depends on PYO-induced conformational changes, with docking results showing no exposed DNA binding sites under normal conditions. MexL, despite having fewer DNA binding sites, appears to bind and dissociate from DNA more easily, suggesting faster DNA dynamic response capabilities. Based on this analysis, the BrlR–PBrlR system was excluded from further consideration, while the final selection between SoxR–PmexG and MexL–PphzA1 required a more comprehensive evaluation.

7. Construction of a Rational Screening Evaluation Matrix

Table 3 — Evaluation criteria and weight distribution

Decision criteria	Weight	Standard description
Ligand-binding affinity	30%	Quantitative assessment based on binding free energy
DNA binding conformation	30%	Basal transcription initiation efficiency
Prediction of interaction strength	20%	Assessment of complex stability
Conformational change necessity	20%	Activation energy barrier between inactive and active states

System	Ligand-receptor binding affinity (30%)	DNA binding conformation (30%)	Predict the strength of interaction (20%)	Conformational change necessity (20%)	Weighted total score
SoxR-PmexG
Goal	2	4	5	2
Weighted score	0.6	1.2	0.1	0.4	2.3
Basis for scoring	K_d=55.2μM insensitivity	Open clamp， Normally open conformation	Have the most interacting sites	May require minor conformation changes
BrlR-PBrlR
Goal	5	1	1	1
Weighted score	1.5	0.3	0.2	0.2	2.2
Basis for scoring	K_d=4.5μM the highest sensitivity	Closed conformation， ligand-unbound	No effective interaction	May require a large conformational change to open
MexL-PphzA1
Goal	4	3	2	1
Weighted score	1.2	0.9	0.4	0.2	2.7
Basis for scoring	K_d=6.2μM high sensitivity	Pre-binding region，Pre-activation	Partially well-defined binding mode	May require significant conformational changes

Recommendations: 1. MexL–PphzA1 system (2.70) — Strongly recommended This system leads in the comprehensive scoring, with its primary advantages reflected in two high-weight metrics: it achieves a high score in DNA-binding conformation (30% weight), where its "pre-activation" state exhibits an ideal propensity for DNA binding; and it earns a high score in ligand-binding affinity (30% weight). M sensitivity ensures effective capture of the PYO signal. Although its score in predicting interaction strength is relatively conservative, it exhibits the best overall performance balance. 2. SoxR–PmexG system (2.30) — Secondary option This system exhibits a pronounced performance imbalance: it achieves a perfect score in predicting interaction strength, with the open-clamp conformation providing an excellent foundation for DNA binding; however, it performs suboptimally in the critical metric of ligand-binding affinity. low sensitivity severely limits its signal detection capability, significantly constraining overall performance. 3. BrlR–PBrlR system (2.20) — Not recommended This system may suffer from severe functional deficiencies: while it demonstrates exceptional ligand-binding affinity, its performance is suboptimal across all other evaluation metrics. In particular, the closed conformation of the DNA-binding state constitutes a functional barrier, precluding efficient initiation of the transcription process. This fundamental limitation renders it unlikely to achieve the anticipated efficacy in practical applications.

8. Experimental Validation and Prediction Consistency Analysis

The MexL–PphzA1 system showed the highest enzyme activity (set as 100%), confirming its optimal performance. The SoxR group exhibited approximately 70% of the MexL group's activity, consistent with its moderate score. The BrlR group showed less than 50% of the MexL group's activity, confirming its functional defects.

Application of Curve Fitting Software Based on High-Dimensional Search and Azimuth Statistics

Summary

Dynamical models based on ordinary differential equations (ODEs) are widely used in physics, chemistry, and biology. By interpreting the intrinsic properties of a research object through a model and visualizing the output curves of state variables, we can better fit these curves to real experimental data (such as wet-lab measurement curves). Adjusting parameters according to the model's characteristics can improve fitness, The parameters of these models are usually estimated based on experimental data. [1] however, automated parameter search methods remain to be fully explored.

The majority of models involve transcendental functions and nonlinear systems, which means strategies to inversely deduce parameters from state variables are often impractical. Methods that avoid direct parameter inversion—such as neural network training and Bayesian search—rely heavily on large datasets for training, [2] which is time-consuming and not well-suited for quickly developing a convenient curve-fitting and prediction software.

Therefore, the modeling team at YAU-China has developed a reinforcement learning search method based on high-dimensional space concepts and curve similarity quantification and designed a fitting software accordingly. Testing on various curves demonstrates its broad applicability, suggesting potential influence on future research.

Software Overview

1. Software Name: CFS_BHDS_AS

Full Name: Curve Fitting Software Based on High-Dimensional Search and Azimuth Statistics

2. Developer: Shengxuan BIAN

Email: shixiu_yakuchi0324@qq.com

Download Link: https://gitlab.igem.org/2025/software-tools/yau-china/-/tree/main/CFS_BHDS_AS

Password: SoxR

3. Functions:

Fit experimental data curves and predict data curves, providing optimal parameters for the best fit

Provide a dynamic plotting window for real-time observation of fitting progress

Offer visualizations including predicted curves and parameter variation analysis

4. Software Dependencies: R-4.5.0，RStudio (2025.05.0+496)

5. Hardware Dependencies: Intel(R) Core(TM) i7-8850H CPU @ 2.60 GHz (2.59 GHz)

Methodology

We assume that for any ODE model, the parameters corresponding to the current predicted curve α form a prediction coordinate A, where each dimension represents the value of one parameter. Atom the perspective of the ODE model, the experimental data curve β has its own unique parameter combination; at least one target coordinate B contains these parameter values as its dimensions. The Euclidean distance AB between A and B represents the degree of dissimilarity between the predicted curve α and the experimental curve β.

The prediction coordinate A can be moved under the guidance of a spatial vector a⃗ to approach B. If the approach efficiency decreases, a spatial vector a⃗ that is perpendicular to the current trajectory direction and that further reduces the Euclidean distance AB is sought to guide the next movement of the prediction coordinate. Using the R function base::outer(x, y, "-") , we compute a distance matrix between points of the predicted curve α and experimental curve β after applying the same weighting, then calculate the mean squared error (MSE) as the dissimilarity metric.

As for the dynamic adjustment strategy of vector a⃗ , we maintain a reinforcement learning weight environment to record the historical performance of each parameter direction, based on which new movement vectors are probabilistically generated; when optimization efficiency declines, orthogonal constraints are applied to generate new vectors perpendicular to the current direction so as to sustain search momentum; meanwhile, a blacklist–whitelist mechanism is incorporated to avoid repeated and ineffective searches, and local fine-tuning is performed on directions that demonstrate superior performance.

This transforms the problem of "how to adjust parameters so the predicted curve increasingly resembles the experimental curve" into "how to reduce the distance between two points in high-dimensional space."

Research Task

We studied the project described at https://2025.igem.wiki/yau-china/: "An engineered bacterial system based on a dual-plasmid architecture and enzyme-substrate method for detecting Pseudomonas aeruginosa in aquatic environments."

The project requires visible color development between 6 and 12 hours after detection begins. We assumed a target curve shape for the total concentration of the reporter gene gus expression enzyme in the detection system, then used CFS_BHDS_AS to fit the curve, verify dimensional information, and obtain optimal prediction parameters. The parameters corresponding to the "ideal protein" were compared with authoritative database information for BrlR, SoxR, and MexL to identify the most suitable protein.

ODE Model Design

Research Process

1. Hypothesis and Fitting Exploration

1. Growth Kinetics of Pseudomonas aeruginosa and Engineered Bacteria

\[ \frac{dN_{\text{PA}}}{dt}= \mu_{\text{PA}}N_{\text{PA}}\!\left(1-\frac{N_{\text{PA}}}{K_{\text{PA}}}\right) \frac{N}{K_{N,\text{PA}}}-\delta_{\text{PA}}N_{\text{PA}} \] \[ \frac{dN_{\text{EC}}}{dt}= \mu_{\text{EC}}N_{\text{EC}}\!\left(1-\frac{N_{\text{EC}}}{K_{\text{EC}}}\right) \frac{N}{K_{N,\text{EC}}}-\delta_{\text{EC}}N_{\text{EC}} \]

Based on the logistic growth framework, intraspecific competition is introduced through the term (1 - N / K), where K_PA and K_EC represent the environmental carrying capacities of Pseudomonas aeruginosa (PA) and the engineered bacteria (EC), respectively, thereby simulating the reality that populations cannot grow indefinitely under resource-limited conditions; simultaneously, the term (N / K_N) directly links the growth rate to the nutrient concentration N (where K_N is the half-saturation constant for nutrient uptake, analogous to the Michaelis constant), ensuring that even if the population density has not reached its upper limit, growth will cease if nutrients are depleted (N→0); in terms of population dynamics, growth is represented by mu * … and natural decay or dilution by - delta * N, reflecting that population dynamics result from the net effect of the two processes of reproduction and death; regarding interspecific relationships, the growth equations for PA and EC have parallel forms but different parameters (such as mu, K, and delta), which not only accounts for inherent differences in the growth characteristics of the two bacterial species but also avoids explicitly modeling direct competition between them (e.g., through competition coefficients), instead implementing indirect competition via a shared nutrient pool N; in short, the core of this model is to simulate the process by which the two bacterial species each undergo logistic growth and continuous decay under conditions of limited nutrients.

2. Synthesis and Degradation of PQS and PYO

\[ \frac{dI_{\text{PQS}}}{dt}= s_{\text{PQS}}\frac{N_{\text{PA}}^{n}}{K_{\text{PA},\text{PQS}}+N_{\text{PA}}^{n}} -d_{\text{PQS}}I_{\text{PQS}} \] \[ \frac{dI_{\text{PYO}}}{dt}= s_{\text{PYO}}\frac{N_{\text{PA}}^{m}}{K_{\text{PA},\text{PYO}}+N_{\text{PA}}^{m}} -d_{\text{PYO}}I_{\text{PYO}} \]

At the level of signal molecule synthesis, Hill functions are employed to link the synthesis rate to the pathogen population density N_PA: when N_PA is low, the synthesis rate is slow, and when N_PA is high, the synthesis rate approaches the maximum values s_PQS or s_PYO, thereby simulating the core feature of quorum sensing whereby signal molecules are produced in large quantities only after the population reaches a specific threshold density; by setting Hill coefficients n and m both greater than 1, cooperative effects are introduced such that signal production does not exhibit a simple linear relationship with population density but instead displays a switch-like response, whereby signal output rises sharply once the critical density is exceeded. The term - d * I is adopted to assume that signal molecules degrade or are diluted at a constant rate d; meanwhile, given that PQS and PYO are distinct signaling pathways unique to Pseudomonas aeruginosa, they are modeled separately, laying the foundation for the subsequent construction of a dual-channel detection logic. In brief, the core of this component is that pathogen density controls the production of signal molecules through cooperative effects, while the signal molecules themselves undergo degradation over time.

3. Dual-Plasmid Replication Kinetics

\[ \frac{dP_{\text{MiniCTX}}}{dt}= r_{\text{MiniCTX}}P_{\text{MiniCTX}}\!\left(1-\frac{P_{\text{MiniCTX}}}{P_{\max,\text{MiniCTX}}}\right) \] \[ \frac{dP_{\text{pBlue}}}{dt}= r_{\text{pBlue}}P_{\text{pBlue}}\!\left(1-\frac{P_{\text{pBlue}}}{P_{\max,\text{pBlue}}}\right) \]

The model treats the number of plasmids as a dynamically changing variable rather than a fixed constant; in modeling the plasmid replication process, the key design is the adoption of a logistic growth model: intracellular resources such as dNTPs, replication enzymes, and ATP are limited, and plasmid replication consumes these resources; as the number of plasmids increases, resource competition intensifies, which in turn leads to a decline in the replication rate, where P_max represents the maximum number of copies of a particular plasmid that a cell can sustain while maintaining normal physiological activities, reflecting the homeostatic balance within the cell; for the dual-plasmid system, the model assigns independent replication parameters (r, P_max) to each of the two plasmids, a design that reflects their mutual non-interference and effectively enhances the modularity and predictability of the genetic circuit design. In summary: the operation of the genetic circuit depends on the quantity of gene templates (plasmids), and the quantity of templates itself is constrained by the internal resource conditions of the cell.

4. Regulatory Protein Expression

\[ \frac{dX_{\text{PqsR}}}{dt}= k_{\text{PqsR}}P_{\text{pBlue}}\!\left(1-\frac{X_{\text{PqsR}}}{X_{\text{sat},\text{PqsR}}}\right) -(\mu_{\text{EC}}+d_{\text{PqsR}})X_{\text{PqsR}} \] \[ \frac{dX_{\text{SoxR}}}{dt}= k_{\text{SoxR}}P_{\text{pBlue}}\!\left(1-\frac{X_{\text{SoxR}}}{X_{\text{sat},\text{SoxR}}}\right) -(\mu_{\text{EC}}+d_{\text{SoxR}})X_{\text{SoxR}} \]

The regulatory proteins PqsR and SoxR are expressed in a constitutive manner. The expression rate is defined by the term k_* * P_pBlue, which explicitly accounts for the gene dosage effect—i.e., a higher copy number of the pBlue plasmid results in a faster rate of regulatory protein production. For protein concentration control, the negative feedback term (1 − X / X_sat) prevents unlimited protein accumulation, stabilizing the concentration near the set value X_sat and thereby avoiding resource waste and potential toxicity. In the protein degradation mechanism, the loss term (mu_EC + d_X) again underscores the decisive influence of growth dilution on intracellular protein concentration; even with high expression intensity, rapid growth of the engineered strain may hinder protein accumulation due to “dilution.” In summary, by continuously producing and maintaining regulatory proteins at a certain level, the system achieves constant readiness for environmental signal detection, with the protein concentration jointly regulated by the plasmid template dosage and the physiological growth state of the host cell.

5. Reporter Gene Expression

\[ \text{Activation}_{P}= \frac{(X_{\text{PqsR}}\,I_{\text{PQS}})^{h}}{K_{\text{pqsA}}^{h}+(X_{\text{PqsR}}\,I_{\text{PQS}})^{h}} \] \[ \text{Activation}_{M}= \frac{(X_{\text{SoxR}}\,I_{\text{PYO}})^{g}}{K_{\text{mexG}}^{g}+(X_{\text{SoxR}}\,I_{\text{PYO}})^{g}} \]

In terms of logical implementation, the multiplication logic represented by (X_PqsR × I_PQS) ensures that a significant output is generated only when both the receptor protein X and the signaling molecule I are present simultaneously. If either is absent, the activation level is zero, thereby effectively guaranteeing detection specificity. This prevents both non-specific activation in the presence of signal but absence of receptor and leaky expression in the presence of receptor but absence of signal. Regarding response characteristics, the Hill function (^h / (K^h + ...^h)) transforms the continuous product value into a switch-like response. A high Hill coefficient (h, g > 1) ensures that the activation level remains near 0 (off state) when the complex concentration is below the threshold K, and rapidly jumps to 1 (on state) once the threshold is exceeded. In summary, through the combination of multiplicative operation and cooperative activation, the reporter gene is activated only when both inputs simultaneously exceed their respective thresholds.

\[ \frac{dX_{\text{lacZ}}}{dt}= k_{\text{lacZ}}\,\text{Activation}_{P}\,P_{\text{MiniCTX}}\!\left(1-\frac{X_{\text{lacZ}}}{X_{\text{sat},\text{lacZ}}}\right) -(\mu_{\text{EC}}+d_{\text{lacZ}})X_{\text{lacZ}} \] \[ \frac{dX_{\text{gus}}}{dt}= k_{\text{gus}}\,\text{Activation}_{M}\,P_{\text{MiniCTX}}\!\left(1-\frac{X_{\text{gus}}}{X_{\text{sat},\text{gus}}}\right) -(\mu_{\text{EC}}+d_{\text{gus}})X_{\text{gus}} \]

In the regulatory logic governing reporter protein expression, the expression rate is controlled by the activation level activation. In terms of template design, the reporter genes (X_lacZ, X_gus) are encoded on the Mini_CTX plasmid (P_Mini_CTX), whereas the detector proteins (X_PqsR, X_SoxR) are encoded on the pBlue plasmid (P_pBlue). This physical separation of templates is a key aspect of the modular design, avoiding the problem of excessive size for a single plasmid while enabling independent adjustment of copy numbers for different modules. In summary, the expression of the reporter gene is the output of the genetic circuit’s logic operations, modulated by the copy number of its respective plasmid, and ultimately produces a detectable signal (enzyme activity).

6. Substrate Metabolism Pathways

\[ \frac{d\text{oNpG}_{\text{int}}}{dt}= k_{\text{uptake},\text{oNpG}}\,\text{oNpG}_{\text{ext}}\!\left(1-\frac{\text{oNpG}_{\text{int}}}{\text{oNpG}_{\text{sat}}}\right) -k_{\text{cat},\text{lacZ}}X_{\text{lacZ}}\,\text{oNpG}_{\text{int}} \] \[ \frac{d\text{MuG}_{\text{int}}}{dt}= k_{\text{uptake},\text{MuG}}\,\text{MuG}_{\text{ext}}\!\left(1-\frac{\text{MuG}_{\text{int}}}{\text{MuG}_{\text{sat}}}\right) -k_{\text{cat},\text{gus}}X_{\text{gus}}\,\text{MuG}_{\text{int}} \]

used to capture saturation behavior—transport rate gradually decreases as the intracellular substrate concentration rises, thereby simulating both the efficiency limitations of the transporter protein and potential feedback inhibition mechanisms. In the intracellular substrate consumption step, the term k_cat_* * X_* * S_int represents the process by which the reporter enzyme converts the intracellular substrate into product. From the perspective of overall dynamic equilibrium, the equation describes the temporal evolution of the intracellular substrate concentration, whose final steady-state value is jointly determined by the rates of substrate uptake and consumption.

\[ \frac{dY_{\text{lacZ}}}{dt}= k_{\text{cat},\text{lacZ}}X_{\text{lacZ}}\,\text{oNpG}_{\text{int}} \] \[ \frac{dY_{\text{gus}}}{dt}= k_{\text{cat},\text{gus}}X_{\text{gus}}\,\text{MuG}_{\text{int}} \]

In terms of the relationship between product and substrate, the model is based on the explicit assumption that one molecule of substrate yields one molecule of colored product, thereby directly equating the accumulated amount of product Y with the intensity of the chromogenic signal. Regarding the driving factors of product formation, the production rate is proportional to the amount of functional reporter enzyme X_*, directly linking the output of the upstream genetic circuit (reporter protein concentration) to the final observable signal (product concentration). It is also proportional to the intracellular substrate concentration S_int, reflecting a first-order kinetic simplification that implicitly assumes S_int << K_m (the intracellular substrate concentration is much lower than the enzyme’s Michaelis constant). This assumption results in a linear dependence of the reaction rate on S_int, simplifying model analysis but potentially overestimating the reaction rate at high substrate concentrations. In summary, through the enzyme-catalyzed reaction, the output of the upstream genetic circuit (reporter enzyme concentration) is linearly translated into the final measurable signal (colored product concentration), with the overall pathway rate jointly limited by substrate uptake efficiency and enzyme catalytic efficiency.

7. Nutrient Kinetics

\[ \frac{dN}{dt}= -\Bigl[ \mu_{\text{PA}}N_{\text{PA}}+\mu_{\text{EC}}N_{\text{EC}} +\eta_{P}\!\bigl(\max\!\bigl(0,\tfrac{dP_{\text{MiniCTX}}}{dt}\bigr) +\max\!\bigl(0,\tfrac{dP_{\text{pBlue}}}{dt}\bigr)\bigr) +\eta_{X}\!\bigl(\max\!\bigl(0,\tfrac{dX_{\text{PqsR}}}{dt}\bigr) +\max\!\bigl(0,\tfrac{dX_{\text{lacZ}}}{dt}\bigr) +\max\!\bigl(0,\tfrac{dX_{\text{gus}}}{dt}\bigr) +\max\!\bigl(0,\tfrac{dX_{\text{SoxR}}}{dt}\bigr)\bigr) \Bigr] \]

Nutrient N is abstracted as a unified, consumable resource pool (e.g., carbon source, nitrogen source, ATP, etc.). Nutrient consumption is precisely partitioned among four major processes: biomass growth (–mu_PA * N_PA – mu_EC * N_EC) represents the most direct and dominant consume pathway, where faster population growth accelerates nutrient depletion; macromolecular synthesis (–η_P × dP – η_X × dX) is a key and sophisticated design explicitly modeling the operating costs of the genetic circuit, including nutrient consumption for plasmid replication (η_P × dP) and protein expression (η_X × dX). The pmax(0, ...) function ensures that nutrients are consumed only during net synthesis, with degradation not returning nutrients to the pool (a simplifying assumption). Through competition for the shared resource N, population growth, signal production, gene expression, and metabolic activity are mutually coupled. For instance, excessive genetic circuit activity can deplete nutrients, thereby limiting bacterial growth; conversely, rapid bacterial proliferation can exhaust nutrients and compromise circuit performance. This introduces a survival–function trade-off, enabling the simulation of more realistic biological behaviors. In summary, via a shared, consumable nutrient pool, all energy-consuming processes within the system are tightly interconnected, effectively mimicking the authentic intracellular strategies of resource competition and allocation.

8. Color Development Efficiency Sensitivity Curves

\[ \frac{d\text{Active}\_Y_{\text{lacZ}}}{dt}= \frac{k_{\text{cat},\text{lacZ}}X_{\text{lacZ}}\,\text{oNpG}_{\text{int}}}{(N_{\text{EC}}+10^{-10})(N_{\text{PA}}+10^{-10})} \] \[ \frac{d\text{Active}\_Y_{\text{gus}}}{dt}= \frac{k_{\text{cat},\text{gus}}X_{\text{gus}}\,\text{MuG}_{\text{int}}}{(N_{\text{EC}}+10^{-10})(N_{\text{PA}}+10^{-10})} \]

The numerator represents the production rate of product Y (i.e., the absolute signal intensity), while the denominator is the product of the quantities of the two bacterial populations. The resulting ratio denotes the “chromogenic rate contributed per unit of engineered bacteria and per unit of pathogen.” In terms of design intent, the primary objective is to evaluate system cooperativity—this metric quantifies the detection efficiency when both bacterial populations coexist, avoiding misjudgments where a high pathogen count (N_PA) yields a strong absolute signal but reflects low actual detection efficiency. The aim is to identify the ideal scenario in which the engineered bacteria can still achieve efficient detection even when the pathogen population is not large. To prevent computational failure caused by zero population sizes at the initial stage of the simulation, a numerical technique of (N + 1e-10) is adopted to avoid division-by-zero errors.

\[ \frac{d\text{Active}\_Y_{\text{LG}}}{dt}= \sqrt{\max\!\bigl(0,\tfrac{d\text{Active}\_Y_{\text{lacZ}}}{dt}\bigr) \cdot\max\!\bigl(0,\tfrac{d\text{Active}\_Y_{\text{gus}}}{dt}\bigr)} \]

A geometric mean is employed, such that the overall performance is high only when both channels achieve high individual performance; if either channel’s performance approaches zero, the overall performance will also tend toward zero. This design directly serves the objective of the “dual-plasmid, dual-channel” system, compelling the optimization algorithm to search for a parameter space that can simultaneously and efficiently activate both the lacZ and gus reporter systems. This, in turn, enhances the system’s reliability and anti-interference capability while avoiding potential false-positive issues that may occur in a single-channel configuration. The pmax(0, ...) function is used to ensure the value inside the square root remains non-negative; however, this “hard” treatment may mask non-physical negative values generated by the model under certain parameter combinations. In summary, an efficiency metric that emphasizes dual-channel cooperative operation is defined to guide system parameter optimization, ensuring the final engineered bacterial detection system is both highly efficient and reliable.

9. Parameter List

(Please refer to Appendix 1 for the parameter list.)

Based on general experience in molecular biology, for a clear visible observation of blue precipitation or color change in liquid culture media (with a short optical path length), the concentration of the Y_gus product typically needs to reach the order of 1–10 µM when visualized under an ultraviolet lamp. [3] We chose 5 µM as the threshold, simulated an ideal experimental curve, and used CFS_BHDS_AS for fitting and prediction.

PYO-SoxR pathway efficiency curve — Figure 1 - The black curve represents the target curve with extremely early and high saturation, while the colored curves represent the predicted curves. As the number of iteration rounds increases, the saturation level gradually increases and the saturation time advances slightly, which reflects the efficiency improvement of the PYO-SoxR-pMexG-gus-mug pathway in the ODE model.

2. Parameter Behavior and Fitting Analysis

Principal Component Analysis (PCA) can jointly synthesize multiple variables when they are all quantitative, so as to describe the set of individuals defined by these variables— which are the subject of descriptive research— in the best possible way.[4]For a tightly coupled ODE model, processing parameter variation information using principal component analysis (PCA) is an excellent choice.

Principal Component Analysis (PCA) of parameter changes during iterations revealed that all SoxR-related parameters correlated positively with PC1, indicating that increasing the "overall strength" of the SoxR pathway was necessary to achieve visible coloration.Variables close to the circle are well-represented; those near the center are poorly represented. Orthogonal radii and variables close to the circle are independent, while variables that are diametrically opposite on the circle show a negative correlation. [5]

Both SoxR expression rate k_SoxR and degradation rate d_SoxR increased, suggesting the protein needs both higher expression and faster turnover.

Discussion

Fitting results indicate that to achieve the desired "earlier rise and higher saturation" of the Y_gus curve, both expression and degradation parameters for the SoxR protein must increase.

Protein Name	Length	Molecular Weight
BrlR	270 aa	29 700 Da
SoxR	156 aa	17 160 Da
MexL	212 aa	23 320 Da

Table 1: The corresponding protein lengths were obtained from NCBI searches, and the molecular weights were estimated using an average molecular weight of amino acids ≈ 110 Da (including water loss from peptide bonds).

Among BrlR, SoxR, and MexL, SoxR has the lowest molecular weight (17,160 Da), suggesting it is easier to express and degrade. Therefore, based on dry-lab fitting results, SoxR is the preferred choice.

Three sensing schemes and their color-developing performance — Figure 4 – The working principles of the three sensors are shown in panels A–C; panel D displays their respective color-developing efficiencies.

To evaluate the real-world performance of the proteins BrlR, SoxR and MexL, the complete sensing modules—SoxR–P_mexG, MexL–P_phzA1 and BrlR–P_brlR—were introduced into engineered bacteria. Cells were challenged with identical concentrations of PYO, and the resulting fluorescence was quantified to assess sensitivity and reliability.

As shown in Figure 4D, the SoxR–P_mexG circuit delivered the strongest fluorescence signal together with the lowest background, exhibiting the best overall sensitivity and robustness. In the absence of PYO, the MexL–P_phzA1 circuit displayed a basal fluorescence that decreased upon PYO addition (dissolved in DMSO), a typical repression behaviour; however, the reduction was modest. The BrlR–P_brlR circuit produced the weakest reporter output, presumably owing to poor protein expression or low transcriptional activation efficiency.

ODE-based verification indicated that the regulatory protein SoxR possesses the smallest molecular weight among the candidates, endowing it with rapid expression and turnover—an attribute aligned with the requirement for high regulator recycling. After cross-checking against the NCBI database, these wet-lab results confirm that soxR is the optimal regulatory gene for experimental implementation. The superior fluorescence observed with the soxR-regulated pathway, relative to the mexL and brlR alternatives, provides strong empirical support for the model-guided experimental strategy.

References

[1] Dorešić D, Grein S, Hasenauer J. Efficient parameter estimation for ODE models of cellular processes using semi-quantitative data. Bioinformatics. 2024 Jun 28;40(Suppl 1):i558-i566. doi: 10.1093/bioinformatics/btae210. PMID: 38940161; PMCID: PMC11211815.

[2] Dastin-van Rijn EM, Widge AS. Failure modes and mitigations for Bayesian optimization of neuromodulation parameters. J Neural Eng. 2025 Jun 13;22(3):036038. doi: 10.1088/1741-2552/ade189. PMID: 40472872; PMCID: PMC12164532.

[3] Jain N, Kaur N. A Naked Eye and Fluorescence Turn-on Sensor for Smartphone Assisted Solid and Solution Phase Sensing of Highly Toxic Triphosgene. J Fluoresc. 2024 Mar;34(2):935-943. doi: 10.1007/s10895-023-03328-7. Epub 2023 Jul 11. PMID: 37432582.

[4] Ben Salem K, Ben Abdelaziz A. Principal Component Analysis (PCA). Tunis Med. 2021 Avril;99(4):383-389. English. PMID: 35244921; PMCID: PMC8734479.

[5] Ben Salem K, Ben Abdelaziz A. Principal Component Analysis (PCA). Tunis Med. 2021 Avril;99(4):383-389. English. PMID: 35244921; PMCID: PMC8734479.

State Variable Name	Function Description	Dimensional Info	Initial Value
P_Mini_CTX	Number of Mini-CTX plasmids	[P]	2
P_pBlue	Number of pBlue plasmids	[P]	1
X_PqsR	Concentration of PqsR regulatory protein	[C]	0.5
X_lacZ	Concentration of lacZ reporter protein	[C]	0.2
X_gus	Concentration of gus reporter protein	[C]	0.1
X_SoxR	Concentration of SoxR regulatory protein	[C]	0.1
I_PQS	Concentration of PQS signaling molecule	[C]	0.1
I_PYO	Concentration of PYO signaling molecule	[C]	0.05
N	Nutrient concentration	[N]	2.5
oNpG_int	Intracellular oNpG substrate concentration	[S]	0
MuG_int	Intracellular MuG substrate concentration	[S]	0
Y_lacZ	Product concentration of lacZ pathway	[S]	0
Y_gus	Product concentration of gus pathway	[S]	0
N_PA	Pseudomonas aeruginosa concentration	[X]	0.5
N_EC	Engineered bacteria concentration	[X]	0.3
dActive_Y_lacZ	Color development efficiency of lacZ path	[S][T]⁻¹[X]⁻²	0.1
dActive_Y_gus	Color development efficiency of gus path	[S][T]⁻¹[X]⁻²	0.1
dActive_Y_LG	Combined color development efficiency	[S][T]⁻¹[X]⁻²	0.1

Parameter Name	Function Description	Value	Units
mu_PA	Max growth rate of Pseudomonas aeruginosa	0.3	T⁻¹
K_PA	Carrying capacity of Pseudomonas aeruginosa	0.6	X
delta_PA	Death rate of Pseudomonas aeruginosa	0.005	T⁻¹
K_N_PA	Nutrient half-saturation constant for PA	0.1	N
mu_EC	Max growth rate of engineered bacteria	0.15	T⁻¹
K_EC	Carrying capacity of engineered bacteria	0.5	X
delta_EC	Death rate of engineered bacteria	0.02	T⁻¹
K_N_EC	Nutrient half-saturation constant for EC	0.1	N
s_PQS	Max synthesis rate of PQS	0.1	C·T⁻¹
K_PA_PQS	Half-saturation constant for PQS synthesis	0.3	X
n	Hill coefficient for PQS synthesis	2	—
d_PQS	Degradation rate of PQS	0.04	T⁻¹
s_PYO	Max synthesis rate of PYO	0.15	C·T⁻¹
K_PA_PYO	Half-saturation constant for PYO synthesis	0.4	X
m	Hill coefficient for PYO synthesis	2	—
d_PYO	Degradation rate of PYO	0.03	T⁻¹
r_Mini_CTX	Replication rate of Mini-CTX plasmid	0.08	T⁻¹
P_max_Mini_CTX	Max copy number of Mini-CTX plasmid	20	P
r_pBlue	Replication rate of pBlue plasmid	0.05	T⁻¹
P_max_pBlue	Max copy number of pBlue plasmid	10	P
k_PqsR	Expression rate constant of PqsR protein	0.1	P⁻¹·T⁻¹
X_sat_PqsR	Saturation concentration of PqsR protein	30	C
d_PqsR	Degradation rate of PqsR protein	0.05	T⁻¹
h	Hill coefficient for PqsR-PQS complex	3	—
K_pqsA	Activation constant of pqsA promoter	0.5	C
k_lacZ	Expression rate constant of lacZ	0.15	P⁻¹·T⁻¹
X_sat_lacZ	Saturation concentration of lacZ	60	C
d_lacZ	Degradation rate of lacZ	0.02	T⁻¹
k_SoxR	Expression rate constant of SoxR protein	0.1	P⁻¹·T⁻¹
X_sat_SoxR	Saturation concentration of SoxR protein	20	C
d_SoxR	Degradation rate of SoxR protein	0.05	T⁻¹
g	Hill coefficient for SoxR-PYO complex	2	—
K_mexG	Activation constant of mexG promoter	0.4	C
k_gus	Expression rate constant of gus	0.1	P⁻¹·T⁻¹
X_sat_gus	Saturation concentration of gus	50	C
d_gus	Degradation rate of gus	0.01	T⁻¹
k_uptake_oNpG	Uptake rate of oNpG substrate	0.15	T⁻¹
oNpG_ext	External oNpG substrate concentration	10	S
oNpG_sat	Saturation concentration of internal oNpG	10	S
k_cat_lacZ	Catalytic constant of lacZ enzyme	0.2	C⁻¹·T⁻¹
k_uptake_MuG	Uptake rate of MuG substrate	0.08	T⁻¹
MuG_ext	External MuG substrate concentration	10	S
MuG_sat	Saturation concentration of internal MuG	10	S
k_cat_gus	Catalytic constant of gus enzyme	0.05	C⁻¹·T⁻¹
eta_P	Nutrient cost coefficient for plasmid replication	0.02	—
eta_X	Nutrient cost coefficient for protein expression	0.005	—