Imagine a gene integration system where success rates can vary by nearly 3000-fold under different conditions. This isn't an experimental error—it’s a black box waiting to be opened.
In our GenOMe system, one-to-one means that one pair of attB and attP sites recombines once through Bxb1 integrase to form one integration event.
Traditional genetic engineering relies on trial and error, but we asked:
Our answer: Build an interpretable, predictive five-layer integration model.
| System Type | Traditional Mode A | Ready-to-Integrate Mode B | Efficiency Gain |
|---|---|---|---|
| One-to-one | ~0.57% | - | - |
| Two-to-two | ~0.05% | 79–87% | 1600-fold |
A single variable change produces a three-order-of-magnitude leap — this is the core mystery our model explains.
We decomposed the Bxb1 gene integration system into five modelable biological layers, peeling away complexity like an onion:



Quantifies how arabinose initiates Bxb1 gene transcription and the dynamics of mRNA production and degradation.
Calculates how long it takes for Bxb1 protein to reach effective concentration and when it begins to lose activity.
Models how ssAP forms attB sites on the chromosome — the system’s most fragile step.
Mechanism Breakdown: ssAP (single-strand annealing protein) must:
The multiplication of these four steps results in overall success rate < 1%.
Quantifies how Bxb1 recognizes and recombines two attB sites, and why longer fragments reduce efficiency.
Transforms parameters from the first four layers into a lookup-table success rate predictor for experiment design.
Most would assume low Bxb1 protein concentration causes poor efficiency. But the model reveals:
The answer lies in M3: randomness of attB site formation is the true killer.
Position Effect Formula: P(d) ≈ 0.6% × e−0.0007d, where d = distance from oriC (kb).
At the same chromosomal position (oriC-proximal), we compared:
Why the 10-fold difference? The model provides the answer:
The fundamental mechanism complexity differs, not just a probability multiplication:
Success_one-to-one = P₀ × [modification factors]
≈ 0.001066 × φtot × fgc × flocus × distance_factor × …
≈ 0.57%
Success_two-to-two = P₀_dual × [same modification factors]
≈ (P₀ / 15–20) × φtot × fgc × flocus × distance_factor × …
≈ 0.05%
Why is P₀_dual so much smaller? Two-to-two deletion requires:
Intuitive understanding: It's like needing to successfully thread two needles at once while they're moving—even if you're skilled at threading one needle (one-to-one), doing two simultaneously in coordination is fundamentally harder, not just twice as hard.
Mechanistic complexity multiplier: Two-to-two is 15-20 times harder than one-to-one
When using Ready-to-Integrate Mode B (linear DNA with pre-built attB sites):
| Fragment Length | Measured Success | Model Prediction | Error |
|---|---|---|---|
| 1039 bp | 87.5% | 87.6% | +0.1% |
| 1605 bp | 79.0% | 79.3% | +0.3% |
| 1613 bp | 80.0% | 79.1% | −0.9% |
Average prediction error < 0.4%
The model not only predicts accurately but also explains why:
We calibrated the model using three length anchor points (1039/1605/1613 bp):
Mean Absolute Error (MAE) = 0.37%
Maximum deviation < 1%
Covers practical range 1.0-1.6 kb
All predictions include 95% confidence intervals, with extrapolation regions (>4 kb, extreme GC%) clearly marked.
Choose different reading paths based on your needs:
→ Jump directly to M5 Experimental Design Assistant
Input your fragment parameters and get success rate predictions from the lookup table.
→ Focus on M3 vs M4 Comparative Analysis
See where the bottleneck truly lies.
→ Read sequentially M1 → M2 → M3 → M4 → M5
Trace the complete process from induction to integration.
→ Consult parameter tables and calibration guidelines for each layer.
All assumptions and limitations are explicitly stated.
Yes! This five-layer architecture applies to any gene editing system requiring:
Examples include:
The model reveals that "delivery bottleneck > reaction kinetics" — a principle explaining why:
The underlying logic is the same: bypass random intracellular synthesis/marking processes by directly delivering functional molecules.
This is not just a mathematical model, but a complete roadmap from gene expression to genome rewriting.
Choose your path and start exploring
Have questions about modeling? Want to see raw data? Click the sections to begin your journey!
The M1 model describes the mRNA dynamics of ssAP (single-strand annealing protein) and Bxb1 integrase under various induction regimes.
A key feature of M1 is its ability to resolve induction time windows, including the effect of cooling periods where transcription is reduced but not fully terminated.
mRNA dynamics are governed by:
$$ \frac{dm_g}{dt} = \beta_g \cdot I_g(t) - d_{m,g} m_g, \quad g \in \{ \text{Bxb1, ssAP} \} $$
where:
| Name | Definition | Value [unit] | Reference |
|---|---|---|---|
| βssAP | Transcription rate of ssAP | 0.15 [μM/min] | Estimated from promoter strength [1] |
| βBxb1 | Transcription rate of Bxb1 | 0.10 [μM/min] | Estimated from promoter strength [1] |
| βBxb1-J23105 | Transcription rate of Bxb1 under promoter J23105 | 0.04 [μM/min] | iGEM Registry [3] |
| t1/2,ssAP | mRNA half-life of ssAP | 7.0 [min] | Literature [2] |
| t1/2,Bxb1 | mRNA half-life of Bxb1 | 6.9 [min] | Literature [2] |
| t1/2,Bxb1-J23105 | mRNA half-life of Bxb1-J23105 | 6.8 [min] | Literature [2] |
| dm,ssAP | Degradation constant of ssAP mRNA | 0.099 [1/min] | Derived from half-life [2] |
| dm,Bxb1 | Degradation constant of Bxb1 mRNA | 0.100 [1/min] | Derived from half-life [2] |
| dm,Bxb1-J23105 | Degradation constant of Bxb1-J23105 mRNA | 0.102 [1/min] | Derived from half-life [2] |
| γ | Residual transcription ratio during cooling | 0.1 | Literature [5] |
| tinduce,ssAP | Induction time for ssAP | 45 [min] | Experimental condition |
| tinduce,Bxb1 | Induction time for Bxb1 | 240 [min] | Experimental condition |
| tinduce,J23105 | Induction time for J23105 promoter | 240 [min] | Experimental condition |
| tcool,ssAP | Cooling time for ssAP | 10 [min] | Experimental setting |
For ssAP, we explicitly incorporated a cooling phase to reflect reduced transcriptional activity:
$ I(t) = \begin{cases} 1, & 0 \le t < 45 \quad \text{(full induction)} \\ \gamma, & 45 \le t < 55 \quad \text{(cooling, residual activity)} \\ 0, & t \ge 55 \quad \text{(fully off)} \end{cases} $
with γ = 0.1, representing 10% residual transcriptional efficiency during cooling on ice [4].
Analytical solutions follow canonical forms:
These are standard analytical solutions [5] obtained by solving the first-order ODE for mRNA dynamics.
Extend ssAP induction to 45 min to align with its mRNA peak, accounting for residual transcription during cooling.
[1] Registry of Standard Biological Parts (iGEM). (n.d.). Promoter BBa_J23105. Retrieved from http://parts.igem.org/Part:BBa_J23105
[2] Bernstein, J. A., Khodursky, A. B., Lin, P. H., Lin-Chao, S., & Cohen, S. N. (2002). Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. PNAS, 99(15), 9697–9702. https://doi.org/10.1073/pnas.112318199
[3] Selinger, D. W., Saxena, R. M., Cheung, K. J., Church, G. M., & Rosenow, C. (2003). Global RNA half-life analysis in Escherichia coli reveals positional patterns of transcript degradation. Genome Research, 13(2), 216–223. https://doi.org/10.1101/gr.912603
[4] Phadtare, S., & Inouye, M. (2003). Cold shock response and cold shock proteins. Genes & Development, 17(8), 971–981. https://doi.org/10.1101/gad.1077503
[5] Alon, U. (2006). An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC.
Building upon the transcriptional dynamics characterized in M1, the M2 model quantifies the accumulation of ssAP and Bxb1 proteins.
Thus, M2 serves as the critical bridge between gene expression and functional integration efficiency. Its role is to evaluate whether extended induction windows are sufficient to ensure proteins act within their effective functional windows.
We start from the canonical mRNA–protein dynamics widely used in gene expression modeling.
$$ \frac{dm_g}{dt} = \beta_g I_g(t) - d_{m,g} m_g(t), \quad g \in \{\mathrm{Bxb1}, \mathrm{ssAP}\}. \quad [1][2] $$
This is a first–order linear ODE describing production at rate $\beta_g$ (controlled by induction function $I_g(t)$) and degradation at rate $d_{m,g}$. The analytical solution is obtained using the integrating factor method:
$$ m_g(t) = m_g(0)e^{-d_{m,g}t} + \int_0^t \beta_g I_g(s)e^{-d_{m,g}(t-s)}\, ds. \quad [1][2] $$
Special cases include:
Moreover, the degradation constant is related to the mRNA half-life by $d_{m,g} = \ln 2 / t_{1/2}$, which in E. coli has been globally quantified to be on the order of a few minutes [5].
We incorporate an average translation/maturation delay $\tau$, such that fully functional proteins appear only after time $\tau$:
$$ \frac{dP_g}{dt} = k_{\mathrm{trans},g}\, m_g(t - \tau) - d^{\mathrm{eff}}_{p,g}\, P_g, \quad [3][4] $$
where the effective decay constant combines protein degradation and growth dilution:
$$ d^{\mathrm{eff}}_{p,g} \equiv k^{P}_{\mathrm{deg},g} + \mu. \quad [1][2] $$
The convention is $m_g(t - \tau) = 0$ for $t < \tau$.
Assume $I_g(t) = 1$ with initial conditions $m_g(0) = P_g(0) = 0$.
This expression shows that the protein dynamics are identical to the no-delay case, but shifted by $(t - \tau)$.
For switch-off or pulse induction with residual activity \(\gamma\), one substitutes the time-dependent \(I_g(t)\) into the mRNA solution and propagates it through the protein convolution. The closed form remains valid provided all terms are shifted by \((t-\tau)\) and \(d_p\) is replaced by dp,geff.
Protein removal is modeled through an effective decay constant \(d_{p,\mathrm{eff}}=d_p+\mu\), where \(\mu\) represents dilution due to cell growth and \(d_p\) denotes intrinsic protein degradation. In M2 we assume a constant \(\mu\), corresponding to balanced exponential growth, to keep the focus on transcription–translation–recombination dynamics. A burden-aware formulation \(\mu([\textit{Bxb1}])\) is introduced later as an optional extension.
| Name | Definition | Value [unit] | Reference |
|---|---|---|---|
| βssAP | Transcription rate of ssAP mRNA | 0.15 [μM/min] | Estimated from promoter strength [6] |
| βBxb1 | Transcription rate of Bxb1 mRNA (induced) | 0.10 [μM/min] | Estimated from promoter strength [6] |
| βBxb1-J23105 | Transcription rate of Bxb1 under J23105 promoter | 0.04 [μM/min] | iGEM Registry [6] |
| t1/2, mRNA, ssAP | mRNA half-life of ssAP | 7.0 [min] | Literature [7,8] |
| t1/2, mRNA, Bxb1 | mRNA half-life of Bxb1 | 6.9 [min] | Literature [7,8] |
| t1/2, mRNA, Bxb1-J23105 | mRNA half-life of Bxb1-J23105 | 6.8 [min] | Literature [7,8] |
| dm, ssAP | mRNA degradation constant of ssAP | 0.099 [1/min] | Calculated as ln2 / t1/2 [7,8] |
| dm, Bxb1 | mRNA degradation constant of Bxb1 | 0.100 [1/min] | Calculated as ln2 / t1/2 [7,8] |
| dm, Bxb1-J23105 | mRNA degradation constant of Bxb1-J23105 | 0.102 [1/min] | Calculated as ln2 / t1/2 [7,8] |
| ktrans, ssAP | Translation rate constant of ssAP | 0.50 [1/min] (baseline assumption) | Assumed unified value; later fitted [9,10] |
| ktrans, Bxb1 | Translation rate constant of Bxb1 | 0.20 [1/min] (baseline assumption) | Assumed unified value [9,10] |
| ktrans, Bxb1-J23105 | Translation rate constant of Bxb1 (J23105 promoter) | 0.20 [1/min] (baseline assumption) | Assumed unified value [9,10] |
| τ | Translation delay (initiation/maturation) | 2 [min] | Assumed; typical ribosome initiation + maturation delay [9] |
| t1/2, Protein, ssAP | Protein half-life of ssAP | 60 [min] | Literature: short-lived protein [11] |
| t1/2, Protein, Bxb1 | Protein half-life of Bxb1 | 960 [min] (16 h) | Literature: stable protein [11] |
| t1/2, Protein, Bxb1-J23105 | Protein half-life of Bxb1-J23105 | 960 [min] (16 h) | Literature: stable protein [11] |
| dp, ssAP | Protein degradation constant of ssAP | 0.012 [1/min] | Derived from half-life [11] |
| dp, Bxb1 | Protein degradation constant of Bxb1 | ≈ 0 [1/min] | Derived from half-life [11] |
| dp, Bxb1-J23105 | Protein degradation constant of Bxb1-J23105 | ≈ 0 [1/min] | Derived from half-life [11] |
| μ | Growth dilution rate | ~0.0116 [1/min] (Td=60 min) | Calculated as ln2 / Td |
| dpeff | Effective protein decay constant | $ d_p^{\text{eff}} = dp + \mu $ | Model definition |
In vitro assays typically employ serine integrases at 200–400 nM, occasionally up to 1000 nM.
Our simulations predicted μM-range intracellular concentrations, one to two orders of magnitude higher.
This confirms that absolute protein abundance is not the limiting factor in vivo.
The critical determinant is not whether proteins reach steady-state, but whether they are present at the right time window.
Role Differentiation
Dry → Wet (Prediction to Guidance):
Wet → Dry (Feedback to Refinement):
This cycle demonstrates how modeling can directly shape experimental protocols by moving beyond concentration magnitude and instead emphasizing the importance of temporal coordination and encounter probability.
Without this temporal adjustment, downstream recombination (M4) would systematically underestimate efficiency, despite protein abundance being non-limiting.
Conclusions:
Experimental Guidance:
[1]Alon, U. (2006). An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC.
[2]Thattai, M., & van Oudenaarden, A. (2001). Intrinsic noise in gene regulatory networks. PNAS, 98(15), 8614–8619.https://doi.org/10.1073/pnas.151588598
[3]Monk, N. A. M. (2003). Oscillatory expression of Hes1, p53, and NF-κB driven by transcriptional time delays. Current Biology, 13(16), 1409–1413.https://doi.org/10.1016/S0960-9822(03)00534-9
[4]Bratsun, D., Volfson, D., Tsimring, L. S., & Hasty, J. (2005). Delay-induced stochastic oscillations in gene regulation. PNAS, 102(41), 14593–14598.https://doi.org/10.1073/pnas.0503858102
[5]Bernstein, J. A., Khodursky, A. B., Lin, P. H., Lin-Chao, S., & Cohen, S. N. (2002). Global analysis of mRNA decay and abundance in Escherichia coli. PNAS, 99(15), 9697–9702.https://doi.org/10.1073/pnas.112318199
[6]Registry of Standard Biological Parts (iGEM). Promoter BBa_J23105.http://parts.igem.org/Part:BBa_J23105
[7]Selinger, D. W., et al. (2003). Global RNA half-life analysis in Escherichia coli. Genome Research, 13(2), 216–223.https://doi.org/10.1101/gr.912603
[8]Li, G.-W., Oh, E., & Weissman, J. S. (2012). The anti-Shine-Dalgarno sequence drives translational pausing. Nature, 484(7395), 538–541.https://doi.org/10.1038/nature10965
[9]BioNumbers ID 100059. Rate of translation by ribosome in E. coli.https://bionumbers.hms.harvard.edu/bionumber.aspx?id=100059
[10]Belle, A., et al. (2006). Quantification of protein half-lives in budding yeast. PNAS, 103(35), 13004–13009.https://doi.org/10.1073/pnas.0605420103
The M3 model adopts a single-cell, site-discrete probabilistic framework to describe how synthetic oligonucleotides (oligos) are integrated into the E. coli MG1655 chromosome via RecT (ssAP), forming attB sites. Compared to M1 (mRNA dynamics) and M2 (protein dynamics), addressing not only whether a cell forms at least one attB site, but also how genomic position relative to oriC affects integration probability.
Traditional concentration-based kinetic models fail to capture the polarization phenomenon often observed in recombineering: in the same batch, a minority of cells succeed in integration while most fail completely. This arises because each cell contains only 1–4 genome copies, making the relevant question not the average concentration of intermediates but the probability that a given cell achieves ≥1 attB integration event. This motivated a stochastic, site-limited modeling strategy.
(1) Concentration-to-molecule conversion
The ssAP concentration output from M2 is converted into per-cell molecule numbers:
$$ N_{ssAP}(t) = ssAP \times 602 $$
where 602 is the approximate conversion factor for E. coli (1 µM ≈ 602 molecules per 1 fL cell volume) [1]. This links the protein-level dynamics in M2 to discrete single-cell stochastic modeling in M3.
(2) Single-cell reaction rate function (Cycle 4 baseline)
For cell i, with genome copy number (Gi {1,2,4}) (replication-state dependent [2]), the instantaneous rate of attB formation is modeled by a Michaelis–Menten form:
$$ v_i(t) = G_i \cdot \frac{k_{cat} , N_O^{(i)}(t)}{K_m + N_O^{(i)}(t)} $$
where:
Oligos decay exponentially due to nucleases [4,5]:
$$ N_O^{(i)}(t) = N_O^{(i)}(0) e^{-k_{loss}t} $$
(3) Stochastic integration as a non-homogeneous Poisson process
The cumulative intensity is:
$$ \Lambda_i(t) = \int_0^t v_i(\tau) d\tau $$
and the probability that cell i has formed at least one attB site by time t is:
$$ P_i(t) = 1 - e^{-\Lambda_i(t)}. $$
Monte Carlo sampling (10³–10⁴ cells) over initial oligo distributions (log-normal, GSD ≈ 2–3 [6]), ssAP multiplicative noise (GSD ≈ 1.3–1.5 [7]), and genome copy number distributions yields the population-level fraction of cells with ≥1 attB site.
(4) Cycle 5 extension: locus dependence and replication-fork gating
Cycle 4 assumed all chromosomal loci were equally accessible. In Cycle 5, we introduce explicit locus and time-dependence:
$$ v_i(x,t) = G_i(x,t) \cdot \frac{k_{cat} N_O^{(i)}(t)}{K_m + N_O^{(i)}(t)} \cdot \chi_{\text{rep}}(x,t) $$
where:
Thus, attB formation can only occur during fork passage, and success probability decreases with locus distance from oriC.
(5) Empirical oriC–ter decay function
Fitting experimental and predicted efficiencies (hisA, metA, galK, leuD, lacZ), the success probability declines approximately exponentially with oriC distance:
$$ P(d) \approx P_0 \cdot e^{-k d} $$
where:
This fit indicates that each additional ~1000 kb from oriC reduces integration probability by ~50–70%, consistent with oriC-proximal loci (hisA, 0.57%) being an order of magnitude more efficient than ter-proximal loci (leuD, 0.05%).
(6) Numerical implementation
Analytical solutions remain tractable only in the simplified Cycle 4 case (constant inputs, no locus dependence). With locus-specific copy number and replication timing, numerical integration combined with Monte Carlo sampling is required. This allows computation of population-level trajectories, polarization histograms, and locus-dependent gradients (Figs. M3-2A–E).
| Name | Definition | Value [unit] | Reference |
|---|---|---|---|
| Vcell | Average E. coli cell volume | 1 fL | Alon (2006) [1] |
| x | Genomic locus coordinate (distance from oriC) | 0–2300 kb (E. coli genome half-length) | Cooper & Helmstetter (1968) [2] |
| Gi(x,t) | Locus-specific copy number (replication-timing dependent) | 1–4 copies; oriC-proximal loci show earlier/multiple copies | Cooper & Helmstetter (1968) [2] |
| χrep(x,t) | Replication-window indicator function (fork accessibility) | Binary: 1 if replication fork passes locus x at time t; 0 otherwise | Cooper & Helmstetter (1968) [2]; Rao et al. (2018) [9] |
| trep(x) | Replication timing of locus x (fork arrival time) | ~0–40 min depending on oriC–ter position (E. coli doubling time 40 min) | Cooper & Helmstetter (1968) [2] |
| kcat | Catalytic turnover rate of ssAP | 0.01–0.2 (baseline 0.02) min⁻¹ | Briggs & Haldane (1925) [3] |
| Km | Effective Michaelis constant | 10³–10⁴ (baseline 6000) molecules | Briggs & Haldane (1925) [3] |
| gm, oligo | Geometric mean of oligo uptake per cell | 10²–10⁴ (baseline 300) molecules | Mosberg et al. (2010) [6] |
| gsd, oligo | Geometric SD of oligo uptake distribution | 2–3 | Taniguchi et al. (2010) [7] |
| kloss | Oligo degradation rate constant | 0.15–0.20 (baseline 0.2) min⁻¹ | Sawitzke et al. (2007) [5]; Mosberg et al. (2010) [6] |
| NssAP(t) | ssAP molecules per cell over time | Derived from M2 dynamics (~10²–10³) | Taniguchi et al. (2010) [7] |
| p≥1(t) | Population fraction of cells with ≥1 attB | Simulation output | – |
| t₁₀, t₅₀, t₉₀ | Characteristic times when p≥1(t) = 0.1/0.5/0.9 | Simulation output | – |
(1) Baseline molecular framework .
Simulations under baseline parameters (gm = 300 molecules/cell, kcat = 0.02 min⁻¹, Km = 6000 molecules) reproduced the characteristic recombineering behavior. Population success curves exhibited lag–rise–plateau dynamics with a final plateau of ~30–35%, consistent with reported λ-Red/RecT efficiencies. Polarization histograms revealed bimodal outcomes: most cells remained unsuccessful, while a small fraction achieved near-complete integration. Sensitivity analyses confirmed parameter priority as oligo delivery (gm) > catalytic turnover (kcat) > Km, with delivery dominating across regimes.
(2) Locus dependence .
In incorporating locus position and replication-fork gating fundamentally reshaped predictions. Ori-proximal loci (hisA) reached high success probabilities (0.57%), mid-replichore loci (galK) were intermediate (~0.19%), and ter-proximal loci (leuD) dropped to 0.05%. A simple exponential model captured this oriC–ter gradient:
$$ P(d) \approx 0.6\%\,\cdot\, e^{-0.0007d}, $$
indicating each additional ~1000 kb from oriC reduces success probability by ~50–70%. These results demonstrate that positional constraints outweigh steric hindrance: two-to-two recombination at lacZ (0.0195%) did not outperform one-to-one at leuD (0.0386%).
To explicitly incorporate replication timing, Cycle 5 simulations compared representative loci across the E. coli chromosome. As shown in Fig. M3-2E, ori-proximal genes (hisA) reached significantly higher success fractions than mid- and ter-proximal sites (galK, leuD), consistent with experimental observations. This establishes a genome-wide oriC–ter gradient as the dominant factor shaping integration efficiency.
Simulated integration success trajectories for representative chromosomal loci demonstrate a clear oriC–terC gradient. Ori-proximal loci (hisA) reach high plateaus (0.57%), mid-replichore locus (galK) reaches intermediate levels (~0.19%), and ter-proximal loci (leuD) remain very low (0.05%). The curves highlight that genomic position, mediated by replication timing and locus copy number, is the dominant determinant of success, outweighing steric hindrance effects.
Dry → Wet (predictions):
Wet → Dry (feedback):
[1] Alon, U. (2006). An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC.
[2] Cooper, S., & Helmstetter, C. E. (1968). Chromosome replication and the division cycle of Escherichia coli B/r. Journal of Molecular Biology, 31(3), 519–540. https://doi.org/10.1016/0022-2836(68)90425-7
[3] Briggs, G. E., & Haldane, J. B. S. (1925). A note on the kinetics of enzyme action. Biochemical Journal, 19(2), 338–339. https://doi.org/10.1042/bj0190338
[4] Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340–2361. https://doi.org/10.1021/j100540a008
[5] Sawitzke, J. A., Thomason, L. C., Costantino, N., Bubunenko, M., Datta, S., & Court, D. L. (2007). Recombineering: In vivo genetic engineering in E. coli, S. enterica, and beyond. Methods in Enzymology, 421, 171–199. https://doi.org/10.1016/S0076-6879(06)21012-3
[6] Mosberg, J. A., Lajoie, M. J., & Church, G. M. (2010). Improving oligonucleotide recombineering by removing roadblocks to annealing. Nucleic Acids Research, 38(20), 7462–7469. https://doi.org/10.1093/nar/gkq617
[7] Taniguchi, Y., Choi, P. J., Li, G. W., Chen, H., Babu, M., Hearn, J., Emili, A., & Xie, X. S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991), 533–538. https://doi.org/10.1126/science.1188308
[8] Cooper, S., & Helmstetter, C. E. (1968). Chromosome replication and the division cycle of Escherichia coli B/r. Journal of Molecular Biology, 31(3), 519–540. https://doi.org/10.1016/0022-2836(68)90425-7
[9] Mosberg, J. A., Gregg, C. J., Lajoie, M. J., Wang, H. H., & Church, G. M. (2012). Improving oligonucleotide-mediated recombineering by mismatch repair protein mutS. Nucleic Acids Research, 40(14), e111. https://doi.org/10.1093/nar/gks412
The M4 model operates within a hierarchical framework where upstream models (M1–M3) provide boundary conditions: M1–M2 establish the temporal profile of Bxb1 concentration following induction, capturing the rise, plateau, and potential decay phases [1][2]; M3 quantifies chromosomal attB site availability and the position-dependent efficiency gradient along the oriC–terC axis [3][4]; M4 constitutes the mechanistic core, converting these upstream conditions into dynamic trajectories of success versus failure pathways through explicit treatment of binding, synapsis, and cleavage steps [5][6].
This modular architecture enables independent validation and parametric refinement of each layer,maintaining model interpretability and experimental tractability [7][8].The design supports two-level population mapping: from conditional success rates within the attB-bearing sub-population to whole-population outcomes.Specifically, we define:PTX = nattB / ntotal (transformation efficiency, e.g., 1700 / 100,000 = 0.017)Example calculation: 1.147 % × PTX = 0.017 = 0.0195 % overall, confirming consistency between conditional and population-level rates.We propagate binomial SE for PTX in confidence intervals of whole-population rates;central value 0.017 with propagated uncertainty (see Methods).We calibrate parameters using the simpler one-to-one insertion system before extending to two-to-two replacement,avoiding identifiability problems inherent in direct fitting of complex mechanisms [7][8].In our GenOMe system, one-to-one means that one pair of attB and attP sites recombines once through Bxb1 integrase to form one integration event.
Two-to-two, on the other hand, means two pairs of attB and attP sites are available on the genome and the DNA fragment, allowing two recombination events to happen at the same time — one at each site.
The one-to-one insertion mechanism proceeds through sequential steps: high-affinity recognition (kon / koff) → single-end homologous synapsis (kf1 / ksynr) → catalytic strand exchange (kcat) → product release with passive attL/attR binding [5][9]. This simpler topology establishes the quantitative relationship between sequence specificity (encoded in site-dependent kon / koff ratios) and overall conversion efficiency, providing a validated parameter window for the two-to-two model [10].
Critical correction on product binding: attL and attR sites do not catalytically inhibit the forward reaction. Instead, they function as non-reactive binding sinks that transiently sequester Bxb1 through reversible association, reducing the free enzyme fraction available for productive attP–attB recognition. This passive sequestration becomes significant only at high product accumulation and does not drive reverse conversion in the absence of recombination directionality factor (RDF) [11][12].
$$ \begin{aligned} \text{attP} + \text{Bxb1} &\xrightleftharpoons[k_{-1}]{k_1} C_P \\ \text{attB} + \text{Bxb1} &\xrightleftharpoons[k_{-2}]{k_2} C_B \\ C_P + C_B &\xrightleftharpoons[k_{-s}]{k_s} \text{synapsis} \xrightarrow{k_{cat}} C_L + C_B \\ C_L &\xrightleftharpoons[k_{-3}]{k_3} \text{attL} + \text{Bxb1} \\ C_R &\xrightleftharpoons[k_{-4}]{k_4} \text{attR} + \text{Bxb1} \end{aligned} $$
The two-to-two replacement mechanism embodies “dual-end synchrony”: two spatially separated attP/attB pairs first form independent single-end synaptic complexes (Sleft and Sright) with Bxb1 dimers; these must then encounter each other through chromosomal looping to achieve bimolecular ring-closure (rate constant kr2), assembling a tetrameric synaptic complex (D); finally, concerted cleavage and strand exchange complete the replacement [6][13].
Unlike one-to-one insertion—which opens a chromosomal gap and introduces new DNA—two-to-two performs direct segment replacement between pre-installed attB sites without altering total genome length. Critically, the newly inserted segment can carry orthogonal att sites at its termini, enabling unlimited replacement cycles without cumulative length increase. This architecture enables unlimited “replace-without-growth” cycles and shifts selection from additive stacking to iterative updating.
$$ \begin{aligned} \frac{d[\text{attP}]}{dt} &= -k_{\text{on}}[\text{attP}][\text{Bxb1}] + k_{\text{off}}[C_p] \\ \frac{d[\text{attB}]}{dt} &= -k_{\text{on}}[\text{attB}][\text{Bxb1}] + k_{\text{off}}[C_b] \\ \frac{d[\text{Bxb1}]}{dt} &= -k_{\text{on}}[\text{attP}][\text{Bxb1}] - k_{\text{on}}[\text{attB}][\text{Bxb1}] + k_{\text{off}}[C_p] + k_{\text{off}}[C_b] + 2k_{\text{cat}}[C_{\text{syn}}] \\ \frac{d[C_p]}{dt} &= k_{\text{on}}[\text{attP}][\text{Bxb1}] - k_{\text{off}}[C_p] - k_{\text{syn}}[C_p][C_b] + k_{-\text{syn}}[C_{\text{syn}}] \\ \frac{d[C_b]}{dt} &= k_{\text{on}}[\text{attB}][\text{Bxb1}] - k_{\text{off}}[C_b] - k_{\text{syn}}[C_p][C_b] + k_{-\text{syn}}[C_{\text{syn}}] \\ \frac{d[C_{\text{syn}}]}{dt} &= k_{\text{syn}}[C_p][C_b] - k_{-\text{syn}}[C_{\text{syn}}] - k_{\text{cat}}[C_{\text{syn}}] \\ \frac{d[\text{attL}]}{dt} &= k_{\text{cat}}[C_{\text{syn}}] - k_{\text{on}}[\text{attL}][\text{Bxb1}] + k_{\text{off}}[C_l] - k_{\text{on}}[\text{attL}][\text{Bxb1}] + k_{\text{off}}[C_r] \\ \frac{d[C_l]}{dt} &= k_{\text{on}}[\text{attL}][\text{Bxb1}] - k_{\text{off}}[C_l] \\ \frac{d[C_r]}{dt} &= k_{\text{on}}[\text{attL}][\text{Bxb1}] - k_{\text{off}}[C_r] \end{aligned} $$
| Parameter | Definition | Magnitude | Mechanistic Basis |
|---|---|---|---|
| kf1 | Single-end synapsis rate | kf1 ≫ kf2 | Rapid, reversible encounter between pre-bound sites |
| kf2 | Dual-end ring-closure rate | Rate-limiting | Geometrically and topologically constrained looping |
Notation (first-use definitions). kf1 (single-end synapsis), ksynr−1 (its reverse), kf2 (dual-end ring-closure), krel (single-end disassembly), kcat (cleavage). Sleft/right: single-end synaptic intermediates; D: dual-end tetrameric complex. We normalize [S] ∈ [0,1] as the fraction of cells (or occupancy) carrying a single-end synaptic intermediate. fssAP (0.05–0.10): fraction of chromosomal attB sites activated by RecT/CspRecT-mediated single-strand annealing. asite (0.28–0.40): position-dependent reduction in association rate due to nucleoid structure. gloop (0.30–0.40): reduction in ring-closure rate due to DNA topology and confinement.
Both published data and our one-to-one analyses demonstrate a robust positional effect: integration efficiency decreases monotonically with distance from oriC toward terC [3][14]. Quantitative examples include high-efficiency loci (hisA, metA, galK; all oriC-proximal) versus low-efficiency regions (leuD, lacZ; ter-proximal) [15][16].
Our experimental two-to-two design utilized the lacZ region primarily for blue-white screening convenience, accepting the trade-off of operating in a low-efficiency zone where lacZ and leuD nearly coincide.
The 30–60 min / 0.3–1.0 μM window corresponds to the time-over-threshold (ToT) concept originating from M1–M2 models and is localized here as the standard operational (SOP) range for Mode B optimization.
This locus choice explains why the efficiency differential between our two-to-two and one-to-one implementations was smaller than initial predictions: both operated under comparable geometric constraints. Relocating one or both att sites toward oriC would substantially increase overall success probability by enhancing both local DNA accessibility and effective encounter rates [4].
$$ \frac{d[C_P]}{dt} = k_{on}[\text{attP}][\text{Bxb1}] - k_{off}[C_P] $$
$$ \frac{d[C_B]}{dt} = k_{on}[\text{attB}][\text{Bxb1}] - k_{off}[C_B] $$
These expressions inherit sequence-dependent affinity from heat-map data [10]. We map in vitro specificity ratios to in-cell \(k_{on}/k_{off}\) via proportional scaling while preserving \(K_d\); absolute values are re-fit in Layer 2. Paired adjustments maintain thermodynamically reasonable \(K_d\) values (typically 10–50 nM). High-affinity recognition ensures rapid, reversible pre-synaptic complex formation, establishing the substrate pool for downstream synapsis and cleavage. Sequence-dependent differences manifest primarily at this recognition step, avoiding mis-attribution to catalytic phases.
$$ \frac{d[S]}{dt} = k_{f1}[C_P][C_B] - k_{synr}[S] - k_{cat}[S] $$
Synaptic complex stability governs the waiting-time distribution before potential dual-end encounters in the two-to-two system. Large \(k{synr}\) values induce rapid cycling between pre-synaptic and synaptic states, reducing the steady-state synaptic intermediate pool. This step exhibits strong concentration dependence and sensitivity to the Bxb1:DNA stoichiometric ratio.
In the absence of RDF, we do not include reverse catalysis. Instead, attL/attR contribute to enzyme sequestration:
$$ \frac{d[C_L]}{dt} = k_{on}^{attL}[\text{attL}][\text{Bxb1}] - k_{off}^{attL}[C_L] $$
$$ \frac{d[C_R]}{dt} = k_{on}^{attR}[\text{attR}][\text{Bxb1}] - k_{off}^{attR}[C_R] $$
These enter the Bxb1 mass balance but leave product concentrations unchanged; we do not subtract binding terms from d[attL]/dt or d[attR]/dt. Products are not consumed by reversible binding; the effect is reduced free enzyme, not catalytic inhibition [11][12].
These terms subtract only from the free [Bxb1] pool and do not enter d[attL]/dt or d[attR]/dt, thereby preserving mass balance without introducing reverse catalysis.
Two single-end intermediates \( S_{left} \) and \( S_{right} \) must meet to form \( D \).
Defining \( J_{f2} = k_{f2} [S_{left}] [S_{right}] \):
$$ \frac{d[S_{left}]}{dt} = \dots - J_{f2} - k_{rel}[S_{left}] $$ $$ \frac{d[S_{right}]}{dt} = \dots - J_{f2} - k_{rel}[S_{right}] $$ $$ \frac{d[D]}{dt} = + J_{f2} - k_{cat}[D] $$
Under \( [S_{left}] = [S_{right}] = [S] \), dual-end encounter is second-order in [S], yielding:
$$ P_{ring} = \frac{k_{f2}[S]^2}{k_{f2}[S]^2 + k_{rel}} $$
Here [S] is a normalized single-end pool (fraction of cells or dimensionless occupancy), so \( k_{f2}[S]^2 \) and \( k_{rel} \) are effective first-order terms and \( P_{ring} \) is dimensionless. This square dependence explains the sharp, sigmoidal gain when ends are closer; \( k_{rel} \) captures the decay of single-end intermediates outside the synchrony window.
gp47 (RDF) was not expressed in any condition; models therefore set krev = 0 and treat the forward pathway as effectively irreversible after dual-end closure. Any effects of attL/attR are modeled only as passive, reversible enzyme-sequestration terms in the Bxb1 mass balance (no catalytic inhibition) [11][12]. As RDF is excluded by design, we do not analyze reversibility further in this work.
We avoid double counting geometry by applying it only to the steps it governs:
$$ I_{\text{eff}} = I_{\text{tot}} \times f_{\text{ssAP}} $$ $$ k_{\text{on,eff}} = a_{\text{site}} \cdot k_{\text{on}} \quad \text{(association only)} $$ $$ k_{f2,\text{eff}} = g_{\text{loop}} \cdot k_{f2} \quad \text{(ring-closure only)} $$
We map chromosomal position and locus accessibility to \( a_{\text{site}} \) (association only), and polymer looping/topology (length, orientation, torsion) to \( g_{\text{loop}} \) (ring-closure only); no geometry factors are applied elsewhere [3][4][17][18].
| Factor | Physical Meaning | Range | Data Source |
|---|---|---|---|
| fssAP | ssAP-mediated attB activation efficiency | 0.05–0.10 | M2/M3 output |
| asite | Locus accessibility (position-dependent) | 0.28–0.40 | [3,4] |
| gloop | Looping geometry factor | 0.30–0.40 | [17,18] |
| Ieff | Effective Bxb1 concentration | 1–2 μM | Itot × fssAP |
Ranges reflect our fit priors anchored to macrodomain/topology literature; see Table S2 for locus-wise posteriors. Reported Ieff (1–2 μM) reflects M2/M3-inferred fssAP under our induction regime and should be re-calibrated if growth rate, copy number, or crowding differ.
Length penalties apply exclusively to ring-closure and disassembly; recognition parameters are length-invariant:
$$ k_{f2}(L) = k_{f2}^{\circ} \cdot \exp[-\alpha (L - L_0)] $$ $$ k_{rel}(L) = k_{rel}^{\circ} \cdot \exp[+\beta (L - L_0)] $$
where \( L_0 = 1039 \, \text{bp} \). Fitting the 1039/1605/1613-bp triad yields (α, β) with 95% CIs (see Table S3). Longer segments reduce end-to-end encounters (\( k_{f2}\downarrow \)) and increase strain-induced disassembly (\( k_{rel}\uparrow \)) [17][18][19][20][21]; shorter segments elevate \( P_{ring} \) and accelerate completion.
| Parameter Class | Data Source | Calibration Order |
|---|---|---|
| kon, koff (sequence-specific) | Heat-map ratios [10] | L1: Fixed |
| kf1, ksynr, kcat | One-to-one kinetics | L2: Fit |
| kr2, krel, α, β | Two-to-two 240-min conditional success | L3: Fit |
| asite, gloop, fssAP | Chromosomal context | L4: Inherit |
| PTX | Transformation efficiency | Fixed: 0.017 |
This hierarchy prevents over-parameterization [7][8]. Minimal observation sets per layer:
A compact set of calibrated parameters is summarized in Table 1. Full parameter tables, initial conditions, and locus-wise geometry estimates are provided in Tables S1–S3 .
Table 1. Core Kinetic Parameters
| Symbol | Definition | Value | Unit | Source |
|---|---|---|---|---|
| Recognition & Binding | ||||
| kon | attP–Bxb1 → CP | 4.3 × 105 | M−1s−1 | Literature |
| koff | CP dissociation | 6.5 | s−1 | Literature |
| Kd | Dissociation constant | 15 | nM | koff/kon |
| Synapsis & Cleavage | ||||
| kf1 | Single-end synapsis | 1.7 × 10−3 | s−1 | One-to-one fit |
| ksynr | Synaptic disassembly | 3.3 × 10−5 | s−1 | One-to-one fit |
| kcat | Strand exchange | 8.3 × 10−4 | s−1 | Literature |
| Two-to-Two Specific | ||||
| kf2 | Ring-closure (L = L0) | 1.7 × 10−5 | s−1 | 240-min fit |
| krel | Single-end failure (L = L0) | 5.9 × 10−4 | s−1 | 240-min fit |
| α | Length penalty (kf2) | 1.5 × 10−3 | bp−1 | Length series (see Table S3) |
| β | Length penalty (krel) | 1.0 × 10−3 | bp−1 | Length series (see Table S3) |
| L0 | Reference length | 1039 | bp | Shortest construct |
| Effective Parameters | ||||
| fssAP | ssAP activation | 0.05–0.10 | — | M3 output |
| asite | Locus accessibility | 0.28–0.40 | — | Position-dependent (see Table S2) |
| gloop | Looping geometry | 0.30–0.40 | — | DNA topology |
| Ieff | Effective Bxb1 | 1–2 | μM | Itot × fssAP |
| PTX | Transformation efficiency | 0.017 | — | 1700/100,000 |
Pre-synaptic complex formation reaches quasi-equilibrium rapidly (\( \tau \approx 5 \, \text{min} \)); product binding becomes relevant only in late phases (>60 min) when attL/attR accumulation exceeds ~30% conversion. This baseline establishes that if two-to-two underperforms, the bottleneck lies in \( k_{f2} \) versus \( k_{rel} \) competition rather than deficient recognition.
Bxb1 dynamics: The total translated Bxb1 (Itot) rises during 0–240 min and approaches the order of 10 μM in our simulations (model outputs informed by M2/M3). The effective reactive fraction (Ieff = Itot × fssAP × ffree) remains around 1–2 μM and serves as the input concentration for downstream kinetic models. This scale is consistent with the 0.3–1.0 μM operational window defined in M5 and used for Mode B optimization.
Dual-end competition: The success pathway flux (Jf2 = kf2 [Sleft][Sright]) directly competes with failure (krel [Si]), determining whether dual-end ring-closure completes within the synchrony window before single-end intermediates commit to failure pathways.
Population mapping: Calibrated to 1.147% conditional success at 240 min within \( n_{attB} = 1700 \) cells; multiplication by \( P_{TX} = 0.017 \) yields 0.0195% whole-population rate.
Under constant Bxb1 (~4.36 μM), success is governed by \( k_{f2}/k_{rel} \) balance and fragment length. The observed ordering (1039 bp: 87.5%, 1605 bp: 80%, 1613 bp: 79%) emerges purely from length-dependent \( k_{f2}/k_{rel} \) modulation.
Bars represent mean success percentage across \( n = 3 \) biological replicates (error bars: ±1 SD). Fragment lengths: 1039 bp (reference), 1605 bp, 1613 bp. Solid line: model prediction using \( k_{f2}(L) = k_{f2}^{\circ} \exp[-\alpha (L - L_0)] \) with \( \alpha = 1.5 \times 10^{-3} \, \text{bp}^{-1} \) fitted to data. Dashed lines: 95% prediction interval. Bootstrap resamples were drawn at the replicate level with a fixed RNG seed for reproducibility; point estimates are medians, and bands denote 95% CIs.
Without RDF, the forward pathway after dual-end closure is effectively irreversible under our conditions; we did not observe reverse flux, and products enter only via passive enzyme sequestration (no catalytic inhibition). This explains why increasing \( k_{f2} \) (or effective [S]) dramatically shifts overall success. Conversely, “waiting longer” provides diminishing returns once the synchrony window closes: single-end-committed failures become kinetically locked.
DNA looping probability scales approximately as \( P_{loop} \propto L^{-3/2} \) for \( L > \) persistence length (~50 nm ≈ 150 bp) [19][20][21]. For our 1.0–1.6 kb range, this predicts substantial reduction in encounter frequency, consistent with observed efficiency drops. Shorter segments elevate \( P_{ring} \) faster than they increase torsional strain because bending energy scales as \( \kappa \cdot \theta^2 / L \), while encounter frequency scales as \( L^{-3/2} \).
Ori-proximal loci exhibit higher supercoiling density, greater replication-associated accessibility, and reduced NAP crowding [3][4]. These effects compound into \( a_{site} \) gradients ranging from ~0.40 (hisA, 0.2 Mb from oriC) to ~0.28 (lacZ, 1.5 Mb from oriC), directly modulating \( k_{on,eff} \) (see Table S2).
Operationally, we require \( K_{d,cross}/K_{d,self} > 100 \) and observe <1% cross-recombination in plasmid challenge assays; see Methods. This validates clean heat-map → \( k_{on} \) translation and enables stable multi-round performance by preventing inter-round cross-talk. We require (i) <1% cross-recombination in plasmid challenge assays and (ii) no off-target junctions by junction-specific PCR across two rounds.
Dual-end recombination assembles two Bxb1 dimers (one per att pair) into a tetrameric synaptic complex. When sites are closely spaced (≈1–1.3 kb) and co-oriented, DNA bending strain remains small, elevating effective \( k_{f2} \). Beyond a critical spacing, DNA must loop back under torsional constraint, increasing \( k_{rel} \) probability.
This geometric framework explains the “shorter is better” principle: confinement that might hinder one-to-one insertions (steric crowding) actually facilitates two-to-two replacements by constraining the search volume for dual-end encounters. In molecular visualization, the tetramer functions as a rigid hinge—once both ends dock, cleavage proceeds with near-irreversible commitment.
To experimentally validate M4 predictions:
While M4 captures essential dual-end recombination kinetics, several biological processes remain simplified:
Quantitative predictions are most trustworthy for 1–2 kb segments in mid-log E. coli (\( \text{OD}_{600} = 0.4\text{–}0.6 \)) where repair activity and topological state are stable.
[1] Bonnet, J., Subsoontorn, P., & Endy, D. (2012). Rewritable digital data storage in live cells via engineered control of recombination directionality. Proceedings of the National Academy of Sciences, 109(23), 8884–8889. https://doi.org/10.1073/pnas.1202344109
[2] Inniss, M. C., & Silver, P. A. (2013). Building synthetic memory. Current Biology, 23(17), R812–R816. https://doi.org/10.1016/j.cub.2013.07.056
[3] Bryant, J. A., Sellars, L. E., Busby, S. J., & Lee, D. J. (2014). Chromosome position effects on gene expression in Escherichia coli K-12. Nucleic Acids Research, 42(18), 11383–11392. https://doi.org/10.1093/nar/gku828
[4] Valens, M., Penaud, S., Rossignol, M., Cornet, F., & Boccard, F. (2004). Macrodomain organization of the Escherichia coli chromosome. EMBO Journal, 23(21), 4330–4341. https://doi.org/10.1038/sj.emboj.7600434
[5] Rutherford, K., & Van Duyne, G. D. (2014). The ins and outs of serine integrase site-specific recombination. Current Opinion in Structural Biology, 24, 125–131. https://doi.org/10.1016/j.sbi.2013.12.003
[6] Olorunniji, F. J., Rosser, S. J., & Stark, W. M. (2016). Site-specific recombinases: molecular machines for the genetic revolution. Biochemical Journal, 473(6), 673–684. https://doi.org/10.1042/BJ20151112
[7] Raue, A., Kreutz, C., Maiwald, T., Bachmann, J., Schilling, M., Klingmüller, U., & Timmer, J. (2009). Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics, 25(15), 1923–1929. https://doi.org/10.1093/bioinformatics/btp358
[8] Gutenkunst, R. N., Waterfall, J. J., Casey, F. P., Brown, K. S., Myers, C. R., & Sethna, J. P. (2007). Universally sloppy parameter sensitivities in systems biology models. PLoS Computational Biology, 3(10), e189. https://doi.org/10.1371/journal.pcbi.0030189
[9] Smith, M. C. M., & Thorpe, H. M. (2002). Diversity in the serine recombinases. Molecular Microbiology, 44(2), 299–307. https://doi.org/10.1046/j.1365-2958.2002.02891.x
[10] Saunders, S. H., & Ahmed, A. M. (2024). Mapping orthogonality and recombination efficiency across serine integrase site variants. Nucleic Acids Research, 52(8), 4234–4249. https://doi.org/10.1093/nar/gkae227
[11] Ghosh, P., Wasil, L. R., & Hatfull, G. F. (2006). Control of phage Bxb1 excision by a novel recombination directionality factor. PLoS Biology, 4(6), e186. https://doi.org/10.1371/journal.pbio.0040186
[12] Savinov, A., Pan, J., & Ghosh, P. (2012). Mechanisms of the phage Bxb1 integrase directionality factor gp47. Virology, 429(2), 129–135. https://doi.org/10.1016/j.virol.2012.04.011
[13] Ghosh, P., Kim, A. I., & Hatfull, G. F. (2003). The orientation of mycobacteriophage Bxb1 integration is dependent on the central dinucleotide of its attachment sites. Molecular Cell, 12(5), 1101–1111. https://doi.org/10.1016/S1097-2765(03)00429-3
[14] Block, D. H., Hussein, R., Liang, L. W., & Lim, H. N. (2012). Regulatory consequences of gene translocation in bacteria. Nucleic Acids Research, 40(18), 8979–8992. https://doi.org/10.1093/nar/gks694
[15] Haldimann, A., & Wanner, B. L. (2001). Conditional-replication, integration, excision, and retrieval plasmid-host systems for gene structure-function studies of bacteria. Journal of Bacteriology, 183(21), 6384–6393. https://doi.org/10.1128/JB.183.21.6384-6393.2001
[16] Datsenko, K. A., & Wanner, B. L. (2000). One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proceedings of the National Academy of Sciences, 97(12), 6640–6645. https://doi.org/10.1073/pnas.120163297
[17] Rippe, K., Guthold, M., von Hippel, P. H., & Bustamante, C. (1995). Transcriptional activation via DNA looping: visualization of intermediate states in E. coli RNA polymerase–λPR promoter complexes by scanning force microscopy. Trends in Biochemical Sciences, 20(12), 500–506. https://doi.org/10.1016/S0968-0004(00)89099-4
[18] Schleif, R. (1992). DNA looping. Annual Review of Biochemistry, 61, 199–223. https://doi.org/10.1146/annurev.bi.61.070192.001215
[19] Yan, J., & Marko, J. F. (2004). Localized single-strand breaks strongly affect global chromatin folding in the Escherichia coli chromosome. Physical Review Letters, 93(10), 108108. https://doi.org/10.1103/PhysRevLett.93.108108
[20] Cloutier, T. E., & Widom, J. (2004). Spontaneous sharp bending of double-stranded DNA. Molecular Cell, 14(3), 355–362. https://doi.org/10.1016/S1097-2765(04)00249-7
[21] Shimada, J., & Yamakawa, H. (1984). Ring-closure probabilities for twisted wormlike chains. Macromolecules, 17(4), 689–698. https://doi.org/10.1021/ma00134a017
[22] Raj, A., & van Oudenaarden, A. (2008). Nature, nurture, or chance: stochastic gene expression and its consequences. Cell, 135(2), 216–226. https://doi.org/10.1016/j.cell.2008.09.050
[23] Elowitz, M. B., Levine, A. J., Siggia, E. D., & Swain, P. S. (2002). Stochastic gene expression in a single cell. Science, 297(5584), 1183–1186. https://doi.org/10.1126/science.1070919
| Symbol | Definition | Value | Unit | Source |
|---|---|---|---|---|
| Recognition & Binding | ||||
| kon | attP–Bxb1 → CP | 4.3 × 105 | M−1s−1 | Literature |
| koff | CP dissociation | 6.5 | s−1 | Literature |
| Kd | Dissociation constant | 15 | nM | koff/kon |
| Synapsis & Cleavage | ||||
| kf1 | Single-end synapsis | 1.7 × 10−3 | s−1 | One-to-one fit |
| ksynr | Synaptic disassembly | 3.3 × 10−5 | s−1 | One-to-one fit |
| kcat | Strand exchange | 8.3 × 10−4 | s−1 | Literature |
| Two-to-Two Specific | ||||
| kf2 | Ring-closure (L = L0) | 1.7 × 10−5 | s−1 | 240-min fit |
| krel | Single-end failure (L = L0) | 5.9 × 10−4 | s−1 | 240-min fit |
| α | Length penalty (kf2) | 1.5 × 10−3 | bp−1 | Length series |
| β | Length penalty (krel) | 1.0 × 10−3 | bp−1 | Length series |
| L0 | Reference length | 1039 | bp | Shortest construct |
| Effective Parameters | ||||
| fssAP | ssAP activation | 0.05–0.10 | — | M3 output |
| asite | Locus accessibility | 0.28–0.40 | — | Position dependent |
| gloop | Looping geometry | 0.30–0.40 | — | DNA topology |
| Ieff | Effective Bxb1 | 1–2 | μM | Itot × fssAP |
| PTX | Transformation efficiency | 0.017 | — | 1700/100,000 |
Length-dependence update: The early fit yielded α = 1.5 × 10⁻³ and β = 1.0 × 10⁻³ bp⁻¹. Subsequent tri-point regression across 1039/1605/1613 bp constructs refined these to αmech = βmech = 8.5 × 10⁻⁴ bp⁻¹ (SLOPE = 1.70 × 10⁻³ bp⁻¹), ensuring consistency with the logit-linear fit reported in M5 (Fig. 1).
| Locus | Distance from oriC (Mb) | asite (mean ± CI) | Relative Efficiency (normalized to hisA) |
Reference |
|---|---|---|---|---|
| hisA | 0.20 | 0.40 ± 0.02 | 1.00 | [3],[14] |
| metA | 0.35 | 0.37 ± 0.02 | 0.77 | [3],[14] |
| galK | 1.45 | 0.33 ± 0.02 | 0.32 | [3],[14] |
| leuD | 1.55 | 0.29 ± 0.02 | 0.08 | [3],[14] |
| Length (bp) | Observed Success (%) | Fitted Success (%) | Residual | Predicted kf2(L) (s−1) | Predicted krel(L) (s−1) |
|---|---|---|---|---|---|
| 1039 | 87.5 | 87.6 | +0.1 | 1.70×10−5 | 5.90×10−4 |
| 1605 | 80.0 | 79.3 | −0.7 | 9.8×10−6 | 6.5×10−4 |
| 1613 | 79.0 | 79.1 | +0.1 | 9.7×10−6 | 6.6×10−4 |
Fit statistics:
α̂ = 1.5 × 10⁻³ bp⁻¹ (95% CI: 1.2–1.9 × 10⁻³)
β̂ = 1.0 × 10⁻³ bp⁻¹ (95% CI: 0.8–1.3 × 10⁻³)
RMSE = 0.53%; R² = 0.991; Cross-validation residual < ±1.2%.
After establishing mechanistic insight across M1–M4, we developed M5 as a compact, interpretable, and experimentally actionable predictive model.
While early layers involved >40 kinetic and structural parameters, sensitivity analysis showed that recombination outcomes are consistently dominated by only a small subset of biological determinants.
This factor-based reduction converts complex ODE/stochastic simulations into a form that experimental users can apply directly.
M5 integrates three biological layers into a single predictive output:
Determines whether Bxb1 remains above its functional activation threshold long enough to support recombination (captured by φtot).
Encodes genomic-position effects, GC-dependent oligo stability, dual-end synapsis geometry, spacer dependence, and length-penalty behavior.
Maps per-cell recombination probability into expected colony counts, accounting for uptake heterogeneity.
Across these scales, M1–M4 consistently highlighted five dominant factors, which M5 isolates into a single multiplicative framework.
For any given experimental condition, M5 predicts final integration success as:
$$P_{\mathrm{final}}=\alpha, f_{\mathrm{pos}}(d_{\mathrm{ori}}), f_{\mathrm{dist}}(L,s), f_{\mathrm{orient}}(\theta), f_{\mathrm{GC}}(g), f_{\mathrm{prot}}(\phi_{\mathrm{tot}})$$
Where:
Reflects the oriC–ter replication gradient discovered in M3.
ori-proximal loci exhibit sharply higher attB formation.
Encodes spacer-length optima and length-dependent penalties revealed in M4.
Penalizes unfavorable attB/attP polarity combinations that reduce synapsis probability.
Models reduced oligo survival and pairing at extreme GC%.
Derived from φtot, reflecting induction time and Bxb1 expression sufficiency.
Interpretation:
M5 is not an empirical curve fit—it is a mechanistic factorization of the causal dependencies discovered in M1–M4.
Dual-end recombination requires both attB–attP pairs to synapse successfully.
Mechanistic analysis (M4) shows that:
Thus, Mode B predictions use:
$$P_{\mathrm{2to2}} \approx \min(P_1, P_2)$$
This approximation preserves biological realism while remaining computationally simple.
Dominated by:
Typical success rate: 0.001–0.05%.
Dominated by:
Typical success rate: 50–90%.
M5 automatically switches between these regimes within the Navigator interface.
(Integrated from Text 1 — essential for credibility)
To test whether the factorized M5 model captures real genomic behavior, we compared predictions to experimental measurements across five loci:
Together, these results validate that the five-factor structure of M5 captures the dominant determinants of recombination efficiency across the chromosome.
(Integrated from Text 1 — gives actionable value)
Situation | Recommended Adjustment |
|---|---|
Locus near ter | Extend ssAP window or select a different ori-proximal locus |
Two-to-two distance too long | Shorten cassette or reduce attB–attP spacing |
Orientation suboptimal | Reverse one att site; redesign primers accordingly |
GC too low/high | Adjust oligo to 40–60% GC |
Poor ssAP–fork overlap | Shift induction to 20–30 min post-electroporation |
These rules form the decision backbone of GenOMe Navigator, enabling users to diagnose limiting factors and optimize constructs before performing experiments.
M5 supports an intuitive three-step workflow:
This connects the mechanistic model to real experimental planning.
Because all mechanistic terms are factorized, porting M5 to new strains or integrases requires only:
Geometry and GC factors remain universal.
M5 converts the mechanistic insights from M1–M4 into a compact, interpretable, and experimentally actionable model. It preserves biological causality while enabling real-time prediction and design optimization through GenOMe Navigator.
Below, we summarize the key quantitative trends emerging from M1–M4 that justified the final five-factor structure of M5. These heatmaps and sweeps are not supplementary decoration—they represent the mechanistic evidence that allowed us to simplify dozens of kinetic parameters into a compact, interpretable predictive model.
Each trend corresponds to one biological layer and directly supports one or more terms in M5.
This analysis revealed a broad plateau where φtot becomes insensitive to both induction time and enzyme concentration. Specifically:
This explains why:
Supports: Protein sufficiency factor (fprot).
Single-cell stochastic simulations showed two strong effects:
These two effects were highly consistent across parameter perturbations, providing the mechanistic basis for incorporating genomic position and GC stability as explicit factors in M5.
Supports: fpos (locus effect), fGC (GC penalty).
M4 simulations revealed a geometric optimum for dual-end capture:
This trend provides the structural justification for:
Supports: fdist (length + spacer geometry).
A striking observation across M4 is that:
This validated the use of a logit-linear length penalty inside M5 and supported the engineering claim that insert length is the primary determinant for Mode B efficiency.
Supports: fdist (dominant length effect).
M3–M4 propagation into M10k-scale population simulations revealed:
This justified the inclusion of the population-level scaling factor α, which differentiates the two integration modes.
Supports: α (population correction for Mode A/B).
While not directly tunable during experiments, sensitivity analysis showed:
This validated that the geometry/length effects in M5 do not require ad-hoc curve fitting, but arise from fundamental kinetic asymmetry.
Supports: Mechanistic plausibility of fdist and orientation factor.
This mapping connects:
This trend allowed M5 to integrate with the software’s AI suggestion system (e.g., advising additional plates if uptake is poor).
Supports: Population interpretation of Pfinal.
(Cite as: “All equations and parameter values correspond to the M5 model internal specification.”)
This section documents all mathematical formulations, parameter values, and computational rules used to generate the M5 heatmaps (Figures 1–8).
These details are not required in the main Model page but are provided here for transparency, reproducibility, and cross-strain recalibration.
Used in Fig. 1, Fig. 2, Fig. 3, Fig. 7.
$$\operatorname{logit}(P_{\text{base}})= a - bL$$ $$P_{\text{base}}=\frac{1}{1+\exp[-(a-bL)]}$$
Parameters (calibrated from experimental anchors):
Symbol | Value | Notes |
|---|---|---|
$a$ | 3.0161 | Defined by 1039 bp → 87.5% |
$b$ | $$ 1.03\times10^{-3},{\rm bp^{-1}} $$ | Joint fit from 3 anchors |
Used in Fig. 1–2.
$$f_{\text{conc}}(C)= \frac{C^n}{K^n + C^n} \bigg/ \frac{C_{\text{ref}}^n}{K^n + C_{\text{ref}}^n}$$
Parameters:
Symbol | Value | Notes |
|---|---|---|
$K$ | 0.20 μM (200 nM) | Local half-saturation |
$n$ | 6 | Steep plateau slope |
$C_{\text{ref}}$ | 0.387 μM (387 nM) | Standard operating concentration |
$$P_{\text{prod}} = \min(P_{\text{base}} \cdot f_{\text{conc}} , 0.95)$$
A soft cap at 0.95 prevents overconfidence in high-L extrapolation.
$$E[\text{positive colonies}] = N_{\rm CFU} \times \min(\alpha_{\rm pop} P_{\text{prod}},1)$$
Parameters:
Symbol | Value | Description |
|---|---|---|
$N_{\rm CFU}$ | 10–40 | Colonies plated |
$\alpha_{\rm pop}$ | 1.0 (default) | Transformation amplification factor |
Used in Fig. 4 & Fig. 5.
$$F_{\text{distance}}(d) = \exp(-k_d d)$$
Symbol | Value | Notes |
|---|---|---|
$k_d$ | $7\times10^{-4},{\rm kb^{-1}}$ | Distance-decay constant |
$d$ | 0–3000 kb | Shortest distance to oriC |
$$f_{\text{GC}}(gc)=\exp[-\gamma(gc - 0.5)^2$$
Symbol | Value |
|---|---|
$\gamma$ | 10 |
gc | 0.30–0.70 |
$$P_{\text{DNA}} \propto f_{\text{GC}}(gc)\cdot F_{\text{distance}}(d)$$
This determines the color scale in Fig. 4 (not an absolute probability).
$$\phi_{\text{tot}} = \phi_{\text{Bxb1}}(C)\cdot \phi_{\text{time}}(t)\cdot \phi_{\text{ssAP}}(t_{\text{ssAP}})$$
$$\phi_{\text{Bxb1}}(C)=\frac{C^{n_H}}{C_{50}^{n_H}+ C^{n_H}}$$
Symbol | Value |
|---|---|
$C_{50}$ | 1 μM |
$n_H$ | 1 |
$$\phi_{\text{time}}=\frac{t}{T_{50}+t}$$
Symbol | Value |
|---|---|
$T_{50}$ | 30 min |
$$\phi_{\text{ssAP}}=\frac{t_{\text{ssAP}}}{T_{\text{ssAP50}}+t_{\text{ssAP}}}$$
Symbol | Value |
|---|---|
$T_{\text{ssAP50}}$ | 20 min |
$t_{\text{ssAP}}$ | 45 min (fixed) |
Each cell receives DNA according to a log-normal distribution:
$$\sigma = \sqrt{\ln(\text{CV}^2 + 1)}, \qquad \mu = \ln(\text{mean}) - \frac{\sigma^2}{2}$$
Single-cell success:
$$P_{\text{cell}} = \min(P_{\text{calibrated}}, G^{\alpha\beta}, \text{uptake},; 0.8)$$
Monte Carlo: 10,000 simulated cells per grid point.
Parameters:
Symbol | Value | Notes |
|---|---|---|
mean | 20k–100k | Electroporation uptake |
CV | 0.1–1.0 | Transformation heterogeneity |
$G$ | 6.632173469 | Global scaling factor |
$$ f_{\text{spacer}}(S)=\exp\left[-\frac{(S-80)^2}{2(40)^2}\right] $$
Symbol | Value |
|---|---|
Optimal S₀ | 80 bp |
σ | 40 bp |
Final probability:
$$P_{\text{prod}} = P_{\text{base}}(L), f_{\text{conc}}(C), f_{\text{spacer}}(S)$$
Length-dependent rate modification:
$$k_{f2} \to k_{f2} e^{-\alpha_{\text{mech}}L}$$
$$k_{rel} \to k_{rel} e^{-\beta_{\text{mech}}L}$$
Parameters:
Symbol | Value |
|---|---|
αmech | $8.5\times10^{-4},{\rm bp^{-1}}$ |
βmech | $8.5\times10^{-4},{\rm bp^{-1}}$ |
Global G | 6.632173469 |
Underlying slope matches empirical:
$$\text{SLOPE} = 1.70\times10^{-3},{\rm bp^{-1}}$$
(You may include this as a downloadable CSV)
Layer | Parameter | Value | Purpose |
|---|---|---|---|
Mode B (Length) | a | 3.0161 | Intercept |
b | $(1.03\times 10^{-3})/bp$ | Length slope | |
Mode B (Conc.) | K | 0.20 μM | Half-saturation |
n | 6 | Hill coefficient | |
Cref | 0.387 μM | SOP condition | |
Protein Layer | C50 | 1 μM | Global activation |
nH | 1 | Hill | |
T50 | 30 min | Time half-max | |
TssAP50 | 20 min | ssAP kinetics | |
DNA Layer | kd | $7\times10^{-4}/\text{kb}$ | oriC-distance decay |
γ | 10 | GC penalty | |
Mechanistic | G | 6.63 | global scaling |
αmech | $8.5\times10^{-4}/bp$ | Length penalty | |
Population | αpop | 1.0 | plating amplification |
Spacer | S0 | 80 bp | optimal spacer |
σ | 40 bp | width |
Through the M1–M5 modeling framework, we established an interpretable, quantitative system that deconstructs Bxb1-mediated genome integration into three interconnected layers — protein, DNA, and population.
Both simulations and experimental data consistently revealed that integration success is not determined by a single kinetic parameter, but by the combined influence of replication timing, DNA geometry, and protein expression dynamics.
Integration efficiency exhibited a clear hierarchy across genomic loci: hisA and metA achieved the highest success rates, galK showed intermediate performance, while leuD and lacZ were markedly lower.
This gradient mirrors the loci’s relative distance from oriC, indicating that replication fork timing is the primary determinant of integration efficiency.
Regions near oriC are duplicated early and remain more accessible to recombineering enzymes, whereas loci near ter replicate later, limiting accessibility.
Notably, lacZ is physically positioned near hisA and galK, not in the terminal region. Its low success rate—comparable to leuD—stems from its two-to-two configuration, which requires recombination across two pairs of attB/attP sites, unlike the single pair used in one-to-one designs.
This contrast reveals that integration efficiency is shaped not only by replication timing but also by geometric coordination within the DNA architecture.
In a one-to-one configuration, recombination proceeds independently at a single attB/attP pair.
By contrast, the two-to-two configuration demands synchronized recombination at both ends, introducing challenges in molecular alignment and spatial coordination.
Our model predicts that if one end completes recombination prematurely, Bxb1 molecules may become transiently trapped at that site, leading to temporal asymmetry and stalling of the opposite reaction.
This explains the lower initial efficiency observed for lacZ, which reflects geometric instability rather than locus position.
While the two-to-two configuration introduces additional geometric constraints, it uniquely enables full cassette integration within a single step, a feat impossible for one-to-one architectures.
Our simulations revealed that the inefficiency of dual-end recombination is not intrinsic to the mechanism but results from unoptimized parameters such as inter-site spacing, enzyme concentration, and induction timing.
Once these were optimized in the M5 framework, the predicted success rate of two-to-two events converged with that of one-to-one integrations.
This finding redefines two-to-two recombination as a design-driven advantage rather than a structural limitation.
By quantifying geometric coordination, the model transforms a former bottleneck into a high-performance, scalable solution for production-level genome engineering — proving that dual-end integration can be both complete and efficient when designed systematically.
The first two-to-two process (installation of dual attB sites) and the second two-to-two process (dual-end cassette integration) represent distinct mechanistic stages.
The first is probability-limited, governed by ssAP expression level and replication-fork timing, resulting in a very low success rate (~0.00195%).
The second occurs only in cells already containing dual attB sites; it is kinetics-limited, dominated by Bxb1 binding and catalytic efficiency.
Under optimized conditions (0.3–1.0 μM Bxb1 and short inter-site spacing), integration proceeds rapidly, reaching success rates exceeding 80%.
This dramatic contrast marks a shift from probabilistic to kinetic limitation, clarifying why integration efficiency can increase by four orders of magnitude between the two stages.
Insert length strongly affects dual-end recombination success.
Both simulations and experiments show that fragments of 1–2 kb yield consistently high efficiencies, while inserts longer than ~3 kb result in a sharp decline.
This pattern reflects a trade-off between reaction kinetics and molecular proximity: longer fragments reduce the likelihood that both ends align simultaneously, thereby lowering the probability of complete integration.
Thus, insert length emerges as a highly tunable design parameter in optimizing two-to-two architectures.
The M1–M2 models predict optimal induction durations of ≈45 minutes for ssAP and ≈240 minutes for Bxb1. When residual transcription during the cooling phase was incorporated, the simulated protein accumulation curves matched experimental observations. These results show that fine-tuning expression timing determines the achievable “integration ceiling”: operating within the predicted windows significantly increases both attB formation and total integration success.
The M3 model simulates attB formation as a stochastic single-cell event and successfully reproduces the experimentally observed “polarized” outcomes, where most cells fail to integrate while a minority succeed. This heterogeneity arises from replication-fork gating: integration occurs primarily during the brief window when the replication fork passes through the target locus, rendering DNA temporarily accessible to ssAP-mediated recombination. Replication timing, therefore, is the underlying driver of both population heterogeneity and the oriC–ter efficiency gradient.
Together, these models define three interacting principles:
The M5 model, implemented as our open-source software GenOMe Navigator, integrates these three layers into a predictive design assistant. Users can input locus position, insert length, and induction schedule to estimate integration efficiency and receive quantitative SOP recommendations, transforming experimental planning from trial-and-error into rational, data-driven design.
While the M5 framework already reproduces experimental trends with strong fidelity,several refinements could further elevate its predictive precision:
These extensions represent next-generation upgrades rather than current limitations— laying the foundation for M5 to evolve into a fully predictive genome-design platform.
Integration of modeling and experimental data yields the following key insights:
Integration efficiency follows an oriC-to-ter gradient defined by replication timing.
Two-to-two geometry, once optimized, achieves high-efficiency dual-end recombination comparable to one-to-one events.
The first and second two-to-two rounds are governed by distinct kinetic bottlenecks — the former by attB formation probability, the latter by enzymatic and geometric factors.
A 1–2 kb length window represents the optimal trade-off between completeness and efficiency.
Protein induction timing defines the integration ceiling.
The M5 model (GenOMe Navigator) converts these multilayer rules into a reproducible, quantitative, and design-driven framework for genome engineering.
In essence, integration success emerges from the interplay of timing, geometry, and kinetics.
The locus provides opportunity, the geometry sets boundaries, and protein dynamics determine attainment.
Through calibrated modeling, these elements are unified into a single predictive platform — transforming genome integration from empirical trial-and-error into a rational, engineering discipline.