Engineering
Our engineering approach follows the Design-Build-Test-Learn cycle, organized into three main areas: Wet Lab (WL), Dry Lab (DL), and Human Practice (HP). Each cycle represents an iterative process of continuous improvement and innovation.
Wet Lab Cycle 1:
Mutation / Validation Environment Choice
We explored multiple strategies to balance biological relevance with measurement precision, ultimately transitioning from in vivo to in vitro mutation and validation systems. This cycle documents our journey from cellular complexity to controlled biochemistry.




Iteration 1 — Design Phase
We initially designed our mutation and validation process to occur entirely in vivo within E. coli, using bacterial growth as a proxy for selection efficiency under the EcORep orthogonal replication system [1].
We hypothesized that mutation and selection could occur simultaneously inside the cell, with differences in protein expression and functionality influencing mutant survival and accumulation.
Iteration 1 — Build Phase
Mutations were generated through orthogonal DNA replication using EcORep's linear O-replicon. To analyze whether our luminescence reporter could quantitatively reflect protein performance under this biological complexity, we constructed an in vivo ODE model [View Code in Google Colab].
This ODE described:
- Logistic cell growth \( \dfrac{dE}{dt} = rE\!\left(1-\dfrac{E}{K}\right) \)
- Expression and dilution of SpyCatcher and SpyTag proteins [2]
- Their binding into a luminescent complex \(AB\) with rate \(k_{\text{bind}}\)
- Luminescence activation \( \dfrac{dL}{dt} = k_{\text{act}}(AB) - k_{\text{decay}}\,L \)
By simulating how growth, expression, and binding co-evolve, we could test whether our cellular system would allow distinct k_on values to be distinguished experimentally.
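As a concrete illustration, a minimal sketch of such a model (not the Colab notebook itself; protein species are treated here as per-cell concentrations diluted by growth, and all parameter values are illustrative assumptions) could be written as:

```python
import numpy as np
from scipy.integrate import solve_ivp

def in_vivo_ode(t, y, r, K, alpha_C, alpha_T, k_bind, k_act, k_decay):
    """State: E (cell density), C (SpyCatcher), T (SpyTag), AB (complex), L (luminescence)."""
    E, C, T, AB, L = y
    dE = r * E * (1 - E / K)                 # logistic cell growth
    dilution = r * (1 - E / K)               # per-capita dilution of intracellular species
    dC = alpha_C - k_bind * C * T - dilution * C
    dT = alpha_T - k_bind * C * T - dilution * T
    dAB = k_bind * C * T - dilution * AB
    dL = k_act * AB - k_decay * L            # luminescence activation / decay
    return [dE, dC, dT, dAB, dL]

# Compare two binding regimes (placeholder values, in µM^-1 h^-1)
for k_bind in (20.0, 60.0):
    sol = solve_ivp(in_vivo_ode, (0, 24), [0.01, 0, 0, 0, 0],
                    args=(0.8, 1.0, 0.5, 0.5, k_bind, 1.0, 0.1))
    print(f"k_bind={k_bind}: final luminescence ~ {sol.y[4, -1]:.3f}")
```

Re-running the same integration while varying r, α_C, α_T, k_bind, and K is the kind of parameter scan described in the Test phase below.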
Iteration 1 — Test Phase
We ran parameter scans varying r (bacterial growth rate constant), α_C (synthesis rate of SpyCatcher), α_T (synthesis rate of SpyTag), k_bind (binding rate constant), and K (carrying capacity).
The results revealed that even when k_bind differed by an order of magnitude, the luminescence difference remained minor due to dilution and growth effects. This meant our in vivo system could not resolve kinetic differences between mutants.

Figure 1. In vivo ODE simulation of luciferase luminescence dynamics for two kinetic regimes (k = 60.00 vs 20.00 µM⁻¹·h⁻¹).
Iteration 1 — Learn Phase
Although the in vivo ODE confirmed stable mutation propagation, it also showed that the system was unidentifiable for kinetic parameters — luminescence reflected cell physiology more than binding kinetics.
Furthermore, it was impossible to evaluate the rate constant under different pH conditions inside cells.
Thus, the model directly informed our decision to decouple mutation from kinetic validation, motivating a move toward in vitro testing.
Iteration 2 — Design Phase
To bridge biological relevance with measurable precision, we designed a hybrid cycle: mutations still occurred in vivo, but validation was shifted in vitro.
Iteration 2 — Build Phase
During this stage, we continued using the EcORep orthogonal replication system to generate mutations inside E. coli.
However, sequence analysis revealed that the in vivo mutation process was too extensive — the entire O-replicon, including the luciferase reporter [3] and signal peptide regions [4], accumulated random mutations rather than remaining confined to the intended target gene.
We therefore implemented in silico structure prediction (RFdiffusion, AlphaFold2) to investigate whether such global mutations could compromise protein folding and complex formation.
Iteration 2 — Test Phase
Simulation results showed that these uncontrolled mutations frequently disrupted key structural motifs, leading to a collapse of the luciferase and signal peptide architecture.
This indicated that the in vivo mutagenic environment itself was unstable and that excessive replication errors could compromise the integrity of our reporter system.
Iteration 2 — Learn Phase
From this, we learned that performing mutation in vivo risks unwanted propagation across the entire O-replicon, damaging both expression and functional modules.
To ensure precise control, we decided to move mutation generation in vitro using error-prone PCR [5–9], where the mutation rate and target region could be strictly limited. This insight set the foundation for our fully in vitro epPCR-based workflow in the next cycle.
Iteration 3 — Design Phase
After recognizing the limits of cellular context, we transitioned to a fully in vitro workflow. Both mutation generation and validation were moved outside the cell for full control. Mutants were now created via error-prone PCR, ensuring adjustable mutation rates under defined Mn²⁺ and dNTP conditions.
Iteration 3 — Build Phase
To design the validation workflow, we constructed an in vitro kinetics ODE [View Code in Google Colab] describing the reaction:
A (SpyTag) + B (SpyCatcher) → AB
as an irreversible second-order process:
\( \dfrac{d[AB]}{dt} = k_{\text{bind}}[A][B], \qquad \dfrac{d[A]}{dt} = \dfrac{d[B]}{dt} = -k_{\text{bind}}[A][B] \)
and mapped AB(t) to plate-reader luminescence through a saturating transfer function and activation/decay kinetics:
\( \dfrac{dL}{dt} = k_{\text{act}}\, f([AB]) - k_{\text{decay}}\, L \), where \( f \) is a saturating (hyperbolic) transfer function of the complex concentration.
By scanning k_bind (≈ 6 vs 72 µM⁻¹·h⁻¹), we could determine the optimal starting concentrations and time points to maximize signal discrimination.
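A minimal sketch of this in vitro model, assuming a hyperbolic form for the saturating transfer function (the constant K_sat and all parameter values are placeholders, not fitted numbers):

```python
import numpy as np
from scipy.integrate import solve_ivp

def in_vitro_ode(t, y, k_bind, k_act, k_decay, K_sat):
    """State: A (SpyTag), B (SpyCatcher), AB (complex), L (luminescence)."""
    A, B, AB, L = y
    v = k_bind * A * B                                   # irreversible second-order binding
    dL = k_act * AB / (K_sat + AB) - k_decay * L         # saturating transfer + activation/decay
    return [-v, -v, v, dL]

read_times_h = np.array([5, 15, 30, 60]) / 60.0          # candidate plate-reader time points
for k_bind in (6.0, 72.0):                               # scanned regimes, µM^-1 h^-1
    sol = solve_ivp(in_vitro_ode, (0, 1.0), [0.1, 0.1, 0.0, 0.0],
                    args=(k_bind, 1.0, 0.1, 0.05), t_eval=read_times_h)
    print(f"k_bind={k_bind}: L(t) = {np.round(sol.y[3], 4)}")
```

Comparing L(t) between the two regimes at each candidate time point is what lets the model pick starting concentrations and read times that maximize discrimination.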
Iteration 3 — Test Phase
Simulations predicted that using 0.1 µM of each reactant and measuring at 5, 15, 30, and 60 min would clearly separate mutants by k_on.
This ODE-guided design minimized trial-and-error in wet-lab testing.

Figure 2. In vitro simulation of split-luciferase luminescence under different binding rate constants (k₁ = 60 µM⁻¹·h⁻¹, k₂ = 20 µM⁻¹·h⁻¹).
Iteration 3 — Learn Phase
This cycle validated that a fully in vitro strategy yields quantitative and reproducible kinetic data under tunable pH.
The modeling framework now serves as a pre-experiment planning tool: each new mutant library first undergoes ODE simulation to set experimental parameters.
References
- 1. Tian, R., Rehm, F. B. H., Czernecki, D., Gu, Y., Zürcher, J. F., Liu, K. C., & Chin, J. W. (2024). Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science, 383(6681), 421–426. https://doi.org/10.1126/science.adk1281
- 2. Keeble, A. H., Banerjee, A., Ferla, M. P., Reddington, S. C., Anuar, I. N. A. K., & Howarth, M. (2017). Evolving accelerated amidation by SpyTag/SpyCatcher to analyze membrane dynamics. Angewandte Chemie International Edition, 56(52), 16521–16525. https://doi.org/10.1002/anie.201707623
- 3. Qin, H., Anderson, D., Zou, Z., Merritt, J., & Li, L. (2024). Mass spectrometry and split luciferase complementation assays reveal the MecA protein interactome of Streptococcus mutans. Microbiology Spectrum, 12(1), e03691-23. https://doi.org/10.1128/spectrum.03691-23
- 4. Gonzalez-Perez, D., Ratcliffe, J., Tan, S. K., et al. (2021). Random and combinatorial mutagenesis for improved total production of secretory target protein in Escherichia coli. Scientific Reports, 11, 5290. https://doi.org/10.1038/s41598-021-84859-6
- 5. Gao, Y., Zhao, H., Lv, M., Sun, G., Yang, X., & Wang, H. (2014). A simple error-prone PCR method through dATP reduction. Wei Sheng Wu Xue Bao, 54(1), 97–103. https://doi.org/10.3321/j.issn:0564-3245.2014.01.015
- 6. Rasila, T. S., Pajunen, M. I., & Savilahti, H. (2009). Critical evaluation of random mutagenesis by error-prone polymerase chain reaction protocols. Analytical Biochemistry, 388(1), 71–80. https://doi.org/10.1016/j.ab.2009.02.008
- 7. Cirino, P. C., Mayer, K. M., & Umeno, D. (2003). Generating mutant libraries using error-prone PCR. In Methods in Molecular Biology (Vol. 231, pp. 3–9). Humana Press. https://doi.org/10.1385/1-59259-395-X:3
- 8. Cadwell, R. C., & Joyce, G. F. (1992). Randomization of genes by PCR mutagenesis. PCR Methods and Applications, 2(1), 28-33. https://doi.org/10.1101/gr.2.1.28
- 9. Cirino, P. C., Mayer, K. M., & Umeno, D. (2003). Generating mutant libraries using error-prone PCR. In F. H. Arnold & G. Georgiou (Eds.), Directed Evolution Library Creation (Methods in Molecular Biology, Vol. 231, pp. 3–9). Humana Press. https://doi.org/10.1385/1-59259-395-X:3
Wet Lab Cycle 2:
Quantifying Usable Mutation Space in epPCR
Error-prone PCR (epPCR) is essential for creating diversity, but excessive mutagenesis produces premature stop codons that make variants unscreenable. We engineered a quantitative planning tool that ties per-reaction error rates and cycle counts to the fraction of usable sequences (two usability metrics defined below). This tool guided our choice of PCR cycle numbers to balance diversity against usability.




Iteration 1 — Design Phase
Goal: convert a reported per-reaction epPCR error rate p (e.g., 0.6–3.5% nt⁻¹·reaction⁻¹, depending on Mn²⁺ and dNTP imbalance) into a per-cycle substitution rate μ. Ranges for epPCR and the logic of serial dilution / high-Mn²⁺ conditions are drawn from a standard protocol and a commercial kit [1].
Iteration 1 — Build Phase
We model base changes with the Jukes–Cantor (JC69) four-state substitution model (equal base frequencies and exchange rates) [2]. Under JC69, the probability that a nucleotide remains the same after c cycles at per-cycle rate μ is:
\( P_{\text{same}}(c) = \dfrac{1}{4} + \dfrac{3}{4}\left(1 - \dfrac{4\mu}{3}\right)^{c} \)  (Eq. 1)
and \( P_{\text{diff}}(c) = 1 - P_{\text{same}}(c) \). Matching a measured per-reaction error p to \( P_{\text{diff}}(c) \) and solving for μ yields:
\( \mu = \dfrac{3}{4}\left[1 - \left(1 - \dfrac{4p}{3}\right)^{1/c}\right] \)  (Eq. 2)
We use Eq. 2 to calibrate μ for any chosen p and cycle count c [3].
Iteration 1 — Test Phase
We validated Eq. 2 numerically: plugging in p = {0.5, 1, 2, 3.5}% and c = 10, 20, 40 gave self-consistent \( P_{\text{diff}}(c) \) values when recomposed via Eq. 1.
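The round trip can be checked in a few lines of code; a minimal sketch (function names are ours, not from any library):

```python
import numpy as np

def p_same(mu, c):
    """Eq. 1: probability a base is unchanged after c cycles at per-cycle rate mu (JC69)."""
    return 0.25 + 0.75 * (1 - 4 * mu / 3) ** c

def mu_from_p(p, c):
    """Eq. 2: per-cycle rate mu that reproduces a per-reaction error p after c cycles."""
    return 0.75 * (1 - (1 - 4 * p / 3) ** (1 / c))

for p in (0.005, 0.01, 0.02, 0.035):
    for c in (10, 20, 40):
        mu = mu_from_p(p, c)
        assert np.isclose(1 - p_same(mu, c), p)   # recompose P_diff(c) and recover p
        print(f"p={p:.3f}  c={c:2d}  mu={mu:.5f}")
```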
Iteration 1 — Learn Phase
We confirmed the need to separate per-reaction p from per-cycle μ; without this calibration, downstream stop-codon risk is badly mis-estimated at high cycle counts. We proceed to codon-level predictions.
Iteration 2 — Design Phase
Goal: compute the probability that an ORF accumulates new stop codons after epPCR. The literature indicates that mutagenic conditions increase substitution frequency broadly (not only transitions), consistent with using a symmetric model at planning time [1].
We define two usability metrics used throughout:
- Usability-A (no-new-stops): fraction of molecules that introduce no additional stop codon relative to the original ORF (the strict screening criterion).
- Usability-B (final-no-stop): fraction of molecules that end stop-free (informative but less strict).
Iteration 2 — Build Phase
For each codon b₁b₂b₃, we compute the probability that it becomes any of {TAA, TAG, TGA} after c cycles using the single-base transition probabilities from JC69 (assuming independence across positions):
\( P_{\text{stop}}(b_1 b_2 b_3, c) = \sum_{s_1 s_2 s_3 \in \{\mathrm{TAA},\,\mathrm{TAG},\,\mathrm{TGA}\}} \prod_{i=1}^{3} P(b_i \to s_i, c) \), where \( P(b_i \to s_i, c) = P_{\text{same}}(c) \) if \( b_i = s_i \) and \( P_{\text{diff}}(c)/3 \) otherwise.
We then multiply the no-stop probabilities across all relevant codons (summing log-probabilities for numerical stability) to obtain Usability-A, and similarly compute Usability-B [3].
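A minimal sketch of the codon-level calculation under the JC69 independence assumption (the ORF string and helper names here are illustrative, not our real sequence or module):

```python
import numpy as np

STOPS = ("TAA", "TAG", "TGA")

def base_transition(b, s, mu, c):
    """JC69 probability of reading base s after c cycles, starting from base b."""
    same = 0.25 + 0.75 * (1 - 4 * mu / 3) ** c
    return same if b == s else (1 - same) / 3

def codon_stop_prob(codon, mu, c):
    """Probability the codon reads as any stop codon after c cycles (positions independent)."""
    return sum(np.prod([base_transition(b, s, mu, c) for b, s in zip(codon, stop)])
               for stop in STOPS)

def usability_A(orf, mu, c):
    """Fraction of molecules introducing no new stop codon (strict screening criterion)."""
    log_ok = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOPS:                                   # a native stop is not a 'new' stop
            continue
        log_ok += np.log1p(-codon_stop_prob(codon, mu, c))   # sum logs for numerical stability
    return float(np.exp(log_ok))

orf = "ATGGCTCATATCGTGATGGTTGATGCCTACAAACCGACGAAA"            # illustrative ORF fragment
print(usability_A(orf, mu=0.0005, c=20))
```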

Figure 1. Usability-A (%) surface versus per-reaction error and cycles; red points = 10%-usability ridge (design threshold).

Figure 2. Expected mutated sites; surface colored by concentration score 1/(1+CV); red points = 10% ridge.
Iteration 2 — Test Phase
Using our target ORF and reading frames 0/1/2:
- At 1% error, 20 cycles, Usability-A > 90%; Usability-B slightly higher (as expected).
- At 3% error, 50 cycles, Usability-A falls below 10%, indicating most variants would be unscreenable under strict criteria.
- Frame choice can subtly change per-codon risk near boundaries, visible in the heatmap of stop-probabilities.
Iteration 2 — Learn Phase
The strict metric (Usability-A) is what matters for protein-level screening; Usability-B is retained as an educational/diagnostic view. With this decision, we move to selecting cycles that keep Usability-A acceptable.
Iteration 3 — Design Phase
Goal: define a simple rule for choosing cycle counts: the ridge where Usability-A = 10% across the (p, c) grid. This aligns with practical guidance that highly mutagenic mixes (Mn²⁺, unbalanced dNTPs) should limit cycle numbers to avoid unusable libraries [1].
Figure 3. Usable molecule count = N₀ × amplification × Usability-A; guides plate-throughput planning.
Figure 4. Concentration score 1/(1+CV) highlighting dispersion across conditions.
Iteration 3 — Build Phase
We implemented a ridge finder that, for each p, selects the cycle c whose Usability-A is closest to 10% and exports a CSV summary.
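A sketch of the ridge finder's core logic (the stand-in usability function and the CSV layout are illustrative; in practice the JC69-based Usability-A calculation above is plugged in):

```python
import csv
import numpy as np

def find_ridge(p_grid, c_grid, usability_fn, target=0.10):
    """For each per-reaction error p, pick the cycle count whose Usability-A is closest to target."""
    rows = []
    for p in p_grid:
        scores = [usability_fn(p, c) for c in c_grid]
        best = int(np.argmin(np.abs(np.array(scores) - target)))
        rows.append({"error_rate": p, "cycles": c_grid[best], "usability_A": scores[best]})
    return rows

toy_usability = lambda p, c: float(np.exp(-40 * p * c))   # placeholder decay, illustration only
rows = find_ridge([0.005, 0.02, 0.035], list(range(5, 101, 5)), toy_usability)
with open("usability_ridge.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["error_rate", "cycles", "usability_A"])
    writer.writeheader()
    writer.writerows(rows)
```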
Iteration 3 — Test Phase
Representative results from our grid (sequence-specific but robust):
- 0.5% error → ~80–90 cycles at 10% Usability-A.
- 2.0% error → ~25 cycles.
- 3.5% error → <15 cycles.
This matches practical experience: strongly error-prone conditions demand fewer cycles to keep usable ORFs [1].
Iteration 3 — Learn Phase
Actionable rule for Build:
- For kit-like recipes near ~2% error, cap epPCR at ≈20–25 cycles [1].
- For milder recipes (~0.6–1.0% error), cycle counts can extend to ≥60 while maintaining a reasonable usability band [1].
These limits carry directly into our wet-lab rounds and sample-throughput calculations (starting copy number × amplification factor × Usability-A).
References
- 1. McCullum, E. O., Williams, B. A., Zhang, J., & Chaput, J. C. (2010). Random mutagenesis by error-prone PCR. Methods in molecular biology (Clifton, N.J.), 634, 103–109. https://doi.org/10.1007/978-1-60761-652-8_7
- 2. Jukes, T.H. and Cantor, C.R. (1969) Evolution of Protein Molecules. In: Munro, H.N., Ed., Mammalian Protein Metabolism, Academic Press, New York, 21-132. http://dx.doi.org/10.1016/B978-1-4832-3211-9.50009-7
- 3. Cadwell, R. C., & Joyce, G. F. (1992). Randomization of genes by PCR mutagenesis. PCR methods and applications, 2(1), 28–33. https://doi.org/10.1101/gr.2.1.28
Wet Lab Cycle 3:
Optimization of Luciferase–SpyCatcher/SpyTag Fusion
We optimized the fusion design of split luciferase fragments with SpyCatcher/SpyTag domains to ensure efficient reconstitution while preserving both enzymatic activity and covalent coupling capability.




Iteration 1 — Design Phase
To ensure efficient reconstitution of split luciferase, we designed N′-Luciferase–SpyCatcher and C′-Luciferase–SpyTag fusions, placing both Spy domains at the C terminus of their respective luciferase fragments.
The N′ fragment contains the catalytic β-barrel; attaching SpyCatcher at its C end prevents interference with folding and substrate entry [2][3].
Moreover, SpyCatcher's reactive Lys31 lies near its own N terminus—this orientation keeps the lysine solvent-exposed and properly aligned toward SpyTag's Asp117 for rapid isopeptide bond formation [4].
Previous studies on luciferase cyclization confirmed that C-terminal SpyCatcher fusion enhances thermal stability and preserves enzymatic activity [2].
Thus, this configuration balances luciferase integrity with Spy system accessibility.
Iteration 1 — Build Phase
Each construct used a (GGGGS)₃ linker to maintain flexibility between domains.
Expression in E. coli and western blot analysis verified full-length fusion proteins, indicating structural compatibility.
Iteration 1 — Test Phase
AlphaFold2 modeling and prior structural analyses support that SpyCatcher at the N′-luciferase C terminus minimizes steric hindrance and preserves the catalytic region's architecture [3][4].
Furthermore, the exposed positioning of SpyCatcher's reactive lysine allows spontaneous covalent ligation upon SpyTag encounter, offering a structurally and kinetically favorable orientation for reconstitution.

N-terminal Split Luciferase fused with SpyCatcher

C-terminal Split Luciferase fused with SpyTag
Iteration 1 — Learn Phase
This cycle established that the N′-Luciferase–SpyCatcher and C′-Luciferase–SpyTag fusions provide a stable, functionally rational design for split-luciferase reconstitution, preserving both luciferase folding and SpyCatcher reactivity.
References
- 1. Qin, H., Anderson, D., Zou, Z., Higashi, D., Borland, C., Kreth, J., & Merritt, J. (2024). Mass spectrometry and split luciferase complementation assays reveal the MecA protein interactome of Streptococcus mutans. Microbiology spectrum, 12(2), e0369123. https://doi.org/10.1128/spectrum.03691-23
- 2. Si, M., Xu, Q., Jiang, L., & Huang, H. (2016). SpyTag/SpyCatcher Cyclization Enhances the Thermostability of Firefly Luciferase. PloS one, 11(9), e0162318. https://doi.org/10.1371/journal.pone.0162318
- 3. Ye, Q., Lin, X., Wang, T., Cui, Y., Jiang, H., & Lu, Y. (2022). Programmable protein topology via SpyCatcher-SpyTag chemistry in one-pot cell-free expression system. Protein science : a publication of the Protein Society, 31(6), e4335. https://doi.org/10.1002/pro.4335
- 4. Keeble, A. H., & Howarth, M. (2019). Insider information on successful covalent protein coupling with help from SpyBank. Methods in enzymology, 617, 443–461. https://doi.org/10.1016/bs.mie.2018.12.010
Wet Lab Cycle 4:
Attempting to Reconstruct the Original EcORep System
We evaluated the feasibility of fully reconstructing the EcORep orthogonal replication system as reported by Tian et al. (2024), recognizing the complexity and metabolic burden before proceeding to a simplified design.




Design Phase
Our initial goal was to fully reconstruct the EcORep system reported by Tian et al. (2024, Science) in order to validate its feasibility for orthogonal replication and continuous evolution in E. coli.
According to the original paper, the system consisted of:
- a chromosomally integrated synthetic replication operon (TP, O-DNAP, SSB, DSB),
- a linear O-replicon flanked by PRD1 inverted terminal repeats (ITRs), and
- several auxiliary plasmids, including λ-Gam, mutagenic O-DNAP variants, and reporter modules.
We planned to reproduce this design under standard laboratory conditions and re-establish a functional orthogonal replication system through multi-plasmid co-transformation.
Build Phase
Following the original design, we prepared to simultaneously introduce:
- a linear O-replicon as the mutational cargo;
- a replication plasmid encoding TP, O-DNAP, SSB, and DSB;
- an auxiliary λ-Gam plasmid to inhibit the host RecBCD nuclease;
- and an additional plasmid carrying mutagenic variants such as O-DNAP(N71D) or O-DNAP(Y127A).
This configuration required maintaining three to four plasmids with different copy numbers and antibiotic markers under IPTG induction.
Test Phase
Before attempting the full reconstruction, we consulted our instructors and molecular biology advisors to evaluate the practical feasibility of this setup.
They pointed out that:
"Although the original EcORep system is theoretically feasible, maintaining multiple plasmids with different replication origins and antibiotic resistances will impose severe metabolic burden and instability on E. coli. In addition, some modules require specific host strains and tightly controlled expression conditions to function properly."
Based on this expert feedback, we realized that rebuilding the complete EcORep architecture under our current lab conditions would not be achievable.
Attempting the design as-is would likely result in low transformation efficiency, high cellular stress, and rapid plasmid loss.
This discussion served as a theoretical validation of our design, allowing us to recognize its practical limitations before proceeding.
Learn Phase
Through this test and expert consultation, we learned that while the multi-layered EcORep system is conceptually powerful, its high complexity and plasmid burden make it difficult to reproduce and sustain in standard E. coli settings.
As a result, we decided to abandon the multi-plasmid reconstruction approach and instead redesign a simplified two-component version:
- a linear O-replicon and
- a single helper plasmid that integrates all essential replication modules.
This realization became the foundation for Cycles 5 and 6, where we began re-engineering EcORep into a more modular, stable, and accessible form.
References
- 1. Tian, R., Rehm, F. B. H., Czernecki, D., Gu, Y., Zürcher, J. F., Liu, K. C., & Chin, J. W. (2024). Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science (New York, N.Y.), 383(6681), 421–426. https://doi.org/10.1126/science.adk1281
Wet Lab Cycles 5–6:
Establishing and Characterizing the Minimal Orthogonal Replication System
We designed, built, and tested a simplified orthogonal replication system by reducing the EcORep architecture to three essential genes (TP, O-DNAP, SSB). Both cycles share the same design and build phases but evaluate different functional aspects: Cycle 5 validates replication stability, while Cycle 6 assesses mutation tunability.




Design Phase (Shared with Cycle 6)
Both cycles were based on the same minimal orthogonal replication system inspired by Tian et al. (2024).
To reduce host burden and simplify implementation, we removed DSB repair and chromosomal integration components, retaining only three essential genes: TP (terminal protein), O-DNAP (orthogonal DNA polymerase), and SSB (single-stranded DNA-binding protein).
These genes were placed under two inducible promoters to form a dual-operon module:
- Operon 1: Ptac → TP → O-DNAP → SSB → terminator (IPTG-inducible, high-fidelity replication)
- Operon 2: PrhaBAD → TP → O-DNAP (N71D) → SSB → terminator (rhamnose-inducible, mutagenic replication)
This shared design allowed us to evaluate two key aspects of the same system: replication stability (Cycle 5) and mutation tunability (Cycle 6).
Build Phase (Shared with Cycle 6)
A linear O-replicon was constructed, flanked by 18 bp PRD1 inverted terminal repeats (ITRs), which the original study reports to be sufficient on each side, and containing a resistance cassette and a reporter gene (GFP in the pretest) for selection.
The EcORep helper plasmid carried both Operon 1 and Operon 2, allowing independent control by IPTG and rhamnose.
Co-transformation into E. coli established a dual-operon system capable of either faithful or mutagenic replication depending on induction conditions.
Cycle 5 — Replication Test
Cells were induced with IPTG to activate Operon 1.
Replication efficiency was assessed through kanamycin and ampicillin resistance as well as DNA retention after 48 hours of subculture.
The linear O-replicon was stably maintained with no significant loss, confirming that the three-gene module was sufficient for orthogonal DNA replication in E. coli.
Cycle 5 — Replication Learn
This test demonstrated that a minimal set of TP, O-DNAP, and SSB can sustain replication independently of the host genome.
The removal of DSB components reduced genetic load while preserving replication stability, providing a solid foundation for subsequent mutation control experiments.
References
- 1. Tian, R., Rehm, F. B. H., Czernecki, D., Gu, Y., Zürcher, J. F., Liu, K. C., & Chin, J. W. (2024). Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science (New York, N.Y.), 383(6681), 421–426. https://doi.org/10.1126/science.adk1281




Design Phase (Shared with Cycle 5)
Both cycles were based on the same minimal orthogonal replication system inspired by Tian et al. (2024).
To reduce host burden and simplify implementation, we removed DSB repair and chromosomal integration components, retaining only three essential genes: TP (terminal protein), O-DNAP (orthogonal DNA polymerase), and SSB (single-stranded DNA-binding protein).
These genes were placed under two inducible promoters to form a dual-operon module:
- Operon 1: Ptac → TP → O-DNAP → SSB → terminator (IPTG-inducible, high-fidelity replication)
- Operon 2: PrhaBAD → TP → O-DNAP (N71D) → SSB → terminator (rhamnose-inducible, mutagenic replication)
This shared design allowed us to evaluate two key aspects of the same system: replication stability (Cycle 5) and mutation tunability (Cycle 6).
Build Phase (Shared with Cycle 5)
A linear O-replicon was constructed, flanked by 18 bp PRD1 inverted terminal repeats (ITRs), which the original study reports to be sufficient on each side, and containing a resistance cassette and a reporter gene (GFP in the pretest) for selection.
The EcORep helper plasmid carried both Operon 1 and Operon 2, allowing independent control by IPTG and rhamnose.
Co-transformation into E. coli established a dual-operon system capable of either faithful or mutagenic replication depending on induction conditions.
Cycle 6 — Mutation Test
To evaluate mutation control, samples were collected at 0 h, 24 h, 36 h, and 48 h after rhamnose induction to activate Operon 2.
The SpyCatcher region was sequenced to measure mutation accumulation over time.
Mutation accumulation was expected to correlate positively with induction time, but because of the low overall mutation rate or the relatively short induction period, the trend could not be clearly resolved within the observed timeframe.
Cycle 6 — Mutation Learn
These findings suggest that mutation strength can be tuned by adjusting the rhamnose induction duration without compromising system stability, although longer induction or deeper sequencing will be needed to resolve the dose–response clearly.
The dual-operon architecture provides a reliable framework for controlled, gradual mutation — a crucial step toward continuous directed evolution in E. coli.
References
- 1. Tian, R., Rehm, F. B. H., Czernecki, D., Gu, Y., Zürcher, J. F., Liu, K. C., & Chin, J. W. (2024). Establishing a synthetic orthogonal replication system enables accelerated evolution in E. coli. Science (New York, N.Y.), 383(6681), 421–426. https://doi.org/10.1126/science.adk1281
Dry Lab Cycle 1: In-silico Baseline Directed Evolution
We establish a minimal, reproducible in-silico directed-evolution loop to understand how association-rate candidates improve under physics-aware selection pressure, testing baseline performance and identifying scaling limits.




Iteration 1 — Design Phase
We design a minimal, reproducible in-silico directed-evolution loop to establish ground truth about how quickly association-rate candidates improve under a physics-aware selection pressure. The loop samples sequence variants around a protected scaffold (SpyCatcher family), applies conservative edit masks to avoid backbone-breaking changes, and relies on a standardized virtual assay: relaxed structures are generated and ranked with a diffusion-encounter proxy so that we can observe monotonic lifts in predicted k_on.
The primary design choice is to keep proposal complexity low—few edits per variant, narrow priors—so we can attribute early improvements to the physics labeler rather than to a complex, possibly biased generator. We pre-define metrics (best-of-round, median, diversity radii) and convergence criteria to compare subsequent scale-ups.
Iteration 1 — Build Phase
We implement the loop as a clean pipeline with three contract-stable stages: propose, label, and select. The propose stage mutates only whitelisted sites and writes out both lineage and mask metadata per variant. The label stage runs structure relaxation and encounter-rate estimation in a batched, fault-tolerant way with caching and fingerprint-based deduplication, ensuring identical sequences are never re-labeled.
The select stage maintains a Pareto frontier on predicted k_on and plausibility proxies (e.g., steric clashes, stability alerts). CI checks verify the schema, and each round emits a signed report containing top sequences, parameters, and seeds. The entire loop is parameterized with environment variables so it can be reproduced across machines without code edits.
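A schematic of the three-stage contract (dataclass fields and function names are illustrative, not our exact module API):

```python
from dataclasses import dataclass, field
import hashlib
import random

@dataclass
class Variant:
    sequence: str
    lineage: list = field(default_factory=list)
    mask: str = ""
    score: float | None = None                  # predicted k_on from the physics labeler

    def fingerprint(self) -> str:               # dedup key: identical sequences never re-labeled
        return hashlib.sha256(self.sequence.encode()).hexdigest()

def propose(parents, whitelist, n, rng):
    """Mutate only whitelisted sites; record lineage and mask metadata per variant."""
    children = []
    for _ in range(n):
        p = rng.choice(parents)
        pos = rng.choice(whitelist)
        aa = rng.choice("ACDEFGHIKLMNPQRSTVWY")
        seq = p.sequence[:pos] + aa + p.sequence[pos + 1:]
        children.append(Variant(seq, lineage=p.lineage + [p.fingerprint()], mask=p.mask))
    return children

def label(variants, assay, cache):
    """Run relaxation + encounter-rate estimation, with fingerprint-based caching."""
    for v in variants:
        key = v.fingerprint()
        if key not in cache:
            cache[key] = assay(v.sequence)
        v.score = cache[key]
    return variants

def select(variants, k):
    """Keep the top-k by predicted k_on (Pareto filtering on plausibility proxies omitted here)."""
    return sorted(variants, key=lambda v: v.score, reverse=True)[:k]

# Example wiring: rng = random.Random(0); winners = select(label(propose(seeds, sites, 200, rng), assay, {}), 20)
```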
Iteration 1 — Test Phase
We execute ~200 batches across several random seeds and track the trajectory of best-of-round k_on, the spread of the candidate set, and the number of unique hotspots emerging in the winner cohort. Convergence is rapid: early rounds consistently surface families of edits at a handful of sensitive positions, while overall diversity decreases modestly as selection pressure intensifies.
Stability red flags remain rare due to strict masks, and replicate runs produce similar improvement curves, indicating our labeler is consistent. We record compute usage per stage, wall-clock times, and failure modes (e.g., rare relaxation timeouts). We also export per-site mutation frequencies to support downstream model analysis.
Iteration 1 — Learn Phase
The baseline shows that a physics-guided loop can lift predicted k_on quickly with minimal engineering, but it also reveals early signs of mode-seeking: winners concentrate around recurring motifs, and diversity would likely collapse if we continued without intervention. The strict masks protect fold plausibility yet constrain exploration, and our generator's priors gently reinforce local optima.
We conclude that scaling only the number of proposals may yield diminishing returns unless coupled with better proposal priors or representation learning that understands which edits matter. Action items: expand batch settings to test scaling laws, and start building a sequence/structure model that can explain mutation patterns and guide proposals away from congestion points.
Iteration 2 — Design Phase
We design a scale-up experiment to stress the same loop under heavier exploration while keeping all other variables constant. The goal is to map how best-k_on, median performance, and novelty respond to increased batch size, and to quantify whether the loop merely converges faster or actually finds qualitatively different solutions.
We also pre-register an analysis plan: compare per-site mutation histograms, cluster winners by Hamming distance, and examine whether new hotspots appear beyond those seen in the baseline. The design explicitly avoids introducing new priors so we can attribute differences purely to scale, not to algorithmic changes or selection criteria shifts.
Iteration 2 — Build Phase
We add batch-size presets (300–600) and enable multi-GPU job arrays for the labeling stage with robust checkpointing. To keep bookkeeping reliable at this scale, we extend the lineage tracker to capture parent–child relationships over multiple rounds and implement a variant hash that binds sequence, mask, and seed.
A monitoring dashboard reports acceptance rates, labeling throughput, and per-round diversity metrics. We also add a sampling stratifier to guarantee coverage across distinct mask slices so that scaling does not accidentally oversample the easiest regions. The output format remains identical to Iteration 1 for apples-to-apples comparisons in the analysis notebook.
Iteration 2 — Test Phase
Running ~500 batches reliably increases best-of-round k_on and accelerates early improvement; however, the curve still flattens, and the winners continue to cluster near previously identified motifs. Diversity heatmaps show broader exploration during the first few rounds but gradual re-concentration as selection iterates.
Compute scales close to linearly with batch size; labeling throughput is now the dominant cost. Stability proxies remain acceptable, and replicate runs yield similar plateaus, confirming that scale alone does not produce fundamentally new optima. We also note that median performance lifts less than the tail, implying the loop is harvesting benefits mainly from rare proposals rather than shifting the center of mass.
Iteration 2 — Learn Phase
Scaling exposes diminishing returns in a pure generate-and-keep regime. The loop converges—usefully—but exploration pressure is not well targeted, and priors plus masks keep us orbiting familiar neighborhoods. The take-home message is that we need representation learning to (1) explain observed hotspots mechanistically, (2) predict which edits are promising before paying labeling costs, and (3) shape the proposal distribution actively rather than passively hoping scale finds breakthroughs.
This motivates the next cycle: build deep models that encode sequence and structure, learn a meaningful latent space correlated with k_on, and support controlled exploration (spread, temperature) to maintain diversity without drifting off-manifold.
Dry Lab Cycle 2: Deep Learning Structure
We build and refine a Transformer-based deep learning model to learn sequence and structure relationships, progressively integrating ESM embeddings and graph neural networks to create a multimodal encoder capable of predicting performance metrics.




Iteration 1 — Design Phase
We design a self-built Transformer encoder–decoder to reconstruct the original sequence end-to-end, using 2,210 curated examples to teach token statistics, long-range dependencies, and grammar. The hypothesis is that a strong autoregressive decoder will discover constraints implicit in successful variants and that its latent space can later be aligned with k_on.
We adopt mask-free full-sequence training initially to maximize capacity and keep objectives simple (cross-entropy only). We plan to analyze attention maps, entropy per position, and latent variance to detect shortcut learning. The design also includes a calibration check: we probe whether latent distances correlate with mutational distances or crude performance proxies.
Iteration 1 — Build Phase
We implement a moderate-depth Transformer with GELU activations, pre-norm residual blocks, and tied input/output embeddings; training uses teacher forcing with cosine LR decay and label smoothing. Data augmentation is intentionally conservative—no random deletions or swaps—to keep the task faithful to real sequences.
We log token accuracy, perplexity, and latent statistics, and we export attention heads for qualitative inspection. The codebase is modular so we can later swap embeddings or decoders without retraining everything from scratch. Early-stopping guards against overfitting, and a reproducible seed controls shuffling so diagnostic runs can be replayed exactly.
Iteration 1 — Test Phase
Token accuracy climbs quickly, reconstruction loss falls, and attention appears sharp over conserved motifs; however, latent variance collapses, and probing reveals that the model leans heavily on frequent, repeated positions, reconstructing them with near-deterministic confidence while ignoring informative but rarer sites.
When we attempt to regress a simple k_on proxy from the latent, correlation is weak, confirming that the representation has learned grammar but little about performance. Ablation studies removing frequent positions cause a disproportionate loss in accuracy, indicating shortcut reliance. The model behaves well as a language model but not as a performance-aware encoder.
Iteration 1 — Learn Phase
Full-sequence reconstruction encourages degenerate solutions: the network can minimize loss by predicting conserved tokens and paying little attention to variable, functionally relevant residues. This explains the latent collapse and the poor alignment with k_on.
We learn that we must refocus the objective on where edits occur to force the encoder to carry functionally discriminative information. This leads to a new plan: keep the Transformer backbone but decode only mutational segments under a fixed mask ("mut-only" decoding), so the model cannot ignore the challenging parts. We also flag the need for richer priors (pretrained embeddings) to inject evolutionary context.
Iteration 2 — Design Phase
We redesign the supervision: the input remains the full original sequence, but the decoder predicts only three predefined mutational segments while the rest is carried through via the mask. The objective thus concentrates capacity on the edited regions that drive function, while the encoder must aggregate global context to make accurate local predictions.
We choose balanced loss weights across segments to avoid the model "camping" on the easiest region. The design anticipates better latent utilization, improved sensitivity to functional sites, and reduced collapse, and it prepares the architecture for later performance-head attachment and latent-space optimization.
Iteration 2 — Build Phase
We implement a mask-aware loss that zeros out gradients on conserved positions and scales errors per segment. Positional encodings and residual paths are unchanged, minimizing code diff from Iteration 1. We add hooks to export segment-wise accuracies and confusion matrices and enable temperature-controlled decoding for downstream design experiments.
Training uses the same scheduler but with slightly stronger dropout to offset the narrower target. We also add a probing head that predicts whether a token is "likely beneficial" under observed mutation statistics, purely for diagnostics, not for selection. Checkpoints store mask metadata so reconstructed sequences are fully reproducible.
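A minimal sketch of the mask-aware, segment-balanced loss (tensor shapes and the segment-ID convention are our assumptions):

```python
import torch
import torch.nn.functional as F

def mut_only_loss(logits, targets, segment_ids, ignore_index=-100):
    """Cross-entropy restricted to the three mutational segments, balanced across segments.

    logits:      (B, L, V) decoder outputs
    targets:     (B, L)    token ids
    segment_ids: (B, L)    0 = conserved position, 1..3 = mutational segment index
    """
    loss = torch.zeros((), device=logits.device)
    for seg in (1, 2, 3):
        seg_mask = segment_ids == seg
        if not seg_mask.any():
            continue
        seg_targets = targets.masked_fill(~seg_mask, ignore_index)   # gradients only on this segment
        loss = loss + F.cross_entropy(logits.transpose(1, 2), seg_targets,
                                      ignore_index=ignore_index) / 3.0
    return loss
```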
Iteration 2 — Test Phase
Latent collapse is mitigated: variance stabilizes, and the model becomes more responsive to edits within the masked regions. However, token accuracy plateaus near ~0.546 across segments, and generalization remains limited on held-out variants that combine rarely co-occurring edits.
Probing shows the encoder still lacks deep biophysical context, and attention drifts to local patterns without robust long-range integration. When we attempt to regress k_on from the latent, correlations are inconsistent across folds. Qualitatively, the model "knows where to look" but not enough about why particular constellations of residues improve encounter rates.
Iteration 2 — Learn Phase
Mut-only decoding proves the right structural bias—capacity is spent on edits—but a self-trained embedding on a modest dataset cannot capture evolutionary and structural subtleties that modulate k_on. We therefore plan to inject strong prior knowledge: frozen pretrained sequence embeddings that encode conservation, contact propensities, and higher-order syntax.
This should raise the quality of the encoder's features while keeping the same decoder and mask objective. The expectation is faster training, higher token accuracy, and a latent that correlates better with performance when paired with a simple head.
Iteration 3 — Design Phase
We replace learned token embeddings with frozen ESM-650M features, projected into the model dimension. The design hypothesis is that rich, pretrained sequence representations will supply contextual cues (e.g., residue preferences, evolutionary couplings) that our dataset cannot teach from scratch, thereby accelerating learning and improving generalization.
We preserve the mut-only decoder to keep pressure on edited regions and plan to evaluate not only token accuracy but also the alignment of the latent with surrogate k_on labels from the in-silico assay. Secondary analyses include attention–contact overlap and calibration curves to check whether the model's confidence tracks actual reconstruction fidelity.
Iteration 3 — Build Phase
We add an embedding loader that maps sequences to ESM features and a light projection MLP with layer-norm. The rest of the stack is unchanged for a controlled comparison. Dropout is tuned down slightly thanks to the stability of pretrained features, and we introduce a small weight decay to prevent the projection from overfitting.
Training scripts now cache ESM tensors on disk to avoid recomputation. We keep the diagnostic probes from Iteration 2 and add a simple performance head (two-layer MLP) trained to predict k_on on labeled variants, solely for correlation analysis, not for selection in this iteration.
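A sketch of the frozen-feature projection path, assuming per-residue ESM-650M features (dimension 1280) have already been computed and cached to disk; the module name and caching convention are illustrative:

```python
import torch
import torch.nn as nn

class ESMProjection(nn.Module):
    """Project frozen ESM-650M per-residue features into the model dimension."""
    def __init__(self, esm_dim=1280, d_model=512, p_drop=0.05):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(esm_dim, d_model),
            nn.LayerNorm(d_model),
            nn.GELU(),
            nn.Dropout(p_drop),
        )

    def forward(self, esm_feats):                # (B, L, esm_dim), precomputed and never updated
        return self.proj(esm_feats)

# Usage with cached tensors, e.g. feats = torch.load(f"esm_cache/{seq_hash}.pt")
proj = ESMProjection()
feats = torch.randn(2, 120, 1280)                # stand-in for cached ESM features
print(proj(feats).shape)                         # torch.Size([2, 120, 512])
```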
Iteration 3 — Test Phase
Training stabilizes and accelerates: token accuracy climbs to ~75%, and the model generalizes better to held-out combinations of edits. The latent learned by the encoder shows a consistent, though moderate, correlation to in-silico k_on (R² ~0.65), indicating the representation is now capturing cues relevant to encounter kinetics.
Nevertheless, errors concentrate in cases where long-range structural rearrangements matter, and attention heads alone cannot encode 3D context. The performance head's calibration is imperfect at the extremes of k_on, reinforcing that sequence-only signals under-specify the physics. These observations motivate integrating structural information explicitly.
Iteration 3 — Learn Phase
Pretrained language knowledge is a strong boost but not sufficient for our target metric. Association rates depend on charge layouts, sterics, and solvent-exposed pathways that are inherently geometric. We therefore commit to a multimodal approach: complement ESM features with residue-graph embeddings so the encoder "sees" spatial neighborhoods, not just tokens.
Our first attempt will be end-to-end fusion with a GNN encoder concatenated to the Transformer, acknowledging the training might be brittle. If instability appears, we will decouple training by precomputing graph embeddings and freezing them before fusion.
Iteration 4 — Design Phase
We design a multimodal encoder that concatenates a GINEConv-based residue graph embedding with the Transformer's sequence representation. Nodes correspond to residues; edges connect k-nearest neighbors with invariant geometric features (distances, orientations). The hypothesis is that explicit local-neighborhood encoding helps the model represent electrostatics/sterics relevant to diffusion-limited association.
We keep the mut-only decoder unchanged. The design emphasizes tight coupling so gradients flow from the mutational prediction task back into both modalities, potentially aligning them toward performance-relevant signals without a separate objective.
Iteration 4 — Build Phase
We implement a shallow GINEConv stack with Set2Set readout feeding into a fusion MLP, then concatenate with the Transformer encoder output. Batch construction now includes graph tensors alongside sequences, with caching to avoid repeated featurization.
The training loop remains end-to-end with a combined loss, and we add gradient-norm clipping plus a warmup schedule to stabilize updates. Logging captures per-modality contribution via ablations (dropping graph or sequence) during validation, and we monitor training curvature for signs of interference between modalities.
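A sketch of the graph branch, assuming PyTorch Geometric's GINEConv and Set2Set operators; feature dimensions and depth are illustrative:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINEConv, Set2Set

class ResidueGraphEncoder(nn.Module):
    """Shallow GINEConv stack over a residue k-NN graph, pooled with Set2Set."""
    def __init__(self, node_dim=32, edge_dim=8, hidden=128, layers=2):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden)
        self.convs = nn.ModuleList([
            GINEConv(nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden)),
                     edge_dim=edge_dim)
            for _ in range(layers)
        ])
        self.readout = Set2Set(hidden, processing_steps=3)   # outputs 2 * hidden per graph
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, x, edge_index, edge_attr, batch):
        h = self.embed(x)
        for conv in self.convs:
            h = torch.relu(conv(h, edge_index, edge_attr))
        return self.fuse(self.readout(h, batch))             # one embedding per residue graph
```

In the end-to-end setting this embedding is concatenated with the Transformer encoder output before the fusion MLP.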
Iteration 4 — Test Phase
Training is unstable. Early in optimization, the network over-relies on noisy graph cues, hurting token predictions and slowing convergence; later, the Transformer attempts to compensate, but the two branches oscillate rather than cooperate. Validation losses fluctuate, and the performance head's correlation to k_on does not improve reliably over the ESM-only baseline.
Gradient diagnostics show large, misaligned updates between modalities. While occasional runs show promise, reproducibility across seeds is poor. These results suggest that forcing joint learning without mature graph features is counterproductive at our data scale.
Iteration 4 — Learn Phase
We learn that decoupling is necessary: pretrain the graph pathway to produce stable, informative embeddings, then freeze it before fusion. The end-to-end dream is elegant but brittle here because the mut-only objective is not strong enough to teach geometry from scratch.
We therefore pivot to a two-stage design—compute graph embeddings offline (still GINEConv-based with Set2Set), validate their sanity independently, and then inject them as fixed features into a simple MLP before concatenating with the Transformer latent. This should stabilize optimization, preserve the benefits of 3D context, and allow the decoder to focus on accurate, mask-constrained sequence edits.
Iteration 5 — Design Phase
We design a modular multimodal encoder: frozen ESM sequence features and frozen graph embeddings feed into lightweight projection heads and are concatenated in latent space. A small performance head predicts k_on, providing a fast surrogate for active design; the mut-only decoder reconstructs edits under the established mask.
The key idea is to treat structural information as a stable side channel rather than a moving target during training. We target token accuracy >0.9 and latent→k_on R² ≥0.85 on held-out labels, along with smooth calibration across the operational range. The design also anticipates decoding controls (temperature/top-p) to be used in the next cycle.
Iteration 5 — Build Phase
We precompute residue-graph embeddings with a tuned GINEConv stack and Set2Set readout, validate their clustering on known families, then freeze and store them alongside sequences. The training graph becomes simple: [ESM_proj || Graph_proj] → latent; latent branches to (a) mut-only decoder and (b) performance head.
Regularization is mild (dropout, small L2), and training converges quickly due to the frozen backbones. We expand evaluation to calibration plots, out-of-fold correlations, and error analysis by structural neighborhood. The code paths for decoding temperature/top-p are implemented but not yet used for selection.
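A minimal sketch of the frozen-embedding fusion and its two branches (dimensions are placeholders; the mut-only decoder attaches to the same latent and is omitted here):

```python
import torch
import torch.nn as nn

class MultimodalLatent(nn.Module):
    """[ESM_proj || Graph_proj] -> latent -> performance head (mut-only decoder not shown)."""
    def __init__(self, esm_dim=1280, graph_dim=128, d_proj=256):
        super().__init__()
        self.esm_proj = nn.Sequential(nn.Linear(esm_dim, d_proj), nn.LayerNorm(d_proj))
        self.graph_proj = nn.Sequential(nn.Linear(graph_dim, d_proj), nn.LayerNorm(d_proj))
        self.perf_head = nn.Sequential(                      # surrogate k_on predictor
            nn.Linear(2 * d_proj, 128), nn.ReLU(), nn.Dropout(0.1), nn.Linear(128, 1))

    def forward(self, esm_pooled, graph_emb):
        # Both inputs are precomputed, frozen per-sequence embeddings
        z = torch.cat([self.esm_proj(esm_pooled), self.graph_proj(graph_emb)], dim=-1)
        return z, self.perf_head(z).squeeze(-1)              # latent for the decoder, predicted k_on
```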
Iteration 5 — Test Phase
Results meet targets: token accuracy exceeds 0.9 across segments, and latent→k_on R² surpasses 0.85 on held-out, with near-linear calibration in the mid-range. Error cases cluster at the extremes of k_on and in rare conformations, but overall the multimodal latent is predictive and stable.
The performance head's ranking matches physics labels well enough to triage candidates before paying the full labeling cost. Ablations confirm both modalities contribute: removing graph embeddings reduces correlation substantially, while removing ESM reduces reconstruction fidelity. Decoding remains faithful to conserved scaffolds, honoring the mask.
Iteration 5 — Learn Phase
A fixed-embedding multimodal encoder offers the reliability we need for controlled exploration. With a predictive, calibrated surrogate and a high-fidelity decoder, we can now move to active training: search in latent space with tunable exploration/exploitation and only send shortlisted sequences to the physics labeler.
We also learn that decoding controls should be exposed as first-class levers to trade diversity for per-candidate quality as cycles progress. Finally, we adopt a rule of thumb: freeze the decoder to protect reconstruction of conserved regions, and allow modest plasticity in the encoder to reshape the latent geometry during active loops.
Dry Lab Cycle 3: Active Training
We implement a latent-space optimizer with tunable exploration parameters to enable efficient, model-guided design cycles, testing fine-tuning strategies to maintain productive search while preventing latent collapse.




Iteration 1 — Design Phase
We design a latent-space optimizer that proposes candidates by sampling around anchors (top latent points from prior rounds) with a controllable spread parameter, KNN-covariance smoothing, and multi-start restarts. Decoding uses temperature and top-p controls to convert latent diversity into sequence diversity without leaving the manifold.
The exploration schedule is staged: early cycles use larger spread (≈3.5–4.5), knn_k 64–128, temp≈1.0, top_p≈0.9 to generate diverse but plausible edits; mid cycles reduce spread to 2.0–3.0 and tune λ_knn to 0.05; late cycles tighten spread to 1.0–2.0 and lower temp to 0.8–0.9 for higher per-candidate quality. Each proposal carries full provenance and latent diagnostics.
Iteration 1 — Build Phase
We implement the optimizer as a pluggable module: given anchors and constraints, it samples z-candidates, applies KNN jitter, and decodes sequences under the mut-only mask with specified temperature/top-p. A triage stage uses the performance head to rank proposals and select a shortlist for physics labeling.
We log per-round diversity (latent and sequence), acceptance rates, and novelty relative to prior winners. The module supports per-candidate latent sampling—so duplicates are statistically unlikely—and exposes knobs as config entries for easy A/B testing. Visualization notebooks overlay latent proposals on training manifolds to ensure exploration stays near high-density regions.
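A sketch of the anchor-centered sampler with KNN-covariance jitter (how the spread and λ_knn terms combine here is one plausible reading of the design, not the exact optimizer):

```python
import numpy as np

def propose_latents(anchors, pool, n_per_anchor, spread, knn_k, lam_knn, rng):
    """Sample candidate latents around anchors, smoothing with the local KNN covariance.

    anchors: (A, D) top latent points from prior rounds
    pool:    (N, D) latent points from the training manifold
    """
    proposals = []
    for z in anchors:
        dists = np.linalg.norm(pool - z, axis=1)
        nbrs = pool[np.argsort(dists)[:knn_k]]
        cov = np.cov(nbrs, rowvar=False) + 1e-6 * np.eye(z.shape[0])
        for _ in range(n_per_anchor):
            jitter = rng.multivariate_normal(np.zeros_like(z), cov)     # KNN-covariance smoothing
            proposals.append(z + spread * rng.standard_normal(z.shape[0]) + lam_knn * jitter)
    return np.stack(proposals)

# Early-cycle style settings from the schedule above (illustrative dimensions)
rng = np.random.default_rng(0)
anchors = rng.standard_normal((4, 16))
pool = rng.standard_normal((500, 16))
zs = propose_latents(anchors, pool, n_per_anchor=8, spread=4.0, knn_k=64, lam_knn=0.05, rng=rng)
print(zs.shape)   # (32, 16)
```

Each proposal is then decoded under the mut-only mask with the chosen temperature and top-p before triage by the performance head.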
Iteration 1 — Test Phase
Across exploratory, balanced, and exploitative settings, top-k_on improves steadily while diversity decays gracefully rather than precipitously. Early rounds generate distinct families of edits, some orthogonal to prior motifs, and the surrogate ranker retains calibration when checked against physics labels on the shortlist.
As the schedule tightens, the per-candidate success rate rises and wall-clock costs drop since fewer physics evaluations are needed for similar uplift. Failure modes include occasional off-manifold decodes at extreme spread; these are automatically filtered by plausibility checks. Overall, the schedule enables us to match exploration pressure to the remaining headroom in the landscape.
Iteration 1 — Learn Phase
A tunable schedule prevents the classic pitfalls of active loops: premature collapse and wasteful wandering. We learn that per-candidate latent sampling plus moderate top_p keeps proposals non-identical without sacrificing quality, and that KNN-based jitter is a simple, effective way to preserve local geometry.
The surrogate's ranking power is sufficient for triage, but we also observe drift risks if we continually fine-tune the model on its own proposed data. This insight sets up the next iteration: evaluate fine-tuning policies that reshape the encoder just enough to avoid local pockets, while protecting decoder fidelity and head calibration.
Iteration 2 — Design Phase
We design a controlled study of fine-tuning strategies during active training: (i) freeze both encoders and only update the performance head, (ii) freeze the decoder but let encoders adapt, (iii) freeze encoders and decoder while tuning the head, and (iv) add a light contrastive regularizer on encoder layers to "repel" last-round centroids and keep the latent landscape from collapsing.
The goal is not to chase marginal surrogate accuracy, but to maintain a healthy geometry where local search remains productive over multiple rounds. Metrics include continuous-improvement streak length, best-of-round uplift, calibration stability, and diversity at fixed shortlist sizes.
Iteration 2 — Build Phase
We implement a policy switcher that applies the chosen freeze map and loss. The contrastive branch samples anchors from last-round winners and draws negatives from near-misses, applying a temperature-scaled InfoNCE term on normalized encoder outputs. The decoder remains mostly frozen across all policies to protect reconstruction in conserved regions.
We monitor head calibration with reliability diagrams and keep a rollback checkpoint to avoid drift. The training budget per round is capped to preserve loop cadence, and we log how many optimizer steps are required to recover prior calibration if it slips.
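A minimal sketch of the temperature-scaled InfoNCE term on normalized encoder outputs (pairing each output with a matched last-round winner centroid as its positive is our assumption about the wiring):

```python
import torch
import torch.nn.functional as F

def contrastive_term(z, winner_centroids, near_misses, temperature=0.1):
    """Temperature-scaled InfoNCE on normalized encoder outputs.

    z:                (B, D) encoder outputs for the current batch
    winner_centroids: (B, D) matched last-round winner centroids (positives)
    near_misses:      (M, D) latents of near-miss variants (negatives)
    """
    z = F.normalize(z, dim=-1)
    pos = F.normalize(winner_centroids, dim=-1)
    neg = F.normalize(near_misses, dim=-1)
    pos_logit = (z * pos).sum(-1, keepdim=True) / temperature            # (B, 1)
    neg_logits = z @ neg.t() / temperature                               # (B, M)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(z.size(0), dtype=torch.long, device=z.device)   # positive sits at index 0
    return F.cross_entropy(logits, labels)
```

In the policy study this term touches only the encoder; the decoder stays mostly frozen to protect reconstruction in conserved regions.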
Iteration 2 — Test Phase
Head-only updates stabilize calibration but shorten continuous-improvement streaks as the latent geometry ossifies. Fully adapting encoders improves briefly but risks drift and overfitting to self-proposed data, harming generalization. The contrastive-pushed encoder policy yields the best balance: it subtly reshapes neighborhoods around anchors, opening new micro-basins while maintaining head calibration and decoder fidelity.
We see more consecutive rounds with uplift at similar shortlist sizes, and diversity drops more slowly across cycles. Rollbacks are rarely needed, and evaluation on held-out historical variants shows no degradation, confirming that modifications remain local and controlled.
Iteration 2 — Learn Phase
The optimal policy lightly pushes the encoder to keep the landscape plastic without destabilizing decoding or the head. This preserves a productive search regime across rounds and mitigates latent collapse, especially when paired with the exploration→exploitation schedule.
We formalize a default: freeze the decoder, apply contrastive nudges to encoders with small learning rates, and refresh the head only as needed to retain calibration. With this, the loop delivers consistent best-k_on gains with fewer physics evaluations, completing our DBTL arc from baseline exploration to intelligent, model-guided, and compute-efficient optimization suitable for iGEM-style documentation and future reproducible runs.
Human Practice Cycle 1: Aligning Design with Societal Priorities
In this cycle, we investigated how the Taiwanese public prioritizes Cost, Safety, and Efficacy across three familiar health contexts — Dietary Supplements, Vaccines, and Drugs — to align our SpyTag/SpyCatcher product's value communication with public expectations.




Design Phase
In the design phase, our goal was to understand how the Taiwanese public prioritizes Cost, Safety, and Efficacy across three familiar health contexts — Dietary Supplements, Vaccines, and Drugs.
We hypothesized that perceptions of product reliability depend on how people balance these three factors, and that identifying this pattern would help us align our SpyTag/SpyCatcher product's value communication with public expectations.
To achieve this, we designed a structured public survey to investigate real-world decision logic and translate it into actionable guidance for biosensor positioning and risk communication.
Build Phase
We created an anonymous Google Form targeting Taiwan residents, open to participants of all ages and backgrounds.
The survey used a 1–10 Likert scale (1 = Not important at all, 10 = Extremely important) for each factor across three use cases.
We gathered 543 valid responses, ensuring anonymity and purpose limitation under a GDPR-inspired framework.
Data visualization was pre-planned using grouped bar plots (mean ± SD) and pie charts to compare factor weights per category.
Test Phase
Results revealed consistent decision-making patterns:
- For Vaccines and Drugs, participants prioritized Safety (≈55–60%) and Efficacy (≈30–40%), while Cost ranked lowest — consistent with high-stakes medical contexts.
- For Dietary Supplements, Cost (≈40%) and Safety (≈49%) dominated, while Efficacy mattered less, reflecting expectations for everyday, non-therapeutic use.
These findings confirmed that trust (Safety) and accessibility (Cost) define public confidence, and that communication strategies must adapt to context — emphasizing clinical evidence for drugs/vaccines and transparent quality for supplements.
Learn Phase
From this survey, we learned that Safety and Efficacy serve as trust anchors for medical products, while Cost acts as a gatekeeper for supplements.
These insights now guide our HP-engineering loop:
- For Vaccines/Drugs → prioritize AE monitoring, risk management, and clinical consistency.
- For Supplements → highlight transparent production, pricing clarity, and GRAS validation.
By integrating these lessons into our SpyTag/SpyCatcher platform documentation and communication, we ensure that our project design evolves in sync with societal expectations, accessibility needs, and public trust.
Human Practice Cycle 2: Prototyping an Interactive Education Model
We co-designed the "Synthetic Biology × AI Cross-Disciplinary Innovation Camp" in collaboration with the National Taiwan Science Education Center (NTSEC), creating an interactive program that made synthetic biology approachable to junior and senior high school students through hands-on learning and peer mentorship.




Design Phase
We began by co-designing the "Synthetic Biology × AI Cross-Disciplinary Innovation Camp" in collaboration with the National Taiwan Science Education Center (NTSEC).
Our goal was to create a program that made synthetic biology approachable to junior and senior high school students through interactive and interdisciplinary learning.
To make the camp more relatable, we developed a "students-teaching-students" model, inviting our high-school interns to assist in lesson design and teaching.
This framework encouraged peer-level communication, helping younger students feel comfortable asking questions while allowing our interns to strengthen their confidence and presentation skills.
The camp design combined hands-on Wet Lab modules (bacterial staining and microscopy) with Dry Lab modeling sessions (MATLAB and AI-based learning), reflecting iGEM's spirit of merging biology and computation.
Build Phase
To implement the plan, we built a two-day interactive curriculum integrating creativity, experimentation, and digital learning.
We curated educational materials such as:
- A children's picture book and custom-designed board game to introduce synthetic biology concepts playfully.
- MATLAB exercises for AI modeling and data visualization.
- Wet Lab activities like microbial observation and Gram staining for real lab experience.
Our interns rehearsed lesson delivery, refined timing, and prepared bilingual slides and handouts — including "Why iGEM Needs Dry Lab: Applications of AI in Synthetic Biology" and "2025 NTSEC Summer Camp Handouts."
By combining visual storytelling, computational modeling, and real experiments, we built an immersive educational platform where theory and practice complemented each other.
Test Phase
During the two-day camp, we observed how participants interacted with the materials and responded to the interdisciplinary lessons.
Students showed high engagement during both storytelling and experiment sessions, especially when they could visualize biological processes through the board game or connect them to AI applications.
Feedback revealed that the cross-disciplinary structure made science less intimidating — many students expressed surprise and excitement at seeing how biology and coding could intersect.
This hands-on testing phase also helped us identify challenges: timing adjustments were needed between Wet Lab and AI sessions, and we learned the importance of pacing for mixed-age audiences.
Learn Phase
From this experience, we learned that education becomes most powerful when it is interactive, relatable, and student-led.
Our interns gained firsthand teaching experience, evolving from learners into mentors — an embodiment of iGEM's "learn by doing" philosophy.
Students' curiosity and creativity inspired us to further refine our outreach: simplifying explanations, using more visual materials, and designing gamified learning tools for future camps.
This reflection phase reinforced our belief that Human Practices can itself be engineered — through feedback loops between teaching, iteration, and community response.
Looking forward, we plan to expand this model into a replicable education framework by:
- Bringing our board games and storybooks to more schools and science museums.
- Collaborating with other iGEM teams to co-develop multilingual toolkits.
- Sharing our camp materials openly for educators worldwide.
Through this iteration, learning itself became our experiment — transforming outreach into a sustainable system of creativity, mentorship, and scientific empowerment.
Human Practice Cycle 3: Stress-Testing Outreach via Open Innovation
We designed TurBioHacks 2025, an international biohackathon co-hosted with Stanford University, NUS, and IIT-Madras, to reimagine how young people engage with real-world biotechnology challenges, transforming research problems into open invitations for creativity, collaboration, and rapid prototyping.




Design Phase
We designed TurBioHacks 2025, an international biohackathon co-hosted with Stanford University, NUS, and IIT-Madras, to reimagine how young people engage with real-world biotechnology challenges.
Our goal was to turn "slow science into fast science" — transforming research problems into open invitations for creativity, collaboration, and rapid prototyping.
To ensure inclusivity, we targeted both high-school and early undergraduate students, emphasizing that science can be accessible and action-driven from the very first step.
We identified three core goals:
- Inspiration – make biotechnology approachable;
- Engagement – foster teamwork across disciplines and countries;
- Accessibility – provide guided resources and mentorship to lower entry barriers
Build Phase
We structured the event as a three-stage learning journey — Preparation, Exploration, and Delivery — allowing participants to grow from guided learners to independent innovators.
1. Preparation – Building the Knowledge Base
Before the hackathon, participants attended a series of onboarding workshops and professor-led lectures introducing synthetic biology and bioinformatics fundamentals.
2. Exploration – Track-Based Collaboration
The hackathon featured six thematic tracks: Astrobiology, Biomanufacturing, Drug Discovery, Food & Nutrition, Neuroscience, and Oncology.
Our NTHU_Taiwan team led the Drug Discovery Track, designing tasks that bridged protein design and AI-based modeling.
Participants followed a stepwise progression:
- Foundations: protein–ligand interactions and therapeutic strategies.
- Practice: Google Colab tutorials for molecular modeling.
- Research: curated reading and design prompts for creative problem-solving.
3. Delivery – From Ideas to Prototypes
Over 48 hours, teams developed and submitted their projects on Devpost, completing the cycle from idea → prototype → presentation.
The process reflected the iGEM principle that hands-on collaboration accelerates learning and innovation.
Test Phase
Once the hackathon concluded, we analyzed submissions to evaluate both creativity and scientific rigor.
The Devpost Project Gallery showcased over a hundred projects, each available publicly for open learning and inspiration.
We were particularly impressed by participants' ability to combine coding, modeling, and biology — even among first-time competitors.
Spotlight: The Yumin Team (Drug Discovery Track Winners)
Their project, mCherry–Nanobody Fusion for Rapid HER2+ Cancer Cell Labeling, exemplified the spirit of TurBioHacks.
They:
- Defined a clinically relevant biomedical goal;
- Modeled fusion proteins using RFdiffusion, AlphaFold3, and pLDDT validation;
- Proposed feasible wet-lab testing for fluorescent labeling;
- Published a fully documented GitHub repository with models, code, and sequences.
This real-world testing confirmed that with proper mentorship and structure, students can transform abstract ideas into computationally viable biodesign solutions.
Learn Phase
From TurBioHacks 2025, we learned that hackathons can serve as engineering engines for Human Practices — where education, collaboration, and innovation co-evolve.
Key takeaways:
- Accessible mentorship transforms curiosity into capability.
- Interdisciplinary teams accelerate creative solutions.
- Open publication (Devpost Gallery) keeps innovation transparent and reproducible.
- Follow-up engagement sustains learning beyond the event.
These insights inspired us to iterate forward: we plan to make TurBioHacks an annual event, expanding track themes each year to include emerging topics such as bio-AI integration, climate biosolutions, and biodesign ethics.
Our long-term vision is to establish a global, open-source learning platform — empowering students worldwide to share ideas, test hypotheses, and evolve together.