Engineering Success

1. Overview

Our project aims to establish a green mosquito-repellent technology route for campuses and communities. It addresses the growing global and local burden of mosquito nuisance and disease risk, as well as the shortcomings of traditional chemical repellents and fumigation products in safety, odor, and environmental impact in enclosed spaces. We selected citronellal, a naturally derived compound that is friendly to humans and animals, as the core active ingredient. Using synthetic biology, we constructed a sustainable and scalable biosynthetic pathway in engineered E. coli. Through statistical optimization and enzyme engineering, we moved from "capable of production" to "high-yield production," laying the foundation for subsequent slow-release formulations and real-world applications.

From an engineering framework perspective, our project strictly adhered to the iGEM DBTL (Design–Build–Test–Learn) cycle and completed four rounds of iteration:

The first DBTL cycle focused on Chassis and Pathway Construction. We established a three-enzyme cascade system (GPS–CsTPS1–GeDH) using multi-plasmid co-expression (pET28a / pET21a / pBAD23) and dual induction with IPTG and L-arabinose, achieving stable and regulatable expression of the pathway in the BL21 background.

The second DBTL cycle emphasized Validation of Growth Capacity, Transcription, and Translation. We measured the growth curves of engineered and wild-type strains to assess changes in growth capability. At the transcriptional level, qPCR was used to quantify the fold increase in mRNA before and after induction. At the protein level, Western blot detected three specific bands of expected molecular weights.

The third DBTL cycle centered on Product Fermentation. We preliminarily determined fermentation parameters and successfully produced citronellal. We established a quantification protocol for citronellal using both HPLC and UV absorption methods, and compared intracellular and extracellular citronellal concentrations.

The fourth DBTL cycle involved Process Optimization. Using the Box–Behnken response surface methodology, we systematically investigated three factors: induction temperature, IPTG, and L-arabinose. The optimal combination, primarily involving low-temperature induction, was identified and validated in confirmation batches, providing statistical certainty for fermentation parameters. Additionally, we employed computational rational design to screen key sites in GeDH based on molecular docking and dynamics simulations, and constructed combinatorial mutants. This improved the catalytic efficiency of the terminal conversion step without altering the upstream design, leading to a further increase in yield.

Our project has successfully transitioned from "capable of producing citronellal" to "producing citronellal more stably and at higher yields." The engineered strain achieved citronellal titers approaching gram-per-liter levels in shake-flask cultures. Dry lab work provided crucial guidance for the project's success and reduced the wet lab search space, while wet lab data (qPCR, Western blot, HPLC, etc.) continuously refined models and hypotheses, driving further DBTL cycles.

Fig. 1.1 Project Design Diagram

2. DBTL Cycle 1: Chassis and Pathway Construction

2.1 Design

The goal of our first cycle was to select appropriate enzymes and a chassis cell for the biosynthesis of citronellal. We chose to reconstruct a three-enzyme cascade pathway—"geranyl pyrophosphate (GPP) → geraniol → citronellal"—in an E. coli BL21 background. The genes GPS, CsTPS1, and GeDH were cloned into pET28a, pET21a, and pBAD23 vectors, respectively, forming a compatible multi-plasmid co-expression system with distinct antibiotic resistances. This design facilitates the selection of transformants and enables dual induction and dose regulation via IPTG and L-arabinose. The strategy emphasizes modular, replaceable expression cassettes to allow for flux distribution and subsequent optimization (e.g., induction strength, temperature).

Fig. 2.1 Plasmid Design

2.2 Build

We first obtained the sequences of the three target genes and codon-optimized them for E. coli. The optimized sequences were synthesized by Sangon Biotech (Shanghai) and delivered as fragments cloned into pUC19 plasmids. Target fragments and backbone vectors (from the lab stock) were amplified via PCR and verified by agarose gel electrophoresis (Fig. 2.2).

Fig. 2.2 M: Marker; 1: GeDH; 2: CsTPS1; 3: GPS; 4: pET28a; 5: pET21a; 6: pBAD33

The fragments were then assembled into the vectors using Gibson assembly and transformed into BL21 cells. Successful transformants were obtained (Fig. 2.3).

Fig. 2.3 Bacterial Transformants

2.3 Test

To verify the correct construction of the engineered strain, we designed specific primer pairs for each exogenous gene (GPS, CsTPS1, GeDH) for genotypic identification. The primers targeted internal conserved regions and vector-flanking regions to enhance specificity and directional accuracy. Two independent single colonies were picked from the transformation plate and directly subjected to colony PCR. Gel electrophoresis showed amplification bands at the expected sizes for all three genes, with no non-specific bands or primer dimers (Fig. 2.4). These results provided initial evidence that all three genes were correctly inserted and oriented in the host strain, indicating successful transformation.

Fig. 2.4 M: Marker; 1,2: GeDH; 3,4: CsTPS1; 5,6: GPS

To mitigate false positives from colony PCR (e.g., due to template contamination or homologous amplification), the two positive clones were sent to Sangon Biotech for Sanger sequencing. Forward and reverse reads covered the insertions and vector junctions. The sequences matched the designed sequences exactly, with intact reading frames, correct start/stop codons, and no frameshifts or unintended mutations. All three genes were confirmed to be correctly oriented, without deletions or inversions. This validated the accuracy of the engineering beyond PCR size matching. For consistency in subsequent experiments, the doubly verified clones were preserved as glycerol stocks at -80°C, serving as standardized starting materials for expression and fermentation tests.

2.4 Learn

E. coli's endogenous MEP pathway continuously supplies IPP/DMAPP, making it a natural metabolic chassis for monoterpene precursor synthesis. BL21(DE3) offers advantages in cost, ease of operation, and laboratory reproducibility. We chose GPP as the nodal point: GPS condenses IPP and DMAPP into GPP, CsTPS1 converts GPP to geraniol, and GeDH oxidizes geraniol to citronellal. This modular "GPS → CsTPS1 → GeDH" pathway aligns with host metabolism and allows decoupled optimization at multiple levels (expression strength, enzyme activity, induction strategy). We not only obtained an engineered strain theoretically capable of producing citronellal but also established a verifiable checklist and quality thresholds for the next cycle, ensuring that subsequent assessments of growth capacity, transcription/translation levels, and product formation are evidence-based, reproducible, and iterative.

3. DBTL Cycle 2: Validation of Growth Capacity, Transcription, and Translation

3.1 Design

Colony PCR and Sanger sequencing provided solid evidence that the "three enzymes are correctly inserted and oriented." However, the actual product flux depends on the expression levels of the three expression cassettes within the cells. Therefore, the focus of the second DBTL cycle shifted to "three-layer validation":

  1. Measuring the growth curve of the engineered strain to analyze its growth capacity.
  2. Using qPCR to quantify the fold change in transcription before and after induction.
  3. Using WB/SDS-PAGE to confirm stable expression of the three enzymes at their expected molecular weights.

The multi-plasmid dual induction system offers flexibility but also introduces potential noise from resource competition and copy number variations. To address this, we fixed the inoculation amount, induction timing, and antibiotic pressure in subsequent validations, comparing differences between the engineered strain and empty vector/wild-type controls.

Only after confirming normal transcription and translation can we proceed with process optimization and protein engineering; otherwise, we must return to fine-tuning expression balance and induction strategies.

At the transcriptional level, qPCR primers covering approximately 200 bp were designed, with 16S rRNA as the internal reference, to assess mRNA upregulation before and after induction. At the translation level, SDS-PAGE/Western Blot was used to detect the three target bands.

Fig. 3.1 Growth curve, qPCR, and WB

3.2 Build

3.2.1 Growth Curve

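Nine 250 mL conical flasks, each holding 50 mL of culture, were divided into three groups of three flasks: Group 1, engineered strain in LB with antibiotics; Group 2, wild-type strain in LB; Group 3, empty-vector strain in LB with antibiotics.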

The initial OD600 of all groups was adjusted to 0.2, and they were cultured in LB medium at 37°C and 220 rpm for 24 hours.

3.2.2 qPCR

Primers were designed for GPS, CsTPS1, GeDH, and the internal reference (16S rRNA) to amplify 100–200 bp fragments, with a Tm of ~60°C and a GC content of 40–60%. RNA was extracted with a Meiji kit and reverse-transcribed with a Genstar kit. Each qPCR reaction had a final volume of 20 µL and used 2× SYBR Green Master Mix, with each primer at a starting concentration of 0.2 µM. The cycling program consisted of 2–3 min of pre-denaturation at 95°C, followed by 40 cycles of 10–15 s at 95°C and 30 s at 60°C.
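The primer criteria above (GC content of 40–60%, Tm near 60°C, amplicon of 100–200 bp) can be screened with a short script. The following is a minimal sketch, not the tool the team used: the Tm estimate is the rough Wallace rule, the ±3°C tolerance is an assumption, and the example primer is a hypothetical sequence rather than one of the project's primers.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def meets_criteria(primer: str, amplicon_bp: int, tm_tolerance: int = 3) -> bool:
    """Check GC 40-60 %, Tm within an assumed tolerance of 60 °C, amplicon 100-200 bp."""
    return (
        0.40 <= gc_fraction(primer) <= 0.60
        and abs(wallace_tm(primer) - 60) <= tm_tolerance
        and 100 <= amplicon_bp <= 200
    )


# Hypothetical 20-nt primer for illustration only.
example_primer = "ATGCGTACCGTTAGCTAGCA"
print(gc_fraction(example_primer), wallace_tm(example_primer))
print(meets_criteria(example_primer, amplicon_bp=150))
```

The Wallace rule is only a first-pass estimate for short oligos; dedicated primer-design software typically uses nearest-neighbor thermodynamics and will report somewhat different Tm values.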

3.2.3 Western Blot

Before translation-level validation, the engineered strain stored at -80°C was revived. The revived culture was lysed to obtain total protein, and the protein concentration was measured rapidly on a NanoDrop. After normalizing the samples, SDS-PAGE was run following conventional procedures, with equal loading and constant-voltage electrophoresis to separate the target proteins by molecular weight. The proteins were then wet-transferred to a PVDF membrane. After confirming successful transfer, immunodetection was carried out: the membrane was blocked to reduce non-specific binding, incubated with an anti-His-tag primary antibody followed by an HRP-labeled secondary antibody for signal amplification, and developed with ECL chemiluminescence for recording.

3.3 Test

3.3.1 Growth Curve

To evaluate whether the "three-plasmid dual induction" design imposed a basal growth burden on the host, standardized growth curve tests were conducted under identical culture conditions and selection pressures, comparing the engineered strain with the empty vector and wild-type strains as controls. All groups started from the same inoculum and were grown at a fixed shaking speed and temperature, and OD600 was recorded at regular intervals. Means and standard deviations were obtained from at least three biological replicates. Figure 3.2 shows the overall growth trajectories of the three groups: the lag phase length, logarithmic phase slope, and final OD600 upon entering the stationary phase highly overlapped, with no abnormal fluctuations or significant tailing.

The results clearly indicate that the engineered strain's basal growth capacity in the uninduced state is comparable to that of the empty vector and wild-type strains, suggesting that multi-plasmid coexistence and selection pressure did not cause measurable growth impairment. In other words, the genetic construction did not introduce significant host burden or toxic phenotypes, and the engineered strain can proliferate at the same rate as the controls. This conclusion provides a reliable starting point for subsequent expression induction and fermentation experiments: when comparing transcription/translation levels and product formation in the next cycle, "growth defects" can be excluded as a confounding factor, and observed differences can be primarily attributed to pathway expression and catalytic efficiency itself.

Fig. 3.2 Group 1: Engineered bacteria + LB + antibiotics; Group 2: Wild-type bacteria + LB; Group 3: Empty vector bacteria + LB + antibiotics

3.3.2 qPCR

To assess whether the exogenous pathway was effectively activated at the transcriptional level after induction, qPCR analysis was performed on GPS, CsTPS1, and GeDH using 16S rRNA as the internal reference and the 2^-ΔΔCt relative quantification framework. The results showed that under simultaneous induction with IPTG (T7 system) and L-arabinose (PBAD system), the relative transcription levels of all three genes were significantly upregulated: GPS ~16.1×, CsTPS1 ~18.0×, and GeDH ~12.6× (Fig. 3.3). This evidence indicates that our dual induction design can stably and strongly drive the expression of the three enzymes at the transcriptional level.
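The 2^-ΔΔCt calculation referred to above can be sketched as follows. This is a minimal illustration that assumes roughly 100% amplification efficiency for both target and reference; the Ct values in the example are hypothetical placeholders, not the project's measurements.

```python
def fold_change(ct_target_induced: float, ct_ref_induced: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference, here 16S rRNA) within each condition
    ddCt = dCt(induced) - dCt(uninduced control)
    """
    delta_induced = ct_target_induced - ct_ref_induced
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_induced - delta_control)


# Hypothetical Ct values: the target crosses threshold 4 cycles earlier
# after induction while the reference is unchanged.
example = fold_change(20.0, 10.0, 24.0, 10.0)
print(example)  # 16.0
```

A shift of four cycles corresponds to a 16-fold change, which is the order of magnitude reported for GPS in this section.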

In terms of expression patterns, the upregulation fold changes of CsTPS1 and GPS were slightly higher than that of GeDH, suggesting that the terminal oxidation step (where GeDH is located) is relatively conservative in transcriptional activation amplitude. This difference may stem from the kinetic characteristics of different promoters/regulatory elements (T7 and PBAD have different response strengths and saturation behaviors) or reflect the "upper limit" of induction concentrations set to avoid cellular burden. From an engineering perspective, this provides direction for subsequent optimization: on one hand, pathway balance can be improved by fine-tuning induction strength/timing (e.g., inducing upstream first, then GeDH); on the other hand, while maintaining mRNA quality and stability, exploring RBS strength grading or GeDH protein engineering could enhance the flux of the terminal step. In summary, the qPCR results not only verify that the "pathway is successfully activated" but also provide quantitative basis and priority clues for optimization at the translation and catalytic levels in the next round.

Fig. 3.3 Relative expression level of GPS, CsTPS1, and GeDH

3.3.3 Western Blot

To verify the expression of the three exogenous enzymes at the translation level, Western Blot was performed on the induced strain. The results showed clear specific bands at the expected molecular weight windows: GPS 46.9 kDa, CsTPS1 63.7 kDa, and GeDH 41.6 kDa (see Fig. 3.4).

It is important to note that GPS and GeDH have similar molecular weights and both carry the same epitope tag, appearing as a slightly thicker merged band on the membrane. This phenomenon is common in multi-target membrane detection and does not affect the determination of "presence/absence of expression."

Fig. 3.4 The Western blot results of GPS, CsTPS1, and GeDH.

3.4 Learn

After comparing standardized growth curves of the engineered strain, empty vector strain, and wild-type strain under uninduced conditions, we observed that the three groups highly overlapped in lag phase duration, logarithmic phase slope, and stationary phase final OD600, showing no significant differences. Thus, it can be concluded that multi-plasmid coexistence did not impose a measurable basal growth burden, and the metabolic pressure of the engineered strain is comparable to that of the wild-type, allowing it to safely proceed to subsequent induction expression and fermentation stages without additional chassis "burden reduction" measures.

At the transcriptional level, we quantified the three exogenous genes using 16S rRNA as the internal reference. After induction, the relative transcription levels of GPS, CsTPS1, and GeDH were significantly upregulated, indicating that the plasmids can be stably transcribed within the host, providing direct evidence for subsequent protein expression and catalysis of the pathway. At the translation level, we detected the three target proteins within the expected molecular weight windows.

The three types of evidence (growth curve, qPCR, and WB) form a complete closed loop from "chassis health → transcriptional activation → protein expression": the engineered strain showed no growth impairment, and the three genes were stably and efficiently transcribed and translated after induction. This marks the successful "activation" of the metabolic pathway, which now meets the conditions to advance to the downstream product level. These results provide reliable assurance for subsequent fermentation experiments and parameter optimization, laying a solid foundation for achieving citronellal biomanufacturing.

4. DBTL Cycle 3: Product Fermentation

4.1 Design

The goal of this cycle was to use fermentation + analytical chemistry to prove that the metabolic pathway is "product-visible" and to establish a reproducible quantification workflow. The fermentation strategy used TB as the culture medium, with simultaneous induction by IPTG + L-arabinose during the stable logarithmic phase. The analytical strategy used HPLC as the main method, with an external standard curve for citronellal, while UV-Vis was used for rapid verification to achieve "dual-channel consistency" and enhance data credibility. To determine product distribution, the fermentation supernatant and cell lysate were quantified separately, and a significance analysis was conducted to determine whether testing only the supernatant would be sufficient in the future.

4.2 Build

At the process level, after revival and activation in LB, the culture was scaled up in TB. Induction expression was initiated with IPTG and L-arabinose at OD600 ≈ 0.6. TB medium was chosen for three reasons: glycerol metabolism alleviates catabolite repression and reduces acetate accumulation; high nitrogen and buffering systems support late-stage activity and steady-state; our previous growth curves indicated that the engineered strain's metabolic burden was close to that of the wild-type, making it suitable for scale-up. During sample collection, both supernatant and lysate were taken for subsequent detection.

At the analytical level, citronellal standards at 0.4, 0.7, 1.0, 1.3, and 1.6 g/L were prepared according to the external standard method to establish an HPLC standard curve; quantification was done using a UV detector for peak integration. To enhance evidence strength, UV-Vis was performed in parallel. Additionally, both cell lysate and fermentation supernatant were quantified to analyze whether there was a statistical difference between them.
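External-standard quantification amounts to fitting a line through the standards and inverting it for unknown samples. The sketch below uses the five standard concentrations stated above; the peak areas are invented placeholders (the source does not list them), and the range check is an assumption about good practice rather than a documented step of the team's protocol.

```python
STANDARDS_G_PER_L = [0.4, 0.7, 1.0, 1.3, 1.6]   # concentrations from the text
HYPOTHETICAL_AREAS = [450.0, 750.0, 1050.0, 1350.0, 1650.0]  # placeholder peak areas


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(area, slope, intercept, low=0.4, high=1.6):
    """Convert a peak area to g/L, refusing to extrapolate beyond the standards."""
    conc = (area - intercept) / slope
    if not low <= conc <= high:
        raise ValueError(f"{conc:.2f} g/L is outside the calibrated range")
    return conc


slope, intercept = fit_line(STANDARDS_G_PER_L, HYPOTHETICAL_AREAS)
print(round(concentration(1000.0, slope, intercept), 3))
```

Samples that fall outside 0.4–1.6 g/L would need dilution or additional standards before the curve can be trusted for them.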

4.3 Test

The standard curve for citronellal concentration is shown in Fig. 4.1. The retention time for citronellal was approximately 14 minutes.

Fig. 4.1 The standard curve of the citronellal sample.

Next, we used HPLC to detect the citronellal concentration in the TB medium fermentation supernatant (Fig. 4.2). The analysis showed a citronellal concentration of 0.89 g/L, successfully achieving citronellal production through fermentation.

Fig. 4.2 HPLC of TB fermentation supernatant.

Furthermore, considering that citronellal might also exist intracellularly, we also detected the citronellal concentration in the cell lysate (Fig. 4.3). The analysis showed a citronellal concentration of 0.91 g/L.

Fig. 4.3 HPLC of cell lysate.

Significance analysis revealed no significant difference between the two (Fig. 4.4). Therefore, subsequent experiments directly used the fermentation supernatant for detection.

Fig. 4.4 Significance Analysis of Concentration Differences between Cell Lysate and Fermentation Supernatant.

Although liquid chromatography is generally highly accurate, to make our results more credible, we performed UV-Vis in parallel on the samples. The results showed little difference, confirming the quantification conclusion from a second technical path.

4.4 Learn

This round yielded three clear conclusions and next steps:

  1. The metabolic pathway has been confirmed at the "product level," with the engineered strain capable of producing citronellal close to g/L levels.
  2. The intra-/extracellular product distribution is similar, so "direct supernatant measurement" can represent the overall level, significantly simplifying subsequent batch screening.
  3. The consistency between HPLC and UV-Vis provides a robust quantification foundation for subsequent parameter optimization and comparative experiments.

Based on these conclusions, we will expand the factors in the next round to include induction temperature, IPTG, and L-arabinose levels, using response surface methodology for statistical optimization. In parallel, dry lab rational protein design will be applied to achieve higher yields.

5. DBTL Cycle 4: Process Optimization

5.1 Design

The goal of this cycle was to transform citronellal production from "capable" to "high-yield." On the process side, we used the Box-Behnken three-factor three-level design of response surface methodology (RSM), with citronellal concentration as the response, to systematically investigate induction temperature, IPTG concentration, L-arabinose concentration, and their interactions and quadratic effects, avoiding the pitfalls of single-factor methods that ignore synergy and nonlinearity. This design has evenly distributed points and avoids extreme conditions, allowing 15 experiments to fit a model including main effects, interactions, and quadratic terms.

In parallel, dry lab rational design focused on the pathway's rate-limiting step, GeDH (Fig. 5.1). We used Chai-1 to predict the protein structure, FuncLib to predict mutation networks and provide thermostability scores, Rosetta for molecular docking, and GROMACS molecular dynamics to assess complex stability and calculate binding free energy, and then selected the best-scoring sequences for wet lab validation.

Fig. 5.1 Dry lab roadmap.

5.2 Build

According to the Box-Behnken design (Table 5.1), 15 shake flask fermentations were completed, covering combinations of 20/28/36°C, 0.10/0.50/0.90 mM IPTG, and 0.02/0.11/0.20% (w/v) L-arabinose. The corresponding citronellal concentration data matrix was obtained experimentally.

Table 5.1. Variables and levels in the Box-Behnken design
Factor                 Level -1   Level 0   Level +1
Temperature (°C)       20         28        36
IPTG (mM)              0.10       0.50      0.90
L-Arabinose (% w/v)    0.020      0.110     0.200
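For readers unfamiliar with the layout, the 15-run three-factor Box-Behnken design can be generated as follows. This is a minimal sketch: the factor levels are those listed in Table 5.1, while the function names and dictionary keys are illustrative, and the run order produced here is neither the standard order nor the randomized order used in the experiments.

```python
from itertools import combinations, product

# Factor levels from Table 5.1, ordered as coded levels -1, 0, +1.
LEVELS = {
    "temperature_C": (20, 28, 36),
    "IPTG_mM": (0.10, 0.50, 0.90),
    "L_arabinose_pct": (0.02, 0.11, 0.20),
}


def box_behnken(n_factors: int, n_center: int):
    """Coded Box-Behnken design: each pair of factors at +/-1 with the rest at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([(0,) * n_factors] * n_center)  # replicated center points
    return runs


def decode(point):
    """Map a coded point (-1/0/+1 per factor) to actual factor settings."""
    return {name: levels[c + 1] for (name, levels), c in zip(LEVELS.items(), point)}


design = box_behnken(n_factors=3, n_center=3)
for point in design:
    print(point, decode(point))
```

The design has 12 edge-midpoint runs plus 3 center replicates, so no run combines the extreme level of all three factors at once.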

ANOVA was performed on the results (Table 5.2). The overall model was significant (F=8.90, p=0.0134), with temperature (A) as the most significant main effect (F=42.18, p=0.0013). The IPTG×L-arabinose interaction (BC) was significant (p=0.0422), indicating that the two inducers must be tuned together rather than independently. The quadratic term of IPTG (B²) was highly significant (p=0.0069), indicating an optimal concentration range. Lack of fit was not significant (p=0.0868), indicating that the model fits the data adequately. The model predicted the optimal conditions as 20°C, IPTG 0.46 mM, and L-arabinose 0.20% (w/v).

Table 5.2. Box-Behnken Response Surface Experimental Design and Results
Std   Run   A: Temperature (°C)   B: IPTG (mM)   C: L-Arabinose (% w/v)   Citronellal (g/L)
12    1     28                    0.90           0.20                     0.68
4     2     36                    0.90           0.11                     0.77
5     3     20                    0.50           0.02                     0.82
15    4     28                    0.50           0.11                     0.84
3     5     20                    0.90           0.11                     0.90
10    6     28                    0.90           0.02                     0.88
13    7     28                    0.50           0.11                     0.90
9     8     28                    0.10           0.02                     0.54
11    9     28                    0.10           0.20                     0.81
8     10    36                    0.50           0.20                     0.60
6     11    36                    0.50           0.02                     0.65
2     12    36                    0.10           0.11                     0.43
14    13    28                    0.50           0.11                     0.91
1     14    20                    0.10           0.11                     0.84
7     15    20                    0.50           0.20                     0.96
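To illustrate how a response surface is obtained from such a table, the sketch below fits a full second-order model to the 15 runs of Table 5.2 by ordinary least squares in coded units. It is an illustration under assumptions, not a reproduction of the team's analysis: it does not compute the ANOVA, and the coefficients or predicted optimum may differ from theirs if their software used a reduced model or different coding.

```python
# (temperature °C, IPTG mM, L-arabinose % w/v, citronellal g/L), in run order
RUNS = [
    (28, 0.9, 0.20, 0.68), (36, 0.9, 0.11, 0.77), (20, 0.5, 0.02, 0.82),
    (28, 0.5, 0.11, 0.84), (20, 0.9, 0.11, 0.90), (28, 0.9, 0.02, 0.88),
    (28, 0.5, 0.11, 0.90), (28, 0.1, 0.02, 0.54), (28, 0.1, 0.20, 0.81),
    (36, 0.5, 0.20, 0.60), (36, 0.5, 0.02, 0.65), (36, 0.1, 0.11, 0.43),
    (28, 0.5, 0.11, 0.91), (20, 0.1, 0.11, 0.84), (20, 0.5, 0.20, 0.96),
]
CENTER = (28.0, 0.5, 0.11)      # level 0 of each factor
HALF_RANGE = (8.0, 0.4, 0.09)   # distance from level 0 to level +1
TERMS = ["1", "A", "B", "C", "AB", "AC", "BC", "A^2", "B^2", "C^2"]


def coded(run):
    """Convert actual factor settings to coded units (-1, 0, +1)."""
    return [(v - c) / h for v, c, h in zip(run[:3], CENTER, HALF_RANGE)]


def features(a, b, c):
    """Model terms in the same order as TERMS."""
    return [1.0, a, b, c, a * b, a * c, b * c, a * a, b * b, c * c]


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic(runs):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    X = [features(*coded(r)) for r in runs]
    y = [r[3] for r in runs]
    p = len(TERMS)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return dict(zip(TERMS, solve(xtx, xty)))


coef = fit_quadratic(RUNS)
for term in TERMS:
    print(f"{term:>4}: {coef[term]:+.4f}")
```

In this fit the linear temperature coefficient is negative (about -0.134 g/L per coded unit) and the IPTG quadratic coefficient is negative, which agrees in direction with the conclusions that lower induction temperature helps and that IPTG has an interior optimum.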

Simultaneously, based on dry lab work, protein stability was scored (Fig. 5.2).

Fig. 5.2 Protein stability score distribution.

Docking results were also scored (Fig. 5.3).

Fig. 5.3 Docking score distribution.

We found that quadruple mutants generally scored better than triple and double mutants. Based on comprehensive results, four candidate proteins and the wild-type were selected for molecular dynamics simulation (Table 5.3).

Table 5.3 Protein stability and molecular docking scoring results
GeDH variant   Protein_score   Dock_score   Score
64117          -1187.136       -1344.596    -1281.6121
63928          -1182.999       -1344.791    -1280.0742
54883          -1184.333       -1343.538    -1279.8561
54456          -1186.282       -1342.141    -1279.7974
WT             -1161.051       -863.977     -982.8066
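The combined Score column in Table 5.3 is numerically consistent with a weighted sum of 0.4 × Protein_score + 0.6 × Dock_score. The weights are an inference from the tabulated values, not something stated in the text, so the check below should be read as a sketch under that assumption.

```python
# (Protein_score, Dock_score, reported Score) from Table 5.3
CANDIDATES = {
    "64117": (-1187.136, -1344.596, -1281.6121),
    "63928": (-1182.999, -1344.791, -1280.0742),
    "54883": (-1184.333, -1343.538, -1279.8561),
    "54456": (-1186.282, -1342.141, -1279.7974),
    "WT": (-1161.051, -863.977, -982.8066),
}

# Assumed weights, inferred from the table rather than documented.
W_PROTEIN, W_DOCK = 0.4, 0.6


def composite(protein_score: float, dock_score: float) -> float:
    """Weighted combination of stability and docking scores (lower is better)."""
    return W_PROTEIN * protein_score + W_DOCK * dock_score


for name, (protein, dock, reported) in CANDIDATES.items():
    print(f"{name:>6}: computed {composite(protein, dock):.4f}, reported {reported:.4f}")

# Rank candidates from best (most negative) to worst composite score.
ranking = sorted(CANDIDATES, key=lambda n: composite(*CANDIDATES[n][:2]))
print(ranking)
```

Under this weighting all four mutants separate from the wild-type mainly through the docking term, since their stability scores differ from the wild-type by only about 20 to 26 units.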

The dominant conformations during the dynamics simulations are shown in Fig. 5.4, with the ligand shown in blue. In the 54883 panel, blue represents the wild-type and dark purple represents the mutant (stick representation); the two fit well. 54456 also binds at the correct position with a high binding free energy.

Fig. 5.4 Dominant conformations during dynamics simulation.

Based on the dry lab docking and dynamics analysis, we constructed a combinatorial GeDH mutant carrying Q60R, L125P, F147Y, and A300D, aiming to enhance affinity and catalytic efficiency toward the intermediate (citronellol/related aldehyde-alcohol state). To isolate the effect of this variable, only the GeDH expression cassette was replaced, while the chassis and upstream enzymes were kept unchanged (see the Model section for details).

5.3 Test

Fermentation was conducted under the optimal conditions predicted by the response surface. HPLC quantification of the supernatant showed a citronellal concentration of 1.01 g/L (Fig. 5.5).

Fig. 5.5 HPLC of fermentation under response surface-predicted optimal conditions.

Under the same conditions, rapid quantification of the GeDH mutant supernatant showed a concentration of 1.36 g/L, indicating that the rationally designed variant significantly increased flux at the terminal oxidation step (Fig. 5.6).

Fig. 5.6 HPLC of mutant fermentation under optimal conditions.
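The relative gains implied by the reported titers can be worked out directly. The sketch below uses only the three supernatant values quoted in this document (0.89, 1.01, and 1.36 g/L); the stage labels are descriptive names chosen here, and the comparison treats each value as a single reported figure without replicate statistics.

```python
# Supernatant citronellal titers (g/L) reported in Sections 4.3 and 5.3.
TITERS = {
    "initial fermentation": 0.89,
    "RSM-optimized conditions": 1.01,
    "RSM-optimized + GeDH mutant": 1.36,
}


def percent_gain(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100


stages = list(TITERS)
for prev, curr in zip(stages, stages[1:]):
    gain = percent_gain(TITERS[curr], TITERS[prev])
    print(f"{prev} -> {curr}: +{gain:.1f}%")

overall = percent_gain(TITERS[stages[-1]], TITERS[stages[0]])
print(f"overall: +{overall:.1f}%")
```

On these figures, process optimization contributes roughly 13.5% and the GeDH variant a further 34.7%, for an overall increase of about 52.8% over the initial fermentation.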

5.4 Learn

This round yielded three actionable engineering insights:

  1. Lowering the induction temperature is the primary lever for increasing yield.
  2. Dual induction requires coordination, and IPTG has an optimal window—higher doses do not always yield benefits.
  3. Enzymatic reconstruction of the terminal step is effective, and together with the process optimization can yield significant gains.

We adopted "20°C, IPTG 0.46 mM, L-arabinose 0.20%" as the current standard induction conditions; confirmed the quantification strategy of "direct supernatant measurement, with HPLC external standard as the main method and UV as the auxiliary method"; and extended the GeDH mutation strategy to saturation mutagenesis/combination mutagenesis of upstream enzymes. During scale-up trials, we plan to add dissolved oxygen, pH, and induction timing to the factor set of the next RSM round to further improve volumetric yield and batch-to-batch consistency. Together, these improvements provide both process and molecular handles for reducing production costs and advancing toward pilot scale.