Model: Project Background & Overview

Against the backdrop of global plastic pollution and “dual-carbon” goals, building a closed loop of “plastic depolymerization → biomanufacturing” is becoming a key direction in synthetic biology. Ethylene glycol (EG), released during biological/enzymatic depolymerization of PET, is both an accessible non-sugar carbon source and a potential environmental risk. How to efficiently channel EG into central metabolism and further convert it into value-added products is the core scientific and engineering challenge for plastic resource utilization.

We chose the engineering-friendly E. coli BL21(DE3) as the chassis and used the well-annotated, extensively studied E. coli K-12 MG1655 as the genomic and metabolic reference. Importantly, prior studies have shown that MG1655 can effectively metabolize EG when key enzymes such as gcl and hyi are expressed heterologously or overexpressed at high levels, providing a transferable strategy baseline for evaluating BL21(DE3)’s potential and charting engineering routes.

However, existing genome-scale metabolic models (GSMMs) of BL21(DE3) generally lack a complete representation of “EG → acetyl-CoA → target polymer/product,” making it difficult to pinpoint rate-limiting steps and engineering targets at the flux level. Accordingly, we posed and answered two complementary questions regarding EG usability and utilization efficiency:

A) Does BL21(DE3) have the potential to metabolize EG?
B) After completing the pathway, how does BL21(DE3) metabolize EG, and which steps are critical?

As shown in Figure 1A, we first performed orthology clustering, phylogeny, and gene synteny analyses between BL21(DE3) and MG1655, confirming that BL21(DE3) largely retains EG-related pathway genes at the coding level, with good structural synteny and conservation. Combined with experimental evidence from MG1655 (EG utilization via overexpression of gcl, hyi, etc.), this provides both structural and empirical support that BL21(DE3) can metabolize EG; see Project 1 for details.

As shown in Figure 1B, we supplemented the full EG pathway on top of mainstream BL21(DE3) models to form a computable network covering “EG → acetyl-CoA → PEAs (polyester amides).” Using complementary methods including FBA, dFBA, FVA, and FDCA, we analyzed flux distribution and robustness under two objectives (“growth” vs. “product”), identified engineering-sensitive nodes such as gcl, glxR/garR, and glxK, and proposed a more compact overexpression hypothesis to guide wet-lab design with quantitative evidence—see Project 2 for details.

In sum, we built a clear bridge between the macro problem (PET circularity) and a practical technical route: comparative genomics establishes that BL21(DE3) can metabolize EG (structural/evolutionary support for chassis choice and pathway presence), while model reconstruction and flux analysis delineate what to engineer and why (data-driven bottleneck localization and control strategies). This integrated “evidence chain → computation → engineering” approach is poised to convert EG from PET depolymerization into precursors and target products for green biomanufacturing in the iGEM setting.

Project 1: Comparative-Genomics Assessment of EG Metabolic Potential in E. coli BL21(DE3)

1. Research Objective

Using MG1655 as the benchmark, we apply three “must-have” criteria—Presence, Completeness, and Synteny—to determine whether BL21(DE3) has comparable native/potential capacity to metabolize EG, and compile an EG-pathway gene list together with gene–protein–reaction (GPR) annotation starting points.

2. Genome Data QC & Standardized Re-annotation

We downloaded reference sequences for MG1655 and BL21(DE3) from NCBI⁴, assessed assembly/consistency with QUAST⁵, then re-annotated with Prokka to unify naming and functional labels (EC/KO/GO), creating a consistent baseline for cross-strain comparison⁶. The results (Figure 2) showed no abnormal segments impacting downstream comparison; continuity, mismatches, and gaps matched public references, and no contamination-like contigs were detected.

Figure 2: Sequence quality-control results for the two strains

3. Orthology Clustering & Phylogeny

We inferred orthogroups with OrthoFinder, aligned single-copy orthologs with MAFFT, and constructed trees using IQ-TREE/FastTree to verify comparability at the core-metabolism level⁷. In total, 3,920 orthogroups were identified; 3,901 were shared (core orthogroups), and 3,755 were single-copy orthologs. Only 19 strain-specific orthogroups (57 genes, ~0.7%) were found. Figures 3 and 4 indicate limited coding-level differences between BL21(DE3) and MG1655, supporting cross-strain functional comparability.

Figure 3: Orthology-clustering results for the two strains

Figure 4: Phylogenetic tree results for the two strains (Tree scale: 0.001)
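
The orthogroup categories used above (shared/core, single-copy, strain-specific) can be illustrated with a minimal sketch. The orthogroup IDs and per-strain counts in `toy_counts` are hypothetical placeholders, not our data; only the three totals at the bottom come from the text.

```python
from collections import Counter

# Hypothetical gene-count table: orthogroup -> (copies in MG1655, copies in BL21(DE3)).
# These rows are invented for illustration only.
toy_counts = {
    "OG0001": (1, 1),  # one copy in each strain
    "OG0002": (2, 1),  # shared, but duplicated in one strain
    "OG0003": (0, 3),  # found only in BL21(DE3)
    "OG0004": (1, 0),  # found only in MG1655
}


def classify(mg1655: int, bl21: int) -> str:
    """Label one orthogroup from its per-strain gene counts."""
    if mg1655 == 0 or bl21 == 0:
        return "strain-specific"
    if mg1655 == 1 and bl21 == 1:
        return "single-copy core"
    return "multi-copy core"


labels = Counter(classify(a, b) for a, b in toy_counts.values())

# Consistency check on the totals reported in the text.
total, shared, single_copy = 3920, 3901, 3755
strain_specific = total - shared
core_fraction = shared / total

print(dict(labels))
print(strain_specific, f"{core_fraction:.1%}")
```

The last two lines confirm that the reported totals are internally consistent: the shared and strain-specific orthogroups add up to the total.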

4. Gene Synteny & Whole-Genome Structural Checks

Using MCScanX to detect conserved blocks anchored by protein homology and progressiveMauve for genome-wide structural alignment, we focused on EG-related regions to check for rearrangements that might disrupt pathway integrity⁸. Figure 5 shows that EG-pathway genes in BL21(DE3) are not only present but also well-arranged and structurally conserved, providing a structural evidence chain for native metabolic potential.

Figure 5: Synteny analysis results for the two strains. Lines highlighted in red indicate EG metabolic genes.

5. Conclusion

Under a unified QC and annotation framework, we confirm—with combined evidence of “presence–conservation–synteny”—that BL21(DE3) has EG metabolic potential comparable to MG1655; differences are more likely due to regulation or enzyme parameters rather than gene loss. This conclusion offers a reliable starting point for writing the native EG pathway into the BL21(DE3) GSMM and for subsequent yield-ceiling prediction and bottleneck identification.

Project 2: BL21(DE3) GSMM Reconstruction and Flux Analysis of Key EG Pathways

1. Research Objective

Comparative genomics (with MG1655 as the reference) provides structural evidence for BL21(DE3)’s EG pathway—Presence, Completeness, and Synteny—and outputs an EG gene list plus GPR starting points. Based on this, we wrote these “curatable and credible” gene–reaction sets into the BL21(DE3) GSMM for downstream flux analysis and engineering-target screening.

2. Reconstruction of the BL21(DE3) GSMM

2.1 Baseline Model Selection

We compared five published BL21(DE3) GSMMs by metabolite, reaction, and gene counts (Table 1); iECBD_1354 was excluded from selection because it was constructed for a derivative strain, BL21-Gold(DE3)pLysS. We selected iB21_1397 as the baseline because its metabolite (1,943) and reaction (2,741) counts are among the highest and it works smoothly with COBRApy, meeting the needs of product-optimization analyses after EG-pathway supplementation.

Table 1. Comparison of existing BL21(DE3) GSMMs

Model ID	Metabolites	Reactions	Genes	Key Construction/Calibration Notes	Database/Paper Source (Publication/Deposition Year)
iECD_1391	1943	2741	1333	Semi-automated reconstruction via ModelSEED followed by manual curation; the first GEM for BL21(DE3), often used as an educational example	BiGG entry iECD_1391 (2013, Monk et al.¹⁰)
iEC1356_Bl21DE3	1918	2740	1356	Based on iECD_1391: re-aligned genome annotations, standardized namespace, and supplemented missing cycles	BiGG entry iEC1356_Bl21DE3 (2017-2018, Deposited in BiGG)
iHK1487	1882	2714	1498	Re-annotated based on the latest genome; exhibits the highest prediction accuracy	Paper & BioModels MODEL2408040002 (2018, Kim et al.¹¹)
iB21_1397	1943	2741	1337	Automated draft model generated by CarveMe + manual gap-filling; often used with COBRApy for product optimization	BiGG entry iB21_1397 (2019, Deposited in BiGG)
iECBD_1354	1952	2748	1354	Developed for the BL21-Gold(DE3)pLysS derivative strain; includes metabolic pathways for antibiotic resistance and plasmid maintenance	BiGG entry iECBD_1354 (2020, Deposited in BiGG)

2.2 Data Curation & EG-Pathway Completion

To address the missing “EG → PEAs” pathway in iB21_1397, we integrated multiple data sources and completed pathway supplementation as follows:

2.2.1 Data Collection

From KEGG, BiGG, and EG-metabolism literature²,¹², we curated:

the complete reaction chain from EG to PEAs (EG → glycolaldehyde → glycolate → glyoxylate → 2-hydroxy-3-oxopropionate → hydroxypyruvate → (R)-glycerate → 2-phospho-D-glycerate → glycolysis → acetyl-CoA → PEAs);
the GPR associations for each step;
stoichiometry ensuring mass balance.

2.2.2 Pathway Integration and Network Fixes

We added six key EG reactions and five metabolites (EG, (R)-3-hydroxybutyryl-CoA, (R)-lactyl-CoA, 3-aminopropionyl-CoA, PEAs), updated GPR links, and corrected the stoichiometric matrix for overall mass balance. Cross-checking other sources revealed a discrepancy: the (R)-glycerate → 3-phospho-D-glycerate step in the original model is absent from those sources, so we removed that reaction. The re-engineered model, iBL21_EG-PEA, now contains 1,948 metabolites (1,943 + 5) and 2,746 reactions (2,741 + 6 − 1).

Its value lies in: (i) providing a complete computable path “EG → PEAs” for flux analysis; (ii) enabling direct reaction-to-gene mapping for FVA/FDCA via GPR; and (iii) ensuring mass balance for reliable FBA/dFBA simulations.
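
As an illustration of the mass-balance requirement in point (iii), the sketch below checks element and charge balance for the first step of the pathway (EG → glycolaldehyde), written here as an NAD-dependent oxidation. The metabolite identifiers, formulas, charges, and cofactor choice are assumptions in a BiGG-like style chosen for the example, not entries copied from iBL21_EG-PEA.

```python
import re
from collections import Counter

# Assumed BiGG-style formulas and charges, for illustration only.
METABOLITES = {
    "eg": ("C2H6O2", 0),             # ethylene glycol
    "gcald": ("C2H4O2", 0),          # glycolaldehyde
    "nad": ("C21H26N7O14P2", -1),
    "nadh": ("C21H27N7O14P2", -2),
    "h": ("H", 1),
}


def parse_formula(formula: str) -> Counter:
    """Turn 'C2H6O2' into Counter({'H': 6, 'C': 2, 'O': 2})."""
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms


def imbalance(stoichiometry: dict) -> dict:
    """Net atoms and charge of a reaction; an empty dict means it is balanced.

    Coefficients are negative for substrates and positive for products.
    """
    net = Counter()
    for met, coeff in stoichiometry.items():
        formula, charge = METABOLITES[met]
        for element, n in parse_formula(formula).items():
            net[element] += coeff * n
        net["charge"] += coeff * charge
    return {key: value for key, value in net.items() if value != 0}


# EG + NAD+ -> glycolaldehyde + NADH + H+
balanced = {"eg": -1, "nad": -1, "gcald": 1, "nadh": 1, "h": 1}
# Same reaction with the proton forgotten on the product side.
missing_proton = {"eg": -1, "nad": -1, "gcald": 1, "nadh": 1}

print(imbalance(balanced))
print(imbalance(missing_proton))
```

COBRApy provides a comparable per-reaction check, `Reaction.check_mass_balance()`, which reads formulas and charges from the model's metabolite annotations.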

2.2.3 Visualization

We illustrated the full flow “EG uptake → intermediate metabolism → PEAs synthesis,” clarifying the handoff between “EG → acetyl-CoA” and “acetyl-CoA → PEAs.”

Figure 6: EG–PEA metabolic pathway map for BL21(DE3) (Added genes/reactions/metabolites are marked)

In short, iBL21_EG-PEA extends the baseline model iB21_1397 with the complete “EG → acetyl-CoA → PEAs” route: six key reactions, five metabolites, and the corresponding GPR associations were added, and one inconsistent reaction was removed, giving 1,948 metabolites (1,943 + 5) and 2,746 reactions (2,741 + 6 − 1). The reconstruction therefore offers (1) a complete computational path from EG to PEAs for flux analysis, (2) direct mapping between reactions and genes in FVA/FDCA analyses, and (3) a mass-conserving stoichiometric matrix that supports reliable FBA/dFBA simulations of the EG → PEAs material flow.

3. Flux Analyses on iBL21_EG-PEA

Using four complementary methods (FBA, dFBA, FVA, and FDCA), we built a pipeline that runs from basic flux prediction through dynamic simulation and flux-range validation to target prioritization (Table 2). To emphasize contrasts under carbon limitation, we set EG at 10 mmol·L⁻¹; oxygen uptake was set to 18.5 mmol·gDW⁻¹·h⁻¹ to reflect realistic aeration.

Table 2. Core principles and objectives of key metabolic flux analysis methods⁹

Analysis Method	Core Principle	Analysis Objective
Flux Balance Analysis (FBA)	Uses linear programming on the stoichiometric matrix to enforce mass conservation of metabolites and a defined objective function (e.g., biomass) to compute an optimal steady-state flux distribution.	Simulate the flux distribution of BL21(DE3) metabolizing ethylene glycol (EG) when biomass production is the optimization objective.
Dynamic Flux Balance Analysis (dFBA)	Extends the static metabolic network into a dynamic process by coupling differential equations with linear programming, enabling simulation of time-dependent metabolic changes (e.g., growth curves).	Simulate the growth trajectory of BL21(DE3) when EG is the sole carbon source.
Flux Variability Analysis (FVA)	With the optimized objective value (or a fraction thereof, e.g., 90% of optimal product yield) fixed, computes upper and lower bounds for each reaction flux to assess flux robustness and network flexibility.	Exclude highly variable/non-essential reactions and focus on core reactions with narrow flux ranges and large mean fluxes.
Flux Difference Comparison Analysis (FDCA)	Compares flux distributions under “biomass maximization” vs. “product maximization”; the differences guide interpretation of which reactions/genes favor growth or product formation.	Elucidate how metabolic flow is partitioned between growth and product synthesis, identify reactions and genes that favor product formation, and thereby select targets for overexpression.

3.1 Flux Balance Analysis (FBA) Results

FBA with EG as the sole carbon source showed high fluxes for oxygen uptake, EG uptake, the key reactions of EG catabolism, the glyoxylate shunt, and glycolysis-related reactions. These flux distributions indicate that the EG metabolic pathway connects to core carbon metabolism and oxygen utilization, providing flux support for acetyl-CoA formation and biomass synthesis.

3.2 Dynamic Flux Balance Analysis (dFBA) Results

Dynamic FBA at an EG concentration of 10 mmol·L⁻¹ indicates that the growth rate (orange dashed line) rises rapidly during the early phase and remains at a relatively high level (~0.06–0.065 h⁻¹) before slowly declining. Cell concentration (blue solid line, OD₆₀₀) increases continuously from ~0.10 at the start to exceed 0.40 by 25 h, illustrating the time-dependent coupling between instantaneous growth rate and biomass accumulation.

Figure 8: dFBA flux analysis results (X-axis: Time (h); Left Y-axis: OD₆₀₀; Right Y-axis: Growth Rate (1/h))
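
The dFBA loop can be sketched as follows: update the uptake bound from the medium, solve the steady-state model, then integrate. To keep the example self-contained, the linear-programming step is replaced by a closed form that holds only for a one-substrate, one-biomass-reaction toy network. The kinetic parameters (`vmax`, `km`), the biomass yield, and the initial biomass are hypothetical values, so this does not reproduce Figure 8.

```python
def uptake_bound(eg_mmol_per_l, vmax=5.0, km=1.0):
    """Michaelis-Menten cap on EG uptake (mmol/gDW/h); vmax and km are made up."""
    return vmax * eg_mmol_per_l / (km + eg_mmol_per_l)


def solve_growth(eg_uptake, biomass_yield=0.013):
    """Stand-in for the FBA step.

    With a single substrate feeding a single biomass reaction, maximizing
    growth gives mu = yield * uptake. The yield (gDW/mmol) is hypothetical.
    """
    return biomass_yield * eg_uptake


def simulate(eg0=10.0, x0=0.05, dt=0.1, t_end=25.0):
    """Explicit-Euler dFBA loop; returns (time, biomass, EG, growth rate) rows."""
    eg, x = eg0, x0  # EG in mmol/L, biomass in gDW/L
    rows = []
    for step in range(int(round(t_end / dt)) + 1):
        # 1) The medium sets the uptake bound; never take more EG than is left.
        v = min(uptake_bound(eg), eg / (x * dt))
        # 2) "Solve" the steady-state model for the growth rate (1/h).
        mu = solve_growth(v)
        rows.append((step * dt, x, eg, mu))
        # 3) Advance biomass and EG by one time step using the old state.
        x, eg = x + mu * x * dt, max(eg - v * x * dt, 0.0)
    return rows


trajectory = simulate()
for t, biomass, eg, mu in trajectory[::50]:
    print(f"t={t:5.1f} h  biomass={biomass:.3f} gDW/L  EG={eg:6.3f} mmol/L  mu={mu:.4f} 1/h")
```

In a full implementation, `solve_growth` would instead set the EG exchange bound on the genome-scale model and call the LP solver at every step.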

3.3 Flux Variability Analysis (FVA) Results

Some reactions are reversible and thus may have fluxes spanning positive and negative values; others form cycles, which enlarges their flux ranges. The importance of a reaction should be judged from two aspects: its flux range (substitutability) and the absolute value of its mean flux (load). A small flux range implies low substitutability and therefore a “hard-required” reaction; a large mean absolute flux implies the reaction carries a high load and is relatively more important. In this study we set the FVA criterion: a “hard-required” reaction has flux range < 5.0 mmol·gDW⁻¹·h⁻¹ and mean absolute flux > 4 mmol·gDW⁻¹·h⁻¹. According to Figure 9, reactions with both small range and large mean (red box) correspond to genes fucO, aldA, gcl, glxR/garR, and glxK.
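
The screening rule above can be written as a short filter. This is a sketch under two assumptions: the mean flux is taken as the midpoint of the FVA minimum and maximum, and the reaction IDs and ranges in `toy_fva` are hypothetical rather than taken from Figure 9.

```python
RANGE_MAX = 5.0  # mmol/gDW/h, threshold on flux range (from the text)
MEAN_MIN = 4.0   # mmol/gDW/h, threshold on mean absolute flux (from the text)

# Hypothetical FVA output: reaction id -> (minimum flux, maximum flux).
toy_fva = {
    "R_core": (8.0, 10.0),           # narrow range, high load
    "R_flexible": (-20.0, 30.0),     # wide range, e.g. part of a cycle
    "R_low_load": (0.5, 1.5),        # narrow range, low load
    "R_reverse_core": (-9.0, -7.0),  # narrow range, high load in reverse
}


def is_hard_required(vmin: float, vmax: float) -> bool:
    """Apply the range and load criteria to one reaction."""
    flux_range = vmax - vmin
    mean_abs_flux = abs((vmin + vmax) / 2.0)  # midpoint used as the mean (assumption)
    return flux_range < RANGE_MAX and mean_abs_flux > MEAN_MIN


hard_required = sorted(
    rxn for rxn, (vmin, vmax) in toy_fva.items() if is_hard_required(vmin, vmax)
)
print(hard_required)
```

Taking the absolute value of the midpoint keeps reactions that run in the reverse direction from being discarded just because their flux is negative.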

Figure 9: FVA results for the objective of maximizing PEAs

We also analyzed the trade-off between biomass and target product. As shown in Figure 10, cell growth rate and polymer production exhibit a pronounced trade-off: as growth rate increases from near 0 to ~0.3 h⁻¹, the maximum polymer production rate decreases linearly from its highest value (0.289 mmol·gDW⁻¹·h⁻¹) to near zero. When “maximum polymer production” is the objective, the growth rate approaches zero (red star). When “maximum growth” is the objective, the growth rate reaches 0.295 h⁻¹ (green cross), but polymer production is nearly abolished. This indicates that cellular resources are divided between growth and polymer synthesis and that the two cannot be maximized simultaneously.

Figure 10: Relationship between biomass and target product production
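
The two endpoints reported above define an approximate production envelope. The sketch below assumes the relationship is exactly linear and that production reaches zero at the maximum growth rate; the text only says "near zero", so treat this as an idealization.

```python
MAX_PRODUCT = 0.289  # mmol/gDW/h at zero growth (from the text)
MAX_GROWTH = 0.295   # 1/h when growth is the objective (from the text)


def max_product_rate(mu: float) -> float:
    """Idealized linear envelope: best PEAs rate achievable at growth rate mu."""
    if not 0.0 <= mu <= MAX_GROWTH:
        raise ValueError("growth rate outside the feasible range")
    return MAX_PRODUCT * (1.0 - mu / MAX_GROWTH)


for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    mu = fraction * MAX_GROWTH
    print(f"mu={mu:.3f} 1/h -> max PEAs rate {max_product_rate(mu):.4f} mmol/gDW/h")
```

Under this idealization, every 10% of maximum growth that is retained costs 10% of the maximum production rate.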

3.4 Flux Difference Comparison Analysis (FDCA) Results

We computed reaction fluxes under “biomass maximization” and “product (PEAs) maximization” and calculated the difference V_diff = V_{product_max} − V_{biomass_max}. V_diff is interpreted as follows: if V_diff > 0.5 mmol·gDW⁻¹·h⁻¹, the reaction favors product synthesis but may suppress growth; if V_diff < −0.5 mmol·gDW⁻¹·h⁻¹, the reaction favors growth but suppresses product formation; if V_diff ∈ [−0.5, 0.5] mmol·gDW⁻¹·h⁻¹ and flux > 4 mmol·gDW⁻¹·h⁻¹, the reaction can support both growth and product synthesis. From Figure 11, reactions that are favorable to both growth and PEAs production (red boxes) correspond to genes fucO, aldA, gcl, glxR/garR, and glxK.

Figure 11: FDCA results with PEAs as the analysis objective
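
The V_diff rule can be expressed as a small classifier. The text does not specify which flux must exceed 4 mmol·gDW⁻¹·h⁻¹ in the third case; the sketch assumes the absolute flux must exceed it under both objectives. Reaction IDs and flux values are hypothetical.

```python
DIFF_THRESHOLD = 0.5  # mmol/gDW/h (from the text)
LOAD_MIN = 4.0        # mmol/gDW/h (from the text)


def classify(v_product_max: float, v_biomass_max: float) -> str:
    """Classify one reaction from its fluxes under the two objectives."""
    v_diff = v_product_max - v_biomass_max
    if v_diff > DIFF_THRESHOLD:
        return "favors product"
    if v_diff < -DIFF_THRESHOLD:
        return "favors growth"
    # Assumption: "flux > 4" is required under both objectives.
    if min(abs(v_product_max), abs(v_biomass_max)) > LOAD_MIN:
        return "supports both"
    return "unclassified"


# Hypothetical fluxes: reaction id -> (flux at product max, flux at biomass max).
toy_fluxes = {
    "R_a": (9.8, 10.0),
    "R_b": (3.0, 0.2),
    "R_c": (0.1, 2.0),
    "R_d": (0.3, 0.2),
}

results = {rxn: classify(vp, vb) for rxn, (vp, vb) in toy_fluxes.items()}
for rxn, label in results.items():
    print(rxn, label)
```

Reactions labeled "supports both" are the candidates that the red boxes in Figure 11 correspond to under this reading of the rule.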

4. Model-Guided Construct Design & PCR Validation

Balancing growth rate and PEAs yield and guided by FVA and FDCA, we prioritized five genes as key regulatory targets: fucO, aldA, gcl, glxR/garR, and glxK.

To validate the model predictions and enable BL21(DE3) to efficiently use EG as a carbon source, we designed a parallel “baseline control – model-driven optimization” experimental scheme and constructed the following plasmids:

a) Baseline plasmid piGEM25_05: based on the four-gene overexpression combination (gcl–hyi–glxR–glxK) reported in the literature²; this combination has been shown to significantly enhance EG metabolism in M9 medium with supplemental carbon sources and thus serves as the baseline reference.
b) Model-driven optimization plasmid piGEM25_04: from GEM flux analysis we retained the two “essential nodes” gcl and glxK and substituted the isoenzyme garR for glxR while omitting hyi, resulting in piGEM25_04 (gcl–garR–glxK) to test whether the isoenzyme substitution improves EG utilization efficiency.
c) Model-driven extension plasmid piGEM25_06: to test the model-predicted “upstream flux enhancement” nodes fucO–aldA, we constructed piGEM25_06 (fucO–aldA). This design was informed by observations (Balola et al.²) that the fucO–aldA effect can be dependent on auxiliary carbon sources; here we test whether overexpressing this module under pure-EG conditions compensates for insufficient native expression and enhances upstream activation (EG → glycolaldehyde → glycolate).

We transformed the three plasmids in parallel (piGEM25_05 baseline, piGEM25_04 replacement, piGEM25_06 upstream extension) and ran EG-utilization assays. Under EG-only conditions, piGEM25_04 and piGEM25_05 exhibited comparable growth and EG consumption, indicating that garR can substitute for glxR and that hyi can be omitted under these conditions. In contrast, neither the piGEM25_04 + piGEM25_06 co-transformant nor the empty-vector control grew, suggesting that fucO–aldA are not limiting in the EG-only context and that their overexpression provides no benefit.

5. Conclusion

We reconstructed the GSMM of E. coli BL21(DE3), wrote in the previously missing “EG → acetyl-CoA → PEAs” branch with corresponding GPRs to form iBL21_EG-PEA, and verified network feasibility under EG-only conditions. Systematic FBA/dFBA/FVA/FDCA analyses clarified flux allocation and bottlenecks: gcl, glxK, and glxR/garR form the product-oriented control core, while fucO and aldA are supportive but not rate-limiting under EG-only conditions. The model therefore proposes a minimal overexpression set, gcl–garR–glxK, as the priority design; in wet-lab tests, this three-gene scheme performed comparably to the literature four-gene set gcl–hyi–glxR–glxK while reducing construction burden. Overall, Project 2 delivers a reusable iBL21_EG-PEA enhanced model, a clear target priority, and a minimal overexpression strategy.

References

1. Tournier, V.; Topham, C. M.; Gilles, A.; David, B.; Folgoas, C.; Moya-Leclair, E.; Kamionka, E.; Desrousseaux, M.-L.; Texier, H.; Gavalda, S.; Cot, M.; Guémard, E.; Dalibey, M.; Nomme, J.; Cioci, G.; Barbe, S.; Chateau, M.; André, I.; Duquesne, S.; Marty, A. (2020). An engineered PET depolymerase to break down and recycle plastic bottles. Nature, 580(7802), 216–219.
2. Balola, A., Ferreira, S., & Rocha, I. (2024). From plastic waste to bioprocesses: Using ethylene glycol from polyethylene terephthalate biodegradation to fuel Escherichia coli metabolism and produce value-added compounds. Metabolic Engineering Communications, e00254. https://doi.org/10.1016/j.mec.2024.e00254
3. Chi, J., Wang, P., Ma, Y., & Zhang, X. (2024). Engineering Escherichia coli for utilization of PET-degraded ethylene glycol. Biotechnology for Biofuels and Bioproducts, 17, 87.
4. National Center for Biotechnology Information (NCBI) [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – [cited 2025 May 28]. Available from: https://www.ncbi.nlm.nih.gov/.
5. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19.
6. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9.
7. Emms, D.M. and Kelly, S. (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238.
8. Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, Kissinger JC, Paterson AH. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res, 40(7): e49.
9. Jia Lv. (2023). Reconstruction of the Metabolic Model of Escherichia coli MG1655 and Investigation of the Regulatory Mechanism of 1,4-Butanediamine Metabolism [Master's Thesis]. Tianjin University of Science and Technology. (吕佳. (2023). 大肠杆菌 MG1655 代谢模型的重构及 1,4-丁二胺代谢调控机理的研究 [硕士学位论文]. 天津科技大学.) Retrieved from https://www.cnki.net (accessed on Sep. 20, 2024).
10. Monk, J.M.; Charusanti, P.; Aziz, R.K.; Lerman, J.A.; Premyodhin, N.; Orth, J.D.; Feist, A.M.; Palsson, B.Ø. Genome-Scale Metabolic Reconstructions of Multiple Escherichia coli Strains Highlight Strain-Specific Adaptations to Nutritional Environments. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (50), 20338–20343. https://doi.org/10.1073/pnas.1307797110
11. Kim, H.; Kim, S.; Yoon, S.H. Metabolic Network Reconstruction and Phenome Analysis of the Industrial Microbe, Escherichia coli BL21(DE3). PLOS ONE 2018, 13 (9), e0204375.
12. Panda, S.; Zhou, J.F.J.; Feigis, M.; Harrison, E.; Ma, X.; Yuen, V.F.K.; Mahadevan, R.; Zhou, K. Engineering Escherichia coli to Produce Aromatic Chemicals from Ethylene Glycol. Metab. Eng. 2023, 79, 38–48.