The initial design phase adopted modular principles from industrial software engineering. Core frameworks include a plugin-based PyMOL interface, the MAGIC scheduling protocol, a YAML-based function–window mapping mechanism, and the Hydra+OmegaConf configuration bus (Fig. 1). Optional component management was implemented through pip extras, separating tools such as RFdiffusion, DLPacker, and OpenMM into modular packages. REvoDesign leverages the Qt environment embedded within PyMOL to avoid potential commercial licensing issues (Fig. 2).
Fig. 1 Architectural Design of the Evolutionary Data Computation Service for REvoDesign.
Fig. 2 MAGIC protocol architecture.
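As a minimal sketch of how optional components installed through pip extras might be detected at runtime, the snippet below probes for importable modules using only the standard library. The registry, the feature names, and the module names are illustrative assumptions, not REvoDesign's actual package layout.

```python
from importlib.util import find_spec

# Hypothetical mapping from an optional feature to the top-level module it needs.
# These names are assumptions for illustration only.
OPTIONAL_COMPONENTS = {
    "md_simulation": "openmm",
    "sidechain_packing": "dlpacker",
}


def is_available(module_name: str) -> bool:
    """Return True if the module can be imported in this environment."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_components(registry: dict[str, str]) -> dict[str, bool]:
    """Report which optional features can be enabled."""
    return {feature: is_available(module) for feature, module in registry.items()}


if __name__ == "__main__":
    for feature, ok in available_components(OPTIONAL_COMPONENTS).items():
        status = "enabled" if ok else "missing (install the matching extra)"
        print(f"{feature}: {status}")
```

A check of this kind lets a plugin hide or disable menu entries for tools that are not installed instead of failing at import time.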
The development environment is based on VS Code, Miniconda for environment management, Git for version control, and GitHub private repositories for secure hosting. A standardized workflow—branch creation, incremental commits, pull requests, CI checks, squash merging, and branch removal—ensures traceability and compliance with conventional commit practices.
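The commit-message convention mentioned above can be checked mechanically. The following is a hedged sketch of such a check: the regular expression follows the general Conventional Commits header shape, while the list of allowed types is an assumption (projects usually configure their own), and nothing here is taken from the project's actual CI scripts.

```python
import re

# Header shape: type(scope)!: description
# The allowed types below are an assumed, commonly used set.
HEADER = re.compile(
    r"^(?P<type>feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message."""
    first_line = message.splitlines()[0] if message else ""
    return HEADER.match(first_line) is not None


if __name__ == "__main__":
    for msg in ("feat(ui): add mutant table", "Updated stuff"):
        print(f"{msg!r}: {is_conventional(msg)}")
```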
Continuous integration and delivery were realized through GitHub Actions, automatically triggering tests across operating systems, processor architectures, Python versions, and PyMOL distributions. Testing employed pytest to organize 585 unit and integration tests, achieving ~70% coverage. Graphical interface testing combined pytest-qt with an internally developed TestWorker utility to simulate real interactions. Lightweight cases were parallelized, whereas more demanding tasks were executed sequentially to prevent resource conflicts.
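The split between parallel lightweight cases and sequential demanding cases can be illustrated with the standard library alone. This is a sketch of the scheduling idea only: the `Case` type and its `heavy` flag are hypothetical, and the source does not state which mechanism the real test suite uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Case(NamedTuple):
    name: str
    run: Callable[[], bool]
    heavy: bool  # hypothetical flag: True for resource-intensive cases


def run_suite(cases: list[Case], workers: int = 4) -> dict[str, bool]:
    """Run light cases concurrently, then heavy cases one at a time."""
    light = [c for c in cases if not c.heavy]
    heavy = [c for c in cases if c.heavy]
    results: dict[str, bool] = {}

    # Light cases share a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {c.name: pool.submit(c.run) for c in light}
        for name, future in futures.items():
            results[name] = future.result()

    # Heavy cases run sequentially so they never compete for resources.
    for c in heavy:
        results[c.name] = c.run()
    return results
```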
Automated reviews and coverage reports provided immediate feedback on design flaws, security risks, and performance bottlenecks. Development efficiency was further enhanced through exploratory “Vibe Coding,” where large language models (e.g., ChatGPT, Claude) were used to generate code templates, refine logical structures, and accelerate prototyping. Insights from these practices informed subsequent iterations of architectural refinement, testing strategy upgrades, and feature development.
Within the graphical PyMOL plugin interface, the workflow was modularized into distinct functional units: loading of protein structures and evolutionary information; prioritization of candidate hotspots using evolutionary data and structure; rational design of mutants based on PSSM, ddG, and ESM-1v DMS predictions; integration of external design tools (Cartesian ddG, ColabDesign, RFdiffusion) for cross-method validation; clustering and co-evolutionary analysis; and selection of final mutation combinations.
Mutant modeling was conducted using side-chain prediction engines such as DLPacker, PIPPack, and Rosetta-MutateRelax. Declarative YAML-based window configuration enables rapid conversion of Python functions into interactive modules. Input parameters can be archived and reloaded through configuration recipes, ensuring reproducibility across experiments.
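To make the idea of turning a Python function into a declarative window description more concrete, here is a minimal sketch. It derives form fields from a function signature and round-trips parameter values through a recipe. JSON from the standard library stands in for YAML, and `window_spec`, `save_recipe`, `load_recipe`, and `mutate_relax` are hypothetical names, not REvoDesign's API.

```python
import inspect
import json


def window_spec(func) -> dict:
    """Derive a declarative form description from a function signature."""
    empty = inspect.Parameter.empty
    fields = []
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is empty:
            type_name = "str"  # assumed fallback for unannotated parameters
        else:
            type_name = getattr(param.annotation, "__name__", str(param.annotation))
        fields.append({
            "name": name,
            "type": type_name,
            "default": None if param.default is empty else param.default,
        })
    return {"title": func.__name__, "fields": fields}


def save_recipe(values: dict) -> str:
    """Serialize parameter values so a run can be repeated later."""
    return json.dumps(values, sort_keys=True)


def load_recipe(text: str) -> dict:
    """Restore parameter values from a saved recipe."""
    return json.loads(text)


# Hypothetical design function used only to demonstrate the mapping.
def mutate_relax(chain: str = "A", cycles: int = 3, cutoff: float = 6.0) -> str:
    return f"{chain}:{cycles}:{cutoff}"


if __name__ == "__main__":
    print(json.dumps(window_spec(mutate_relax), indent=2))
    recipe = save_recipe({"chain": "B", "cycles": 5, "cutoff": 6.0})
    print(mutate_relax(**load_recipe(recipe)))
```

Keeping the form description as data rather than hand-written GUI code means a new function only needs type-annotated parameters to gain an input window.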
Rational validation involved superimposing modeled side chains onto original structures, combining these with PSSM and ddG calculations to distinguish conservative from high-risk substitutions. Critical residues underwent both visual inspection and computational confirmation to maintain consistency with evolutionary constraints.
For computational scalability, evolutionary data processing was implemented as a standalone Flask+Celery+Docker service, supporting parallel task submission, monitoring, and result packaging under high-concurrency conditions.
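The PSSM and ddG triage described in the validation step can be sketched as a simple two-threshold rule. Both thresholds, the sign convention for ddG (positive taken as destabilizing), and the example values are assumptions for illustration; the source does not give the cutoffs that were actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    label: str
    pssm_score: float  # log-odds score of the mutant residue at this position
    ddg: float         # predicted folding free-energy change, kcal/mol (assumed: positive = destabilizing)


def classify(sub: Substitution, pssm_min: float = 0.0, ddg_max: float = 1.0) -> str:
    """Label a substitution; both thresholds are illustrative assumptions."""
    tolerated = sub.pssm_score >= pssm_min
    stable = sub.ddg <= ddg_max
    if tolerated and stable:
        return "conservative"
    if not tolerated and not stable:
        return "high-risk"
    return "inspect manually"


if __name__ == "__main__":
    # Made-up values, not measurements from this project.
    for sub in (
        Substitution("X1Y", 2.0, -0.5),
        Substitution("X2Y", -3.0, 2.5),
        Substitution("X3Y", -1.0, 0.2),
    ):
        print(sub.label, classify(sub))
```

Cases where the two signals disagree are routed to manual inspection, mirroring the combination of visual and computational checks described in the text.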
Feedback from computational predictions (e.g., PSSM distributions, ddG changes) was cross-validated with experimental functional data, enabling refinement of hotspot prioritization and mutation strategies. The Hydra configuration bus and recipe files facilitate complete reconstruction of prior experimental environments, supporting dynamic adjustment of design strategies.
Cross-platform benchmarking of components such as DLPacker, RFdiffusion, and RosettaPy provided insights into performance and stability under different hardware conditions, guiding optimal resource allocation for future experiments.
The use of microbial cell factories enabled by synthetic biology to produce natural products has become a major focus of current research. However, the key enzymes required for natural product biosynthesis often exhibit poor stability and low catalytic efficiency when expressed in heterologous systems, thereby reducing overall biosynthetic efficiency. Current protein design platforms are primarily tailored for industrial enzyme applications and face limitations in addressing challenges such as epistatic interactions across multiple sites, the intrinsic trade-off between activity and stability, and reliance on high-throughput approaches.
REvoDesign integrates state-of-the-art protein design technologies and tools, leveraging high-precision complex structures and evolutionary information to guide rational protein engineering. By applying cross-regional combinatorial optimization between catalytic sites and protein surface regions, and accounting for long-range structural interactions, the platform alleviates challenges associated with multi-site epistasis and the trade-off between activity and stability. Furthermore, by employing cross-validation and screening of variants through multiple algorithms, REvoDesign enables functional enzyme improvement using only a compact and focused mutant library, thereby overcoming the reliance on high-throughput approaches.
To evaluate the reliability of our tool, we reconstructed a carotenoid biosynthetic pathway in Saccharomyces cerevisiae and engineered its key gene, CarRP. CarRP comprises two functional domains: the R domain, which exhibits lycopene cyclase activity, and the P domain, which possesses phytoene synthase activity but depends on the proper conformation of the R domain for functionality. In designing loss-of-function mutations in the R domain, it was critical to ensure that its overall folding remained intact. Accordingly, we employed DiffDock and AlphaFold3 to perform enzyme–substrate complex modeling and molecular docking. We identified the region within approximately 6 Å of the substrate lycopene in the R domain as the putative active site. To avoid disrupting the phytoene synthase function of the P domain, we targeted only residues distant from this domain for mutagenesis and subsequent experimental testing. Using the “visualization” function of REvoDesign, we conducted rational design-based evaluation and selected two candidate mutation sites (F81A and Y145F). Experimental validation confirmed that the Y145F mutant completely lost lycopene cyclase activity and produced the highest lycopene yield, whereas the F81A mutant still accumulated a small amount of β-carotene (Fig. 3).
Fig. 3 CarRP transformation results.
a HPLC detection results of yeast fermentation products. b Yeast transformation results and product ratios.
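The distance criterion used to define the putative active site (residues within approximately 6 Å of the substrate) can be expressed in a few lines. The sketch below works on plain coordinate tuples; the residue identifiers and coordinates in the usage example are invented and do not come from the CarRP model.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=6.0):
    """Return sorted residue ids with any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs
    ligand_coords: iterable of (x, y, z) tuples
    """
    ligand_coords = list(ligand_coords)
    near = set()
    for res_id, xyz in protein_atoms:
        if res_id in near:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            near.add(res_id)
    return sorted(near)


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    protein = [
        ("A10", (0.0, 0.0, 0.0)),
        ("A11", (3.0, 4.0, 5.0)),
        ("A50", (20.0, 0.0, 0.0)),
    ]
    ligand = [(0.0, 0.0, 5.0)]
    print(residues_near_ligand(protein, ligand))
```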
Based on the structure of T5αH, we applied REvoDesign for active-site pocket analysis and rational mutagenesis design. Using the visualization tools of REvoDesign, we assessed substrate binding energy and protein folding free energy, and identified 23 candidate mutants. Experimental validation revealed that mutations in the active-site region (L72M, V226E) and in the stability region (Q122A, Q266A) significantly enhanced the yield of the target product. Building upon the high-yield single mutants, we combined beneficial mutations from the active and stability regions, ensuring that the functional regions did not overlap, to construct double mutants. The results demonstrated that the double mutants produced significantly higher levels of the target product compared with all single mutants and the wild type (Fig. 4).
Fig. 4 T5αH transformation results.
Molecular dynamics simulations are employed to analyze combinations of beneficial mutations and to elucidate the mechanisms by which these variants enhance yield. The resulting insights serve to guide the further optimization of enzyme function.
Heterologous production of paclitaxel by reconstituting its biosynthetic pathway in microorganisms is a highly promising route. However, de novo, efficient synthesis of paclitaxel in microbial hosts remains challenging because of bottlenecks along the pathway. T5αH is one such enzyme-level bottleneck, owing to its poor activity and selectivity when expressed in microbial hosts (Fig. 5). Improving the conversion of taxadiene into T5OH is therefore essential for better pathway control and performance. To address this, we constructed a yeast chassis to support downstream engineering.
Fig. 5 Schematic diagram of the T5OH biosynthesis pathway.
Recombinant vectors carrying BST1 and TS were successfully introduced into S. cerevisiae, generating the engineered chassis strain. Sequencing confirmed correct assembly (Fig. 6).
Fig. 6 Vector backbone and sequencing results.
The engineered chassis strain was capable of producing T5OH. When cultured in glucose medium, it proliferated normally, and upon transfer to galactose medium, it generated precursor metabolites required for T5OH biosynthesis.
We set out to design and engineer T5αH variants using our REvoDesign approach. The three-dimensional complex structure of T5αH, the cofactor heme, and the substrate taxadiene was predicted using AlphaFold2 and DiffDock. In parallel, evolutionary information for T5αH was obtained through position-specific scoring matrix (PSSM) analysis and Gremlin coevolutionary modeling. Based on these structural and evolutionary insights, we carried out rational engineering of T5αH to guide the design of improved enzyme variants.
We identified two functional regions of T5αH for targeted mutagenesis: activity-related sites located within the substrate pocket and stability-related sites outside the substrate and heme pockets. Mutants were designed using REvoDesign, excluding proline and cysteine substitutions, and screened based on predicted substrate binding energy (activity) or protein folding free energy (stability). This strategy yielded a focused set of variants with enhanced catalytic efficiency or improved structural stability for subsequent experimental validation.
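A minimal sketch of the enumeration step, assuming mutants are generated as all single substitutions at chosen positions with proline and cysteine excluded as stated above. The function name and the toy sequence are hypothetical; the subsequent scoring by binding energy or folding free energy is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EXCLUDED = {"P", "C"}  # proline and cysteine substitutions are skipped, as in the text


def candidate_mutations(sequence: str, positions: list[int]) -> list[str]:
    """List single substitutions (e.g. 'L2M') at the given 1-based positions."""
    mutants = []
    for pos in positions:
        wild_type = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wild_type and aa not in EXCLUDED:
                mutants.append(f"{wild_type}{pos}{aa}")
    return mutants


if __name__ == "__main__":
    # Toy three-residue sequence, not the T5αH sequence.
    print(candidate_mutations("MLQ", [2]))
```

With 20 standard amino acids, two exclusions, and the wild-type residue removed, each position contributes at most 17 or 18 candidates to the pool that is then screened.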
Finally, 23 mutants were selected, including 15 targeting activity-related sites (Tab. 1) and 8 targeting stability-related sites (Tab. 2), with the goal of separately verifying how mutations in the two types of sites influence T5OH production.
Tab. 1 Screening of Activity-Related Mutants
Tab. 2 Screening of Stability-Related Mutants
This design phase yielded 15 variants targeting catalytic activity and 8 targeting structural stability. These will undergo functional validation in subsequent experiments.
Iteration 1
The constructed T5αH mutants were cloned into a yeast expression vector and transformed into the engineered chassis strain. The resulting transformants were subjected to small-scale fermentation, followed by extraction of the target product T5OH and analysis using GC–MS.
In the initial fermentation trials, the wild-type (WT) strain was used to preliminarily optimize culture conditions. Fermentations were carried out in 100-, 150-, 250-, and 500-mL Erlenmeyer flasks containing either 10 mL or 20 mL of medium, incubated at 30 °C and 220 rpm for 5 days. The fermentation products were then extracted for analysis.
Unfortunately, no detectable T5OH was observed in any of the fermentation broths.
Comprehensive verification of both the recombinant expression vectors and the chassis strain revealed no construction errors, indicating that the failure to detect T5OH was not due to upstream genetic assembly. We hypothesized that the absence of product was likely caused by insufficient precursor supply within the chassis strain, thereby limiting T5OH biosynthesis. Given time constraints, reconstructing a chassis with an enhanced precursor pathway was not feasible. Literature reports suggested that lowering the fermentation temperature can increase the accumulation of the T5OH precursor taxadiene (Fig. 7). Therefore, we planned to perform subsequent fermentations at 20 °C to enhance T5OH production (1).
Fig. 7 Taxadiene production.
Iteration 2
Based on the preceding analysis, we adopted a strategy reported in the literature to re-evaluate T5OH production by lowering the fermentation temperature.
The wild-type (WT) strain was again used to optimize the culture conditions. Fermentations were conducted at 20 °C and 220 rpm for 5 days, after which the products were extracted with n-hexane for GC–MS analysis.
Under these modified conditions, T5OH was successfully detected in the 20 mL fermentation cultures, demonstrating that lowering the fermentation temperature facilitated product formation.
This improvement in fermentation conditions enabled the detection of T5OH, confirming the feasibility of the revised approach. The optimized parameters (20 °C, 220 rpm, 5 days, 20 mL culture volume) were subsequently applied to mutant strains to evaluate their effects on T5OH biosynthesis.
Iteration 3
Following preliminary optimization of the fermentation parameters, a total of 23 rationally designed T5αH mutants—targeting both activity-related and stability-related sites—were selected for functional validation.
The wild-type (WT) strain served as the control. For each mutant, three independent single-colony isolates were cultured in 100-mL Erlenmeyer flasks containing 20 mL fermentation medium at 20 °C and 220 rpm for 5 days. The fermentation products were extracted with n-hexane, and β-caryophyllene was added as an internal standard for quantitative analysis.
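Relative yields from internal-standard data are typically computed by dividing the analyte peak area by the internal-standard peak area in each replicate and then comparing the mutant mean with the WT mean. The sketch below assumes that scheme; the function names are hypothetical and the peak areas in the usage example are invented, not measurements from these experiments.

```python
from statistics import mean


def normalized_signal(t5oh_area: float, standard_area: float) -> float:
    """Ratio of analyte peak area to internal-standard peak area."""
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return t5oh_area / standard_area


def relative_yield(mutant_reps, wt_reps) -> float:
    """Mean normalized signal of a mutant divided by that of WT.

    Each argument is a list of (t5oh_area, standard_area) pairs, one per replicate.
    """
    mutant = mean(normalized_signal(a, s) for a, s in mutant_reps)
    wt = mean(normalized_signal(a, s) for a, s in wt_reps)
    return mutant / wt


if __name__ == "__main__":
    # Invented peak areas for three replicates each.
    wt = [(100.0, 100.0), (110.0, 100.0), (90.0, 100.0)]
    mutant = [(200.0, 100.0), (220.0, 100.0), (180.0, 100.0)]
    print(f"relative yield: {relative_yield(mutant, wt):.2f}")
```

Because the same internal standard is added to every sample, the ratio cancels variation in extraction and injection volume; this gives a relative comparison and not an absolute titer.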
Among the 23 mutants tested, several variants exhibited significantly enhanced T5OH production relative to the WT control. Notably, activity-related mutants L72M and V226E demonstrated markedly improved catalytic activity (Fig. 8a), while stability-related mutants Q122A and Q266A also showed substantial increases in T5OH yield (Fig. 8b).
Fig. 8 Effects of T5αH activity-related and stability-related mutations on T5OH production.
a T5OH production efficiency of activity-related mutants vs WT T5αH. The vertical axis represents relative T5OH yield. Compared to WT, mutants targeting activity-related sites showed distinct yield changes: L72M and V226E exhibited highly significant increases in T5OH yield (~2.3-fold and ~2.1-fold that of WT, respectively), which is attributed to enhanced substrate binding affinity and catalytic turnover rate. b Residual activity of stability-related mutants and their correlation with T5OH yield. Mutants targeting stability-related sites showed clear stability-yield correlations: Q122A and Q266A exhibited highly significant yield improvements. Statistical significance is marked by asterisks: ***p < 0.001.
These results validated the effectiveness of our structure-guided, evolution-informed design strategy, identifying key mutations that enhance either catalytic efficiency or structural stability of T5αH. The best-performing mutants will serve as the basis for further combinatorial mutagenesis to evaluate potential additive or synergistic effects on T5OH biosynthesis.
To further enhance T5OH (taxadien-5α-ol) production efficiency, we leveraged the results of single-site mutant verification and adopted a rational combination strategy for activity-related and stability-related mutation sites—selecting high-performance single mutants that target non-overlapping functional regions (to avoid mutual interference between mutations) for site-directed combination. This design was guided by REvoDesign’s predictive analysis: the tool confirmed that the selected activity-related sites (located in the substrate pocket) and stability-related sites (located in the structural maintenance region) have independent spatial distributions and functional roles, ensuring that their combination would not disrupt the enzyme’s overall structure or catalytic core.
We constructed recombinant plasmids carrying different double-mutant sequences via molecular cloning. The high-yield single mutants described above were paired in different combinations to generate double mutants, which were then tested for further increases in yield.
Fermentation assays revealed that the L72M+Q122A double mutant achieved the highest improvement in T5OH yield compared to single mutants and WT (Fig. 9).
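The pairing of activity-related and stability-related mutations can be enumerated as a Cartesian product of the two groups. The sketch below uses the four high-yield single mutants named in the text; the helper names are hypothetical, and the text does not state which of these pairs were actually cloned.

```python
from itertools import product

# High-yield single mutants reported in the text.
ACTIVITY_SITES = ["L72M", "V226E"]
STABILITY_SITES = ["Q122A", "Q266A"]


def position(mutation: str) -> int:
    """Extract the residue number from a label such as 'L72M'."""
    return int(mutation[1:-1])


def double_mutants(activity, stability):
    """Pair each activity mutation with each stability mutation at a different position."""
    return [
        f"{a}+{s}"
        for a, s in product(activity, stability)
        if position(a) != position(s)
    ]


if __name__ == "__main__":
    for combo in double_mutants(ACTIVITY_SITES, STABILITY_SITES):
        print(combo)
```

Drawing one mutation from each group keeps the two substitutions in non-overlapping functional regions, which is the design constraint described above.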
Fig. 9 Comparison of T5OH yields among T5αH Wild-Type, single mutants, and double mutants.
In summary, the yield data of the double mutants not only confirm that multi-dimensional optimization of T5αH can drastically improve T5OH production but also further validate REvoDesign’s effectiveness as a powerful tool for rational enzyme engineering.
Saccharomyces cerevisiae possesses an endogenous mevalonate (MVA) pathway that provides precursors for terpenoid biosynthesis; however, its native metabolic network lacks the complete enzymatic machinery required for lycopene production (Fig. 10). To overcome this limitation, we previously integrated the CarG and CarB genes into a genomic integration vector, thereby creating an engineered yeast strain capable of synthesizing the precursor metabolites necessary for lycopene biosynthesis (2-5).
Fig. 10 Lycopene synthesis Pathway.
Fig. 11 Construction of gene expression vector.
To identify positive clones, colony PCR was performed using primer pairs flanking the CarG and CarB genes, generating expected amplicons of 912 bp and 1740 bp, respectively (Fig. 12). Colonies exhibiting the correct PCR profiles were subsequently selected for Sanger sequencing to confirm construct integrity (Fig. 13).
Fig. 12 Colony PCR Identification.
Fig. 13 Sequencing results.
Through this round of experimentation, we successfully constructed the recombinant plasmid pMASC02-TTPI1-CarB-PGAL10-PGAL1-CarG-TPGI-Ura3. Initially, plasmid assembly using a conventional single-fragment recombinase failed to yield the desired construct. Subsequently, NEBuilder HiFi DNA Assembly Master Mix was employed for multi-fragment assembly, which resulted in successful plasmid construction. As the next step, we plan to integrate this plasmid into the Saccharomyces cerevisiae BY4742 strain to further reprogram its metabolic pathway for the production of lycopene precursors.
Based on literature reports, Saccharomyces cerevisiae BY4742 was selected as the host strain for chassis construction because of its well-characterized genetic background, ease of genetic manipulation, and suitability for heterologous protein expression. Moreover, the endogenous mevalonate (MVA) pathway in S. cerevisiae provides precursor metabolites for lycopene biosynthesis, and the strain offers the advantages of simple cultivation and low production cost.
Fig. 14 Schematic diagram of enzyme digestion.
PCR screening identified positive transformants with the expected 912 bp and 1740 bp bands (Fig. 15a). The verified clone, designated BY01 (Fig. 15b), was stored in 20% glycerol at –80 °C for long-term preservation.
Fig. 15 Construction of lycopene engineering strain.
a The electrophoresis profile of PCR detection for transformed yeast strains. b The colony morphology of recombinant yeast inoculated onto Ura-deficient (Uracil-deficient) medium plates after 2 days of cultivation.
Through this round of experimentation, we successfully constructed a Saccharomyces cerevisiae chassis strain capable of supporting lycopene biosynthesis. During the initial yeast transformation, no colonies were observed on uracil-deficient (Ura⁻) selection medium. This failure was attributed to an insufficient concentration of the NotI-digested plasmid. After repeating the digestion and increasing the plasmid concentration, transformation yielded normal single colonies. The next phase of this work will focus on functional engineering of CarRP, aiming to convert the native bifunctional enzyme into a monofunctional variant.
CarRP is a bifunctional enzyme consisting of an N-terminal R domain (lycopene cyclase) and a C-terminal P domain (phytoene synthase). The R domain can function independently, whereas the P domain requires proper folding of the R domain. Using REvoDesign, we targeted the R domain to suppress lycopene cyclase activity while retaining phytoene synthase activity. Two variants, F81A and Y145F, were obtained (Fig. 16).
Fig. 16 Model of interactions of the amino acids in the active site of the protein with lycopene. Green: wild-type; yellow: lycopene; magenta: acid mutant.
Analysis results showed that the fermentation products of CarRP_F81A and CarRP_Y145F mutants both exhibited a high lycopene peak, with yields of 1.42 g/L and 1.46 g/L, respectively. CarRP_F81A produced a very small amount of β-carotene, while CarRP_Y145F produced no β-carotene at all. Furthermore, the lycopene yields of each mutant strain were comparable to the β-carotene yield of the strain containing wild-type CarRP, indicating that the two CarRP mutants designed in this study had almost lost their lycopene cyclase activity (Fig. 17). As a result, lycopene no longer cyclized to form β-carotene but accumulated as the final product in Saccharomyces cerevisiae.
Fig. 17 Functional validation of CarRP mutants.
In this round of experiments, the functionality of the two engineered CarRP mutants was successfully validated. The introduced mutations effectively blocked β-carotene biosynthesis, resulting in enhanced lycopene accumulation in Saccharomyces cerevisiae. As the next step, we plan to perform scaled-up fermentations to evaluate lycopene production yields.
The above experiments confirmed that the engineered mutants enabled greater lycopene accumulation in Saccharomyces cerevisiae. The CarRP_Y145F strain was selected for pilot-scale fermentation in a bioreactor to evaluate lycopene production.
During the initial 48 h, carbon flux supported rapid biomass accumulation, with low lycopene productivity (0.033 g/L/h). As glucose was depleted, metabolism shifted toward secondary metabolism, including lycopene biosynthesis (Fig. 18a). After 216 h of fermentation, the strain achieved a titer of ~9.0 g/L lycopene, corresponding to a productivity of 0.049 g/L/h (Fig. 18b).
Fig. 18 Growth curve and lycopene production of the strain.
a High-density fermentation in a 3 L fermenter. b Comparison of cell growth rate and lycopene productivity at different stages of fermentation.
Pilot-scale fermentation confirmed that the engineered CarRP mutant achieved high-level lycopene production (9.0 g/L). These results demonstrate both the robustness of the engineered strain and the effectiveness of REvoDesign-guided enzyme engineering.
Fig. 19 Research reports, expert consultations, and questionnaire surveys.
a Logos of organizations involved in disease reports and global health policy references, including the World Health Organization (WHO), International Agency for Research on Cancer, National Center for Cardiovascular Diseases, and World Heart Federation. b Logos of biomanufacturing companies whose annual reports were reviewed to elucidate industry insights, namely Angel Yeast Co., Ltd., Shanghai Pharmaceuticals Holding Co., Ltd., Chenguang Biotech Group Co., Ltd., and Anhui Huaheng Biotechnology Co., Ltd. c Representative individuals interviewed for expert and industry input, including clinicians, researchers, and executives from relevant fields. d Promotional material for the synthetic biology-based survey carried out by the YNNU-China team, which contains a QR code for respondents to participate in the survey.
We built REvoDesign by integrating:
During this stage, we consulted protein design experts (Yuan Zhou, Haiyan Liu, Bin Huang) to refine the algorithm:
Fig. 20 Interview with Protein Design Consultant.
a Associate Professor Yuan Zhou. b Professor Haiyan Liu. c Professor Bin Huang.
We selected two case studies representing different industries:
Fig. 21 REvoDesign tool case test.
a Bifunctional phytoene synthase/lycopene cyclase (CarRP). b Taxadiene-5α-hydroxylase (T5αH).
From feedback by users, scholars, and companies:
Fig. 22 Evaluations and feedback from businesses and experts on REvoDesign help to iterate the tool.
We will enhance REvoDesign with AI-guided mutation combination prediction, build a “question-guided” design assistant, and expand to other high-value natural products, turning REvoDesign into an intelligent, interpretable, and scalable protein design platform.