1. Background
Geraniol dehydrogenase (GeDH) serves as the terminal "convergence valve" in our citronellal synthesis pathway, responsible for further oxidizing geraniol produced by upstream CsTPS1 into the target aldehyde product. The terminal oxidation step not only determines the final flux of the pathway but also directly affects the instantaneous concentration and residence time of intermediates within the cell. If the catalytic efficiency of this step is insufficient, geraniol can accumulate intracellularly and extracellularly, leading to leakage, side reactions, carbon loss, and measurement noise. Therefore, targeted optimization of GeDH within the "three-enzyme cascade" is a key lever to push production from "capable" to "high-yield." Our preliminary wet lab experiments support this: under the same induction conditions, qPCR showed that GeDH's transcriptional upregulation (≈12.6x) was lower than that of GPS/CsTPS1 (≈16-18x), while WB confirmed successful protein translation yet product improvement remained limited. This suggests that the "enzymatic base" (substrate binding, cofactor utilization, product release) and its coupling with the cellular environment may be the primary source of the current flux bottleneck.
Mechanistically, the dehydrogenation reaction from terminal alcohol to aldehyde is constrained by multiple factors. First, substrate entry and orientation: the geometry and electrostatics of the active pocket determine whether geraniol can stably position itself in a conformation favorable for hydride transfer. Second, cofactor coupling efficiency: the binding affinity, positioning, and regeneration rate of NADP+ affect effective catalytic turnover. Third, product release and pocket dynamics: if the product lingers or the pocket is too flexible, the next catalytic cycle is slowed. Fourth, cellular stress: aldehydes are inherently reactive and irritating; if the terminal step is not "powerful" enough, local toxicity caused by intermediates/products can suppress expression and folding. Based on these considerations, we established rational engineering goals for GeDH: enhance substrate ingress and positioning (increase effective binding and oriented hydride transfer probability), optimize cofactor compatibility and binding stability, moderately rigidify key loop regions in the active pocket to reduce unproductive wobbling, improve product egress to shorten the catalytic cycle, and maintain good solubility and expression robustness to avoid negative trade-offs ("gain activity, lose stability").
To achieve these goals, we adopted a dry lab optimization pipeline: "structure modeling → molecular docking → molecular dynamics → free energy decomposition," focusing on key sites in the active center and substrate channel for point and combinatorial mutagenesis (Fig. 1).
Fig. 1 Dry Lab Technical Roadmap
Specifically, we first used Chai1 modeling to obtain a high-quality 3D structure, jointly evaluated pocket geometry and conservation, and identified sites most sensitive to substrate hydroxyl recognition and exit gate stability. Subsequently, we used Rosetta for flexible docking and preliminary scoring, selecting candidates significantly superior to the wild-type in energy function. Then, we employed long-timescale molecular dynamics in GROMACS to examine their stability in explicit solvent and the retention of key hydrogen bonds. Finally, we used MM/PBSA to quantitatively decompose binding free energy and residue contributions, eliminating false positives that were "statically high-scoring but dynamically unstable." Through this pipeline, we obtained preferred combinatorial mutants with both high affinity and high stability, which were subsequently validated in wet lab experiments to significantly enhance the effective flux of the terminal oxidation step.
This strategy of "terminal step priority + flux balancing" forms a closed loop with our fermentation-side response surface optimization (induction temperature, IPTG and L-ara dosage, induction timing): only after the catalytic capacity of downstream GeDH is enhanced can the supply from upstream GPS/CsTPS1 be truly converted into yield gains, rather than intermediate accumulation. In other words, the dry lab optimization of GeDH is not merely "performance enhancement" of a single enzyme but a critical step in transitioning the entire synthesis network from "expression-driven" to "enzymology -driven," providing decisive support for ultimately pushing the yield to gram-per-liter levels and maintaining batch-to-batch consistency.
2. Mutagenesis Design
2.1 Protein Structure Prediction and Modeling
As GeDH lacks a resolved high-resolution 3D structure, we employed advanced protein structure prediction tools to model its conformation. First, we used Chai1 to predict the 3D structure of the wild-type GeDH amino acid sequence. Chai1 is a multimodal foundation model capable of unifying the prediction of structures for proteins, small molecules, DNA, RNA, and other biomolecules. In multiple benchmarks, Chai1's performance matches or exceeds leading models, including AlphaFold 3. Particularly in the challenging PoseBusters protein-ligand benchmark, Chai1 achieved a success rate of 77%, comparable to AlphaFold 3's 76%. Using Chai1's high-precision prediction, we obtained a folded model of GeDH.
2.2 Constructing Position-Specific Scoring Matrix (PSSM)
Based on protein sequence and structural analysis, we first performed multiple sequence alignment to construct a Position-Specific Scoring Matrix (PSSM) to identify amino acid sites that are evolutionarily less conserved and can tolerate various substitutions. A PSSM is a common method for representing motifs (patterns) in biological sequences. PSSM matrices were first experimentally used by Gribskov et al. and are applied in biological research such as protein function prediction, protein structure prediction, and protein binding site prediction. The PSSM matrix is generated through sequence similarity alignment, incorporating amino acid conservation information. For a protein sequence, the PSSM matrix has 20 rows (corresponding to the 20 amino acids) and a number of columns equal to the sequence length. An element P_ij in the PSSM represents the likelihood of the amino acid at position i in the sequence mutating to the j-th amino acid during evolution; a positive value indicates a higher likelihood, while a negative value indicates a lower likelihood. These non-conserved sites are more likely to acquire new functions through mutation without disrupting protein folding and were therefore selected as candidate mutation sites. This analysis screened approximately 10 candidate sites as targets for site-directed mutagenesis. The protein has the following 10 sites, each potentially mutable to the listed amino acids:
- 60 ACEHIKLMQRSTVY
- 65 PIKAA EFHLQTWY
- 125 PIKAA ACEFKLMPQRSTVWY
- 126 PIKAA ACDEFGHIKLMNQRSTVY
- 147 PIKAA FMY
- 298 PIKAA AEFGILMSTVW
- 299 PIKAA AGISTV
- 300 PIKAA ADEGKPQST
- 301 PIKAA ACDEFGHIKLMNPQRSTVY
- 322 PIKAA ACFILMSTVWY
Subsequently, we used protein design tools like Rosetta to systematically generate mutation combinations for these sites, expanding from single-point mutations to combinations of up to four amino acids. This method generated a vast mutant library:
- Single mutants: 114
- Double mutants: 5,729
- Triple mutants: 120,062
- Quadruple mutants: 466,951
To improve computational efficiency, we calculated a protein stability score for each mutant and filtered out the top 50% of single, double, triple, and quadruple mutants based on thermostability scores for subsequent binding affinity assessment. This stability-first screening strategy ensures that selected mutants improve function without severely compromising enzyme folding stability.
2.3 Molecular Docking
For the above mutants, we used molecular docking simulations to evaluate changes in their binding affinity for the substrate geraniol. Molecular docking computationally simulates the entry, conformational orientation, and interaction energy of a small molecule (substrate) within the protein's binding site to predict the ligand-receptor binding mode and affinity. We employed the RosettaLigand docking protocol to perform docking simulations for each GeDH mutant with geraniol. RosettaLigand is a small molecule docking tool within the Rosetta molecular modeling suite that considers the flexibility of both the ligand and surrounding side chains, sampling numerous conformations near the binding site and scoring each one. During docking, we defined the active pocket of the GeDH model as the search space, allowing geraniol to freely explore optimal binding poses. For each protein, we performed multiple independent docking simulations to sufficiently sample possible conformations and find the global optimal binding mode. Docking scores are based on the Rosetta energy function, including bonded, electrostatic, van der Waals, and solvation terms; a more negative total score represents more favorable predicted binding .
The heatmaps for protein stability and molecular docking scores are as follows:
Fig. 2 Single-point mutation heatmap. Each row represents a single-point mutation; left, middle, and right columns show Protein Score (folding/stability), Dock Score (geraniol docking energy), and Total Score (comprehensive score), respectively. Darker colors indicate more negative (favorable) scores.
Fig. 3 Two-point mutation heatmap. Each row represents a two-point mutation; columns as in Fig. 2.
Fig. 4 Three-point mutation heatmap. Each row represents a three-point mutation; columns as in Fig. 2.
Fig. 5 Four-point mutation heatmap. Each row represents a four-point mutation; columns as in Fig. 2.
These heatmaps summarize the performance of GeDH mutants across three metrics: Protein Score assesses stability, Dock Score assesses docking affinity with geraniol, and Total Score is the comprehensive score; darker colors indicate more negative (favorable) energy. The high consistency between Total and Dock scores indicates that substrate affinity/orientation is the primary factor determining terminal step efficiency, but Protein Score remains a necessary stability threshold to ensure soluble expression and engineering feasibility.
As shown in Fig. 6, we present the results of structural energy evaluation for the GeDH mutant library and Gaussian fitting. The distribution is clearly divided into two categories: mut3 (triple mutants) and mut4 (quadruple mutants) have means of μ≈-1183.19 and -1183.97, respectively, with small variances (σ≈1.29, 1.28). Their curves are shifted left and are narrower, indicating that these two mutant types maintain lower conformational energy and higher structural stability across different conformation samplings. In contrast, mut1 and mut2 have means of μ≈-1177.62 and -1177.41, with greater fluctuations (σ≈2.84, 3.16), belonging to the "moderate stability, higher dispersion" group. This suggests that triple and quadruple mutations can be used in subsequent combinatorial designs to offset the potential structural cost of "pro-binding" mutations, thereby improving the soluble expression and wet lab success rate of combinatorial mutants.
Fig. 6 Protein stability distribution of GeDH mutants
Fig. 7 shows the probability distribution and Gaussian fitting of the Rosetta dock_score (more negative indicates stronger predicted affinity for geraniol) for the four groups of GeDH mutants. The distributions of mut3 and mut4 are shifted left, with means of μ≈-1330.74 and μ≈-1330.64, significantly better than mut1 (μ≈-1329.10) and mut2 (μ≈-1328.78). This indicates that mut3/4, as "pro-binding" sites, have lower average docking energies and better theoretical affinity. Meanwhile, the standard deviations of mut3/4 (σ≈3.78/4.16) are slightly larger than those of mut1/2 (σ≈2.87/3.23), suggesting they can sample more favorable binding poses but also come with higher conformational heterogeneity.
Fig. 7 Docking energy distribution of GeDH mutants
As shown in Fig. 8, the scatter plot jointly displays docking energy (dock_score, x-axis; more negative is better) and protein stability (protein_score, y-axis; more negative is better) to simultaneously evaluate two key attributes of the GeDH-geraniol complex: "binding tightness" and "folding stability." The four mutant groups form distinct clusters: mut1 (blue) concentrates in the upper right region (weaker docking and moderate stability), mut2 (orange) shifts left overall (improved affinity) but with limited stability gain; while mut3 (green) and mut4 (light purple) occupy the lower left region, representing simultaneous improvement in both metrics. The point cloud of mut4 is closest to the lower left corner, achieving better docking without sacrificing stability.
Fig. 8 Dock Score x Protein Score joint evaluation scatter plot
The docking results indicate that multi-point mutation combinations often outperform single or few-point mutations in binding energy. In particular, our designed quadruple mutation combinations generally lead in docking scores compared to double or triple mutants (Fig. 9).
Fig. 9 Summary heatmap
Based on comprehensive stability scoring and docking results, we finally selected 4 mutants with the best predicted performance (along with the wild-type as control) for further molecular dynamics simulation studies.
| GeDH | Protein_score | Dock_score | Score |
|---|---|---|---|
| 64117 | -1187.136 | -1344.596 | -1281.6121 |
| 63928 | -1182.999 | -1344.791 | -1280.0742 |
| 54883 | -1184.333 | -1343.538 | -1279.8561 |
| 54456 | -1186.282 | -1342.141 | -1279.7974 |
| WT | -1161.051 | -863.977 | -982.8066 |
2.4 Molecular Dynamics Simulation
To further verify the structural stability and substrate binding persistence of the selected GeDH mutants in a solvated environment, we performed molecular dynamics (MD) simulations on the enzyme-substrate complexes. In the simulations, we used the docked conformation results of the GeDH-geraniol complexes for the 4 candidate mutants and the wild-type as initial structures (Fig. 10), placing them into solvated systems for a period of dynamics evolution.
Fig. 10 Results of molecular docking; this conformation served as the initial structure for molecular dynamics simulation
Specific steps included: Using GROMACS molecular dynamics software with the Amber molecular force field, the GAFF force field for the small molecule ligand, RESP2 charges,
solvating the protein-ligand system in a water box and adding 0.15 M physiological saline ions to bring the system close to physiological conditions. After energy minimization
to eliminate bad contacts, we performed stepwise heating and equilibrium simulations to stabilize the system temperature at 300 K and pressure at 1 bar, constraining the protein
backbone to allow solvent rearrangement. Then, we proceeded to production simulation, running 20 ns of unrestrained MD simulation for each system without any backbone constraints
to sufficiently sample the conformational space of the protein-ligand complex. The results are as follows (Fig. 11):
Fig. 11 Molecular dynamics simulation trajectories
As seen in Fig. 11, although 64117 and 63928 ranked high in docking, during prolonged simulation in solvent, geraniol (red sticks) gradually "drifted" from the active pocket towards the solvent side. Their ligand trajectories show significant spreading and off-site drifting, indicating that the initial high-scoring pose is difficult to maintain under real protein "breathing" and solvent competition. The movable loops near their pockets are "looser," lacking stable polar anchors, causing the ligand to rely mainly on hydrophobic contacts rather than directional interactions, making it prone to slippage or reorientation once the pocket opens, resulting in false positives.
In contrast, 54883 and 54456 exhibited sustained residence and correct orientation under the same simulation scale: the ligand trajectories highly overlap, locked in the geometric relationship necessary for catalysis. Mechanistically, Q60R provides entrance positive charge and stable hydrogen bond pairing to "pull" the substrate hydroxyl group; F147Y's aromatic ring and phenolic hydroxyl simultaneously contribute π–π stacking and hydrogen bonding, dual-positioning the Alkene chain and polar end; L125P rigidifies the local loop region, significantly reducing pocket breathing amplitude; A300D provides an additional polar anchor and potential salt bridge at the "gate" position, reducing random drifting of product/substrate. This result directly supports our screening principle: docking is for "finding direction," MD is for "testing steady state"; our final selections 54883/54456 are not only statically energetically favorable but also maintain a true catalytic conformation in a dynamic environment, which is the fundamental reason for their higher yield in wet lab experiments.
The dominant conformations during the dynamics simulation are shown in Fig. 12. The ligand is shown in blue. In 54883, blue represents the wild-type, and dark purple represents the mutant stick representation, showing good match. 54456 also binds at the correct position with a large binding free energy.
Fig. 12 Dominant conformations during molecular dynamics simulation
Using the above metrics, we evaluated the stability and ligand retention of each mutant complex during the simulation. Overall, the selected candidate mutants exhibited more stable enzyme -substrate binding in MD simulations compared to the wild-type. Notably, differences existed among candidates with high docking scores: for example, some mutants (IDs 64117, 63928) showed significant ligand displacement during prolonged simulation, indicating that the initial docked conformation may not be stable; while other mutants (54883 and 54456) kept the ligand firmly positioned at the correct binding site throughout the simulation. This result highlights the importance of introducing MD dynamics investigation: conformations with high static docking energy may not be ideal designs if they are difficult to maintain under dynamic conditions. In summary, MD simulation provided valuable dynamic information for identifying which mutation combination is more likely to maintain an efficient catalytic conformation in a real environment.
2.5 Energy Decomposition Analysis and Binding Free Energy Calculation
To quantitatively compare the binding strength between each mutant and the substrate, we further calculated the binding free energy using the equilibrated trajectories from MD simulations. The method used was the classic MM/PBSA (Molecular Mechanics-Poisson Boltzmann Surface Area) and MM/GBSA (Molecular Mechanics-Generalized Born Surface Area). Briefly, we sampled numerous complex conformation frames from the stable simulation trajectory, calculated the difference in molecular mechanics energy of the system before and after ligand binding for each frame, along with the change in solvation free energy, and took the average to obtain an approximate binding free energy ΔG_binding.
| GeDH | ΔG (kcal/mol) | ||||
|---|---|---|---|---|---|
| ΔGvdW | ΔGele | ΔGpolar | ΔGnonpolar | ΔGtotal | |
| 64117 | -18.95 | -5.36 | 11.04 | -2.99 | -16.26 |
| 63928 | -21.34 | -5.27 | 12.24 | -3.32 | -17.67 |
| 54883 | -19.28 | -18.82 | 16.11 | -3.28 | -25.27 |
| 54456 | -21.79 | -5.18 | 11.75 | -3.37 | -18.59 |
| WT | -18.80 | -2.35 | 8.67 | -2.86 | -15.35 |
Simultaneously, we decomposed the total binding free energy onto individual residues to evaluate the contribution of each amino acid to ligand binding (Fig. 13).
Fig. 13 Schematic of energy decomposition analysis, typically a bar chart showing per-residue energy contributions
This energy decomposition analysis helps deeply understand why the mutations work. The calculation results show differences in the binding free energy of different mutants, with trends consistent with molecular docking and dynamics observations. The average binding free energy of the wild-type GeDH-geraniol complex is relatively unfavorable, while the ΔG_binding of mutants is more negative, indicating more spontaneous and stable binding. For example, the total binding free energy of our optimal mutant (54883 combination) is -25.27 kcal/mol, a significant improvement compared to the wild-type's -15.35 kcal/mol (more negative value). This gap of approximately 10 kcal/mol shows the mutant's affinity for the substrate is expected to increase by several orders of magnitude (binding constant is exponentially sensitive to energy changes). At the residue level, we found that some conserved residues contribute greatly to the basic binding, such as aromatic F102 providing prominent van der Waals interactions; meanwhile, some sites where we introduced mutations showed significantly enhanced contributions in the best variants. For example, the hydrogen bonds and electrostatic interactions formed by arginine after the Q60R mutation showed a high proportion in the simulation, proving that this mutation strengthens enzyme-substrate binding. Additionally, the analysis showed that residues like T51 and D52, near the entrance of the active pocket, contribute importantly to complex stability in the best mutant. We believe these residues may help anchor the substrate molecule positioning. These details of energy decomposition provide molecular-level mechanistic insights, indicating that successful mutations indeed consolidate the enzyme -substrate complex by enhancing specific interactions, thereby lowering the system's free energy.
2.6 Results Analysis
Based on the above dry lab experimental results, we assessed the changes in substrate binding ability and structural stability of various GeDH mutants and identified potential optimal mutation schemes. Firstly, at the protein structure level, the introduced mutations did not disrupt the overall folding stability of the enzyme; on the contrary, the mutation selection guided by sequence evolution information ensured that the mutants still maintained good thermal stability. Secondly, at the substrate binding level, multiple mutations exhibited a synergistic gain effect: docking simulations showed that the quadruple mutant formed more favorable enzyme-substrate interactions compared to the wild-type, with generally better docking scores. However, static docking is only a preliminary indicator, and we further distinguished the actual performance of the mutants using molecular dynamics (MD) simulations. The simulations indicated that the mutant with the highest docking score was not necessarily the best in a dynamic environment. The Q60R, L125P, F147Y, and A300D quadruple mutant designed by us demonstrated excellent stability in MD: the geraniol remained correctly oriented in the mutant enzyme's pocket, continuously interacting with key residues, and showed no signs of dissociation. The binding free energy calculations for this mutant also supported its advantage, with a more negative ΔG_total value compared to other mutants and the wild-type, indicating the highest theoretical affinity. This suggests that the quadruple mutation significantly enhanced the ability of GeDH to capture and catalyze geraniol.
From a molecular mechanism perspective, the Q60R and F147Y mutations strengthened the interactions between the enzyme and the polar groups of the substrate by introducing additional hydrogen bonds/electrostatic interactions, while L125P likely stabilized the conformation near the active pocket, making the enzyme-substrate complex more resistant to dissociation. The introduction of A300D possibly enhanced salt bridge interactions within the protein or with the coenzyme, thereby improving the enzyme's overall catalytic efficiency. The synergistic effects of these mutations made the enzyme's active site "stick" to the geraniol substrate, lowering the activation energy required for catalysis and potentially increasing the reaction rate.
Dry lab simulation predictions clearly indicated that the combination of Q60R/L125P/F147Y/A300D was the most promising mutation for improving GeDH performance. Based on this, we constructed the mutant and used it for subsequent wet lab validation. The results showed that this rationally designed enzyme variant significantly increased the geranial yield in the final oxidation step (approximately 34% higher than the wild-type, see the wet lab for details.). This finding fully demonstrates the value of computer-assisted enzyme engineering: the best mutations selected through molecular simulation showed the expected improvements in experiments, providing strong support for the optimization of the entire geranial biosynthesis pathway.