Dry Lab | NPU-China-iGEM 2025

Rational Design and Development of a High-Performance Xylanase

Molecular Dynamics Simulation of Xylanase

To identify the highly flexible regions of Xylanase, Molecular dynamics simulations were performed for 50 ns at 300 K and 333 K. The root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) were calculated to evaluate conformational stability and residue flexibility.

As shown in Figure 1A, the RMSD profiles reveal that structural deviation increases with temperature, indicating that the xylanase structure exhibits relatively low stability at 333 K. The RMSF results in Figure 1B show that residues 33-35, 56-57, 88, 136-138, and 148-158 exhibited pronounced fluctuations, indicating flexible regions associated with relatively unstable structural elements.

Figure 1: Molecular dynamics simulation of xylanase at different temperatures

Virtual Saturation Mutagenesis

Virtual saturation mutagenesis was performed on the flexible regions of the protein identified in the molecular dynamics analysis. Simulations were conducted at 333 K and pH 7, and each mutation was modeled in triplicate to ensure reproducibility. The calculated changes in Gibbs free energy (ΔΔG) after mutagenesis are summarized in Figure 2.

Residues with ΔΔG < 0 are shown in red, where a darker red color corresponds to a smaller ΔΔG value and indicates a greater stabilizing effect of the mutation on the protein structure. In other words, mutations with lower ΔΔG values are predicted to enhance the structural stability of the enzyme.

Based on the evaluation of free energy changes, six mutations—D57Q, D57R, D57M, N88W, T156F, and T157D—were selected as the most promising candidates for improving protein stability.

Figure 2: Virtual saturation mutation of xylanase by FoldX

Construction of the Xylose-Inducible Promoter

Construction of the Recombinant Plasmid Containing the Xylose Operon

The gene sequences of xylR and xylO from Escherichia coli were obtained from the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/). The recombinant plasmid design, including primer design for PCR amplification and vector construction, was carried out using SnapGene software (Insightful Science, USA).

Figure 3: Construction of pET-28a-xylR

Figure 4: Construction of pET-28a-xylR-xylO

The ultimately designed recombinant plasmid and its associated characteristics are presented in Figure 5:

Figure 5: Map of pET-28a-xylR-xylO

Furthermore, after verifying that the xylose-inducible promoter pET-28a-xylR-xylO derived from pET-28a functioned properly, we employed a similar strategy to construct a derivative vector, pPIC10K, based on the Pichia pastoris recombinant expression vector pPIC9K. The plasmid map is shown below (Figure 6). This construct serves as the final expression vector utilized in this study.

Figure 6: Map of pPIC10K

Gene Mining and Computer-Aided Engineering of Pro-Xylane Synthase

Gene Mining

Previous studies have demonstrated that Ribitol 2-Dehydrogenase (RDH) and Phosphite Dehydrogenase (PTDH), both of which possess well-defined catalytic functions and thoroughly characterized biochemical properties, can act as multi-enzyme catalysts for the biosynthesis of Pro-Xylane.

In this study, a homologous gene mining strategy was adopted using known functional enzymes as templates. To identify promising candidates, a multi-level screening framework was established. First, large-scale retrieval of homologous sequences was carried out from the NCBI database to ensure that the selected genes shared high sequence similarity with the templates, thereby preserving their structural scaffolds and catalytic mechanisms. Second, the soluble expression potential of each candidate was evaluated using bioinformatic tools such as Protein-Sol (https://protein-sol.manchester.ac.uk/), since solubility is essential for efficient heterologous expression and downstream application. Finally, multiple sequence alignment and conserved active-site analysis were performed to verify that all key residues involved in substrate recognition, cofactor binding, and catalysis were completely conserved without deleterious substitutions, ensuring the integrity of enzymatic function.

Through this rigorous screening process, we avoided empirical searching and selectively identified high-quality RDH and PTDH homologs with strong sequence conservation, favorable solubility, and intact catalytic motifs. These enzymes provide a solid molecular foundation for constructing an efficient in vitro enzymatic system for Pro-Xylane biosynthesis.

The amino acid sequences of the identified RDH homologs are shown below:

>RDH
MARELEGKVAAVTGAASGIGLASAEAMLAAGARVVMVDRDEAALKALCNKHGDTVIPLVVDLLDPEDCATLLPRVLEKACQLDILHANAGTYVGGDLVDADAIDRMLNLNVNVMKNVHDVLPHMIERRTGDIIVTSSLAAHFPTPWEPVYASSKWAINCFVQTVRRQVFKHGIRVGSISPGPVVSALLADWPPEKLKEARDSGSLLEASDVAEVVMFMLTRPRGMTIRDVLMLPTNFDL

The predicted solubility expression result of RDH is shown in Figure 7:

Figure 7: The predicted solubility expression result of RDH

The protein sequences obtained in PTDH are shown below:

>PTDH
MLPKLVITHRVHDEILQLLAPHCELMTNQTDSTLTREEILRRCRDAQAMMAFMPDRVDADFLQACPELRVVGCALKGFDNFDVDACTARGVWLTFVPDLLTVPTAELAIGLAVGLGRHLRAADAFVRSGEFQGWQPQFYGTGLDNATVGILGMGAIGLAMADRLQGWGATLQYHEAKALDTQTEQRLGLRQVACSELFASSDFILLALPLNADTQHLVNAELLALVRPGALLVNPCRGSVVDEAAVLAALERGQLGGYAADVFEMEDWARADRPRLIDPALLAHPNTLFTPHIGSAVRAVRLEIERCAAQNIIQVLAGARPINAANRLPKAEPAAC

The predicted solubility expression result of PTDH is shown in Figure 8:

Figure 8: The predicted solubility expression result of PTDH

The subsequent experimental results are presented in the Wet Lab section.

Future Optimization and Technical Advancement

Homology Modeling

Homology models of Ribitol 2-Dehydrogenase (RDH) and Phosphite Dehydrogenase (PTDH) were generated using the SWISS-MODEL online server (https://swissmodel.expasy.org/). The crystal structures with PDB IDs 5jo9 and 4ebf were selected as templates for RDH and PTDH, respectively.

Homology modeling predicts the three-dimensional structure of a target protein based on experimentally determined structures of homologous proteins. In such models, terminal regions—particularly the N-terminus—are often unresolved during crystallization, leading to relatively low structural reliability in those segments. Consequently, mutation prediction in these regions is considered unreliable, and our mutational analysis therefore excluded residues at the N-terminal ends. This is also reflected in the subsequent heatmap analysis, where the residue indices for RDH and PTDH do not start from position 1.

Figure 9: The three-dimensional structure of RDH

Figure 10: The three-dimensional structure of PTDH

Virtual Prediction and Analysis

After obtaining the base structural models, in silico saturation mutagenesis was carried out using FoldX to evaluate the effects of all possible point mutations on protein stability and potential functional changes.

The resulting ΔΔG values were used to construct heatmaps that visualize the impact of each amino acid substitution. In these heatmaps, color gradients represent predicted thermostability: regions shaded in deeper red correspond to mutations predicted to confer greater structural stability to the engineered proteins.

For RDH

After excluding active site residues, we identified that mutations at positions 11 and 87.

Figure 11: Virtual saturation mutation of RDH by FoldX

A11I, A11L, A11M or A11V - Isoleucine (I), leucine (L), methionine and valine have much larger side chains than alanine (A), which can better fill the cavities within proteins and form more extensive and tighter van der Waals forces with surrounding hydrophobic residues.

Figure 12: The three-dimensional structure at position 11

A87I or A87L - This mutation is highly similar to the mechanism at position 11, further confirming that the overall stability of the protein is dominated by the tightness of its hydrophobic core.

Figure 13: The three-dimensional structure at position 87

For PTDH

After excluding active site residues, we identified that mutations at positions 85, 296, and 322 of amino acid residues were most beneficial for enhancing thermostability.

Figure 14: Virtual saturation mutation of PTDH by FoldX

A85L - This may be related to the fact that alanine (A) has only a methyl group in its side chain, whereas leucine (L) possesses a larger isobutyl side chain. Mutating A to L is equivalent to inserting a larger 'spacer' into the hydrophobic core of the protein, thereby enhancing hydrophobic interactions.

Figure 15: The three-dimensional structure at position 85

A296D or A296E - may be related to the formation of new hydrogen bonds, may be optimized for optimizing local hydrophobic interactions.

Figure 16: The three-dimensional structure at position 296

I322F - may be optimized for π-π interactions.

Figure 17: The three-dimensional structure at position 322

The homology modeling results, together with the mutation sites predicted by in silico saturation mutagenesis, lay a theoretical foundation for the subsequent rational engineering of the enzyme to improve its thermal stability and catalytic efficiency.

Methods

Molecular Dynamics Simulation of Xylanase

Molecular dynamics (MD) simulations of xylanase were performed using the GROMACS software package. The initial xylanase structure was solvated in a cubic box using the TIP3P water model. Sodium (Na⁺) and chloride (Cl⁻) ions were added to neutralize the total charge and mimic the physiological ionic environment.

Energy minimization was conducted using the steepest descent algorithm to eliminate unfavorable steric contacts. The minimized system was equilibrated in two stages: first under the NVT ensemble (constant number of particles, volume, and temperature) for 100 ps at 300 K using a V-rescale thermostat, and subsequently under the NPT ensemble (constant number of particles, pressure, and temperature) for 100 ps at 1 bar using a Berendsen barostat, allowing system density to stabilize.

After equilibration, a 50-ns production simulation was carried out with a 2-fs integration time step. The LINCS algorithm was applied to constrain all bond lengths. Structural stability parameters, including root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (Rg), were calculated and analyzed using the built-in analysis tools of GROMACS.

Gene Mining

Validated enzymes RDH (UniProt Q89FN7) and PTDH (UniProt O69054) were used as templates. Homologs were retrieved by BLASTp against NCBI nr/UniProtKB (thresholds: E-value ≤ 1e−20, sequence coverage ≥ 75%, identity ≥ 30%). Hits were dereplicated with CD-HIT at 90% identity, and aligned to the templates to verify conservation of cofactor-binding and catalytic residues; non-conserved sequences were discarded. Signal peptides and transmembrane segments were evaluated with SignalP, and solubility was predicted using Protein-Sol and SoluProt; candidates lacking (or with removable) signal peptides, without multiple TM helices, and with favorable solubility scores were retained. Final candidates were ranked by homology, motif conservation, and predicted solubility, and top hits were selected for cloning and expression.

Gene Sequence Acquisition

The gene sequence of the target enzyme, xylanase, was retrieved from the UniProt Knowledgebase (https://www.uniprot.org/) and cross-verified in the NCBI database (https://www.ncbi.nlm.nih.gov/). The amino acid sequence and the corresponding coding DNA sequence (CDS) were downloaded from the protein entry page. These sequences served as input data for subsequent sequence alignment, homology modeling, and primer design.

Sequence Alignment of Enzymes

Sequence alignment was conducted using the BLAST (Basic Local Alignment Search Tool) program from NCBI. The amino acid sequence of xylanase obtained from UniProt was used as the query sequence against the non-redundant protein (nr) database under default parameters. Homologous sequences with high similarity to the query were identified and selected for further analysis.

Homology Modeling

Homology modeling of xylanase was performed using the SWISS-MODEL server (https://swissmodel.expasy.org/) (Waterhouse et al., Nucleic Acids Research, 2018, 46: W296-W303). The amino acid sequence was submitted to the workspace, and suitable structural templates were automatically selected. Model construction involved sequence-template alignment, backbone generation, and optimization of side chains and loop regions. Model quality was evaluated using the Global Model Quality Estimation (GMQE) and QMEAN scoring systems implemented in SWISS-MODEL to ensure structural reliability and accuracy.

Virtual Saturation Mutagenesis

Comprehensive in silico saturation mutagenesis was performed using FoldX 5.0 to evaluate the effects of all possible point mutations on enzyme stability. The homology-modeled structure was first energy-minimized with the RepairPDB command to correct steric clashes and optimize torsion angles.

Subsequently, the BuildModel command was used to systematically mutate each amino acid residue into all other 19 natural amino acids, generating single-point mutant models. For each mutant, FoldX calculated the change in folding free energy (ΔΔG). A negative ΔΔG indicates enhanced stability, whereas a positive value suggests destabilization.

All mutation results were ranked based on ΔΔG values, and potential stabilizing sites (ΔΔG < 0) were selected as candidates for experimental verification.

Primer Design

Primers were designed using SnapGene (Version 6.0.2; Insightful Science, USA) based on the selected beneficial mutation sites. In addition, two custom primer-design software tools were developed specifically for this project to improve automation and accuracy in mutagenesis primer generation; detailed descriptions and usage instructions are provided in the Wiki → Software section.

Heatmap Generation and Visualization

All heatmaps were generated using GraphPad Prism (Version 10.1.2). Numerical data obtained from molecular dynamics simulations and in silico saturation mutagenesis (e.g., RMSD, RMSF, and ΔΔG values) were imported into the software. Data normalization and visualization were performed using Prism's statistical and graphical modules. Continuous numerical data were mapped to color gradients, where warmer colors represented higher stability or increased favorable ΔΔG shifts, allowing intuitive visualization of stability patterns across mutations.

References

Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX (2015) 1, 19-25.
The UniProt Consortium. UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Research (2025) 53(D1), D560-D567.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology (1990) 215(3), 403-410.
Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F. T., de Beer, T. A. P., Rempfer, C., Bordoli, L., Lepore, R., & Schwede, T. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Research (2018) 46(W1), W296-W303.
Guerois, R., Nielsen, J. E., & Serrano, L. Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. Journal of Molecular Biology (2002) 320(2), 369-387.
Schymkowitz, J. W. H., Rousseau, F., Martins, I. C., Ferkinghoff-Borg, J., Stricher, F., & Serrano, L. The FoldX web server: An online force field. Nucleic Acids Research (2005) 33(Web Server issue), W382-W388.
SnapGene software (Version 6.0.2), Insightful Science, USA. Available at https://www.snapgene.com/.
GraphPad Prism (Version 10.1.2), GraphPad Software, San Diego, California, USA. Available at https://www.graphpad.com/.
Hebditch, M., Carballo-Amador, M. A., Charonis, S., Curtis, R., Warwicker, J. Protein-Sol: a web tool for predicting protein solubility from sequence. Bioinformatics (Oxford, England) (2017) 33(19), 3098-3100.