Rational Design and Development of a High-Performance Xylanase
Molecular Dynamics Simulation of Xylanase
To identify the highly flexible regions of Xylanase, Molecular dynamics simulations were performed for 50 ns at 300 K
and 333 K. The root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) were calculated to evaluate
conformational stability and residue flexibility.
As shown in Figure 1A, the RMSD profiles reveal that structural deviation increases with temperature, indicating that
the xylanase structure exhibits relatively low stability at 333 K. The RMSF results in Figure 1B show that residues
33-35, 56-57, 88, 136-138, and 148-158 exhibited pronounced fluctuations, indicating flexible regions associated with
relatively unstable structural elements.
Figure 1: Molecular dynamics simulation of xylanase at different temperatures
Virtual Saturation Mutagenesis
Virtual saturation mutagenesis was performed on the flexible regions of the protein
identified in the molecular dynamics analysis. Simulations were conducted at 333 K and pH 7, and each
mutation
was modeled in triplicate to ensure reproducibility. The calculated changes in Gibbs free energy (ΔΔG) after
mutagenesis are summarized in Figure 2.
Residues with ΔΔG < 0 are shown in red, where a darker red color corresponds to a
smaller ΔΔG value and indicates a greater stabilizing effect of the mutation on the protein structure. In
other words, mutations with lower ΔΔG values are predicted to enhance the structural stability of the
enzyme.
Based on the evaluation of free energy changes, six mutations—D57Q, D57R, D57M,
N88W,
T156F, and T157D—were selected as the most promising candidates for improving protein stability.
Figure 2: Virtual saturation mutation of xylanase by FoldX
Construction of the Xylose-Inducible Promoter
Construction of the Recombinant Plasmid Containing the Xylose Operon
The gene sequences of xylR and xylO from Escherichia coli
were obtained from the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/). The recombinant plasmid
design,
including primer design for PCR amplification and vector construction, was carried out using SnapGene
software
(Insightful Science, USA).
Figure 3: Construction of pET-28a-xylR
Figure 4: Construction of pET-28a-xylR-xylO
The ultimately designed recombinant plasmid and its associated characteristics are
presented in Figure 5:
Figure 5: Map of pET-28a-xylR-xylO
Furthermore, after verifying that the xylose-inducible promoter
pET-28a-xylR-xylO derived from pET-28a functioned properly, we employed a similar strategy
to construct a derivative vector, pPIC10K, based on the Pichia pastoris recombinant expression vector
pPIC9K.
The plasmid map is shown below (Figure 6). This construct serves as the final expression vector utilized in
this study.
Figure 6: Map of pPIC10K
Gene Mining and Computer-Aided Engineering of Pro-Xylane Synthase
Gene Mining
Previous studies have demonstrated that Ribitol 2-Dehydrogenase (RDH) and Phosphite
Dehydrogenase (PTDH), both of which possess well-defined catalytic functions and thoroughly characterized
biochemical properties, can act as multi-enzyme catalysts for the biosynthesis of Pro-Xylane.
In this study, a homologous gene mining strategy was adopted using known functional
enzymes as templates. To identify promising candidates, a multi-level screening framework was established.
First, large-scale retrieval of homologous sequences was carried out from the NCBI database to ensure that
the
selected genes shared high sequence similarity with the templates, thereby preserving their structural
scaffolds and catalytic mechanisms. Second, the soluble expression potential of each candidate was evaluated
using bioinformatic tools such as Protein-Sol (https://protein-sol.manchester.ac.uk/), since solubility is
essential for efficient heterologous expression and downstream application. Finally, multiple sequence
alignment and conserved active-site analysis were performed to verify that all key residues involved in
substrate recognition, cofactor binding, and catalysis were completely conserved without deleterious
substitutions, ensuring the integrity of enzymatic function.
Through this rigorous screening process, we avoided empirical searching and
selectively identified high-quality RDH and PTDH homologs with strong sequence conservation, favorable
solubility, and intact catalytic motifs. These enzymes provide a solid molecular foundation for constructing
an efficient in vitro enzymatic system for Pro-Xylane biosynthesis.
The amino acid sequences of the identified RDH homologs are shown below:
>RDH
MARELEGKVAAVTGAASGIGLASAEAMLAAGARVVMVDRDEAALKALCNKHGDTVIPLVVDLLDPEDCATLLPRVLEKACQLDILHANAGTYVGGDLVDADAIDRMLNLNVNVMKNVHDVLPHMIERRTGDIIVTSSLAAHFPTPWEPVYASSKWAINCFVQTVRRQVFKHGIRVGSISPGPVVSALLADWPPEKLKEARDSGSLLEASDVAEVVMFMLTRPRGMTIRDVLMLPTNFDL
The predicted solubility expression result of RDH is shown in Figure
7:
Figure 7: The predicted solubility expression result of RDH
The protein sequences obtained in PTDH are shown below:
>PTDH
MLPKLVITHRVHDEILQLLAPHCELMTNQTDSTLTREEILRRCRDAQAMMAFMPDRVDADFLQACPELRVVGCALKGFDNFDVDACTARGVWLTFVPDLLTVPTAELAIGLAVGLGRHLRAADAFVRSGEFQGWQPQFYGTGLDNATVGILGMGAIGLAMADRLQGWGATLQYHEAKALDTQTEQRLGLRQVACSELFASSDFILLALPLNADTQHLVNAELLALVRPGALLVNPCRGSVVDEAAVLAALERGQLGGYAADVFEMEDWARADRPRLIDPALLAHPNTLFTPHIGSAVRAVRLEIERCAAQNIIQVLAGARPINAANRLPKAEPAAC
The predicted solubility expression result of PTDH is shown in Figure
8:
Figure 8: The predicted solubility expression result of PTDH
The subsequent experimental results are presented in the Wet Lab section.
Future Optimization and Technical Advancement
Homology Modeling
Homology models of Ribitol 2-Dehydrogenase (RDH) and Phosphite Dehydrogenase (PTDH)
were generated using the SWISS-MODEL online server (https://swissmodel.expasy.org/). The crystal structures
with PDB IDs 5jo9 and 4ebf were selected as templates for RDH and PTDH, respectively.
Homology modeling predicts the three-dimensional structure of a target protein based
on experimentally determined structures of homologous proteins. In such models, terminal
regions—particularly
the N-terminus—are often unresolved during crystallization, leading to relatively low structural reliability
in those segments. Consequently, mutation prediction in these regions is considered unreliable, and our
mutational analysis therefore excluded residues at the N-terminal ends. This is also reflected in the
subsequent heatmap analysis, where the residue indices for RDH and PTDH do not start from position 1.
Figure 9: The three-dimensional structure of RDH
Figure 10: The three-dimensional structure of PTDH
Virtual Prediction and Analysis
After obtaining the base structural models, in silico saturation
mutagenesis
was carried out using FoldX to evaluate the effects of all possible point mutations on
protein stability and potential functional changes.
The resulting ΔΔG values were used to construct heatmaps that visualize the impact
of
each amino acid substitution. In these heatmaps, color gradients represent predicted thermostability:
regions
shaded in deeper red correspond to mutations predicted to confer greater structural stability to the
engineered proteins.
For RDH
After excluding active site residues, we identified that mutations at positions 11
and
87.
Figure 11: Virtual saturation mutation of RDH by FoldX
A11I, A11L, A11M or A11V - Isoleucine (I),
leucine
(L), methionine
and valine have much larger side chains than alanine (A), which can better fill the
cavities within proteins
and form more extensive and tighter van der Waals forces with
surrounding
hydrophobic residues.
Figure 12: The three-dimensional structure at position 11
A87I or A87L - This mutation is highly
similar to the mechanism at
position 11, further confirming that the overall stability of the
protein is
dominated by the tightness of its
hydrophobic core.
Figure 13: The three-dimensional structure at position 87
For PTDH
After excluding active site residues, we identified that mutations at positions 85,
296, and 322 of amino acid residues were most beneficial for enhancing thermostability.
Figure 14: Virtual saturation mutation of PTDH by FoldX
A85L - This may be related to the fact that alanine (A) has only a
methyl group in its side chain, whereas leucine (L) possesses a larger isobutyl side chain. Mutating A to L
is
equivalent to inserting a larger 'spacer' into the hydrophobic core of the protein, thereby enhancing
hydrophobic interactions.
Figure 15: The three-dimensional structure at position 85
A296D or A296E - may be related to the formation of new hydrogen
bonds, may be optimized for optimizing local hydrophobic interactions.
Figure 16: The three-dimensional structure at position 296
I322F - may be optimized for π-π interactions.
Figure 17: The three-dimensional structure at position 322
The homology modeling results, together with the mutation sites predicted by in
silico saturation mutagenesis, lay a theoretical foundation for the subsequent rational engineering
of
the enzyme to improve its thermal stability and catalytic efficiency.
Methods
Molecular Dynamics Simulation of Xylanase
Molecular dynamics (MD) simulations of xylanase were performed using the GROMACS
software package. The initial xylanase structure was solvated in a cubic box using the TIP3P water model.
Sodium (Na⁺) and chloride (Cl⁻) ions were added to neutralize the total charge and mimic the physiological
ionic environment.
Energy minimization was conducted using the steepest descent algorithm to eliminate
unfavorable steric contacts. The minimized system was equilibrated in two stages: first under the NVT
ensemble
(constant number of particles, volume, and temperature) for 100 ps at 300 K using a V-rescale thermostat,
and
subsequently under the NPT ensemble (constant number of particles, pressure, and temperature) for 100 ps at
1
bar using a Berendsen barostat, allowing system density to stabilize.
After equilibration, a 50-ns production simulation was carried out with a 2-fs
integration time step. The LINCS algorithm was applied to constrain all bond lengths. Structural stability
parameters, including root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of
gyration (Rg), were calculated and analyzed using the built-in analysis tools of GROMACS.
Gene Mining
Validated enzymes RDH (UniProt Q89FN7) and PTDH (UniProt O69054)
were used as templates. Homologs were retrieved by BLASTp against NCBI nr/UniProtKB (thresholds: E-value ≤
1e−20, sequence coverage ≥ 75%, identity ≥ 30%). Hits were dereplicated with CD-HIT at 90% identity, and
aligned to the templates to verify conservation of cofactor-binding and catalytic residues; non-conserved
sequences were discarded. Signal peptides and transmembrane segments were evaluated with SignalP, and
solubility was predicted using Protein-Sol and SoluProt; candidates lacking (or with removable) signal
peptides, without multiple TM helices, and with favorable solubility scores were retained. Final candidates
were ranked by homology, motif conservation, and predicted solubility, and top hits were selected for
cloning
and expression.
Gene Sequence Acquisition
The gene sequence of the target enzyme, xylanase, was retrieved from the UniProt
Knowledgebase (https://www.uniprot.org/) and cross-verified in the NCBI database
(https://www.ncbi.nlm.nih.gov/). The amino acid sequence and the corresponding coding DNA sequence (CDS)
were
downloaded from the protein entry page. These sequences served as input data for subsequent sequence
alignment, homology modeling, and primer design.
Sequence Alignment of Enzymes
Sequence alignment was conducted using the BLAST (Basic Local Alignment Search Tool)
program from NCBI. The amino acid sequence of xylanase obtained from UniProt was used as the query sequence
against the non-redundant protein (nr) database under default parameters. Homologous sequences with high
similarity to the query were identified and selected for further analysis.
Homology Modeling
Homology modeling of xylanase was performed using the SWISS-MODEL server
(https://swissmodel.expasy.org/) (Waterhouse et al., Nucleic Acids Research, 2018, 46: W296-W303). The amino
acid sequence was submitted to the workspace, and suitable structural templates were automatically selected.
Model construction involved sequence-template alignment, backbone generation, and optimization of side
chains
and loop regions. Model quality was evaluated using the Global Model Quality Estimation (GMQE) and QMEAN
scoring systems implemented in SWISS-MODEL to ensure structural reliability and accuracy.
Virtual Saturation Mutagenesis
Comprehensive in silico saturation mutagenesis was performed using FoldX
5.0
to evaluate the effects of all possible point mutations on enzyme stability. The homology-modeled structure
was first energy-minimized with the RepairPDB command to correct steric clashes and optimize torsion angles.
Subsequently, the BuildModel command was used to systematically mutate each amino
acid
residue into all other 19 natural amino acids, generating single-point mutant models. For each mutant, FoldX
calculated the change in folding free energy (ΔΔG). A negative ΔΔG indicates enhanced stability, whereas a
positive value suggests destabilization.
All mutation results were ranked based on ΔΔG values, and potential stabilizing
sites
(ΔΔG < 0) were selected as candidates for experimental verification.
Primer Design
Primers were designed using SnapGene (Version 6.0.2; Insightful Science, USA) based
on
the selected beneficial mutation sites. In addition, two custom primer-design software tools were developed
specifically for this project to improve automation and accuracy in mutagenesis primer generation; detailed
descriptions and usage instructions are provided in the Wiki → Software section.
Heatmap Generation and Visualization
All heatmaps were generated using GraphPad Prism (Version 10.1.2). Numerical data
obtained from molecular dynamics simulations and in silico saturation mutagenesis (e.g., RMSD,
RMSF,
and ΔΔG values) were imported into the software. Data normalization and visualization were performed using
Prism's statistical and graphical modules. Continuous numerical data were mapped to color gradients, where
warmer colors represented higher stability or increased favorable ΔΔG shifts, allowing intuitive
visualization
of stability patterns across mutations.
References
-
Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess,
B.,
& Lindahl, E.
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers.
SoftwareX
(2015)
1, 19-25.
-
The UniProt Consortium.
UniProt: The universal protein knowledgebase in 2025.
Nucleic Acids Research
(2025)
53(D1), D560-D567.
-
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D.
J.
Basic local alignment search tool.
Journal of Molecular Biology
(1990)
215(3), 403-410.
-
Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G.,
Gumienny, R., Heer, F. T., de Beer, T. A. P., Rempfer, C., Bordoli, L., Lepore, R., & Schwede,
T.
SWISS-MODEL: Homology modelling of protein structures and
complexes.
Nucleic Acids Research
(2018)
46(W1), W296-W303.
-
Guerois, R., Nielsen, J. E., & Serrano, L.
Predicting changes in the stability of proteins and protein complexes: A
study of more than 1000 mutations.
Journal of Molecular Biology
(2002)
320(2), 369-387.
-
Schymkowitz, J. W. H., Rousseau, F., Martins, I. C., Ferkinghoff-Borg,
J.,
Stricher, F., & Serrano, L.
The FoldX web server: An online force field.
Nucleic Acids Research
(2005)
33(Web Server issue), W382-W388.
-
SnapGene software (Version 6.0.2), Insightful Science, USA.
Available at https://www.snapgene.com/.
-
GraphPad Prism (Version 10.1.2), GraphPad Software, San Diego,
California,
USA.
Available at https://www.graphpad.com/.
-
Hebditch, M., Carballo-Amador, M. A., Charonis, S., Curtis, R.,
Warwicker,
J.
Protein-Sol: a web tool for predicting protein solubility from
sequence.
Bioinformatics (Oxford, England)
(2017)
33(19), 3098-3100.