Contents

Modeling

"Modeling feeds back to experiments, and experiments guide modeling"

This study systematically elucidates the cross-scale mechanism of amide hydrolase (ADH)-catalyzed degradation of ochratoxin (OTA) by establishing a deep collaborative mechanism between experimental biology and computational modeling. The entire research system has constructed a closed-loop research paradigm of "experimental data-driven modeling → model-guided experimental design → experimental result feedback to optimize the model", realizing the organic connection from the optimization of macroscopic reaction conditions to the analysis of microscopic molecular mechanisms.

Interaction at the level of macroscopic condition optimization: Dialogue between statistical modeling and experiments

Experiment-driven modeling: The experimental team provided raw data on the relationship between factors such as temperature, pH, and metal ions and enzyme activity through rigorous single-factor control experiments. This is the starting point and cornerstone of all modeling work; without such high-quality data, any model would be like water without a source.
Modeling feeds back to experiments: The modeling team conducted principal component analysis (PCA) on the data and discovered a pattern that is difficult to intuitively perceive with the naked eye - there is a significant positive correlation between temperature and pH value (r=0.82, p<0.05). This statistical conclusion derived from the model fed back a key insight to the experimental team: in subsequent optimizations, temperature and pH must be considered as synergistic factors rather than independent variables.
Modeling guides experiments: Based on the guidance of PCA, the modeling team constructed a multi-factor response surface model and predicted the theoretical optimal conditions. These predicted values (such as temperature 47.81°C, pH 8.39) were directly delivered to the experimental team for verification experiments.
Experiments guide modeling: After verification, the experimental team found that the prediction of the minimum model was most consistent with the actual results. This experimental result, in turn, guided the modeling team to revise the mechanistic assumptions about how various factors affect enzyme activity - that is, enzyme activity is limited by the current worst environmental factor, rather than a simple superposition of the effects of each factor. This provided a key basis for the subsequent construction of a more accurate mechanistic model.

Interaction at the level of microscopic mechanism exploration: Dialogue between structural modeling and functional experiments

Experiment-driven modeling: The experimental team provided the amino acid sequence of the target enzyme, which is the only input for the modeling team to perform homology modeling to obtain the three-dimensional structure.
Modeling feeds back to experiments: The modeling team predicted the substrate OTA binding pocket and key residues (such as S89, H164, V218, etc.) through molecular docking, and pointed out the active regions with high flexibility that may affect function (such as positions 56, 96, 173) through RMSF analysis of molecular dynamics simulations. These calculation results were fed back to the experimental team in the form of a "list of potential key sites", directly guiding the experimental team in designing a mutant library, avoiding blind mutation of all sites, and greatly saving experimental resources and time.
Modeling guides experiments: MM/PBSA binding free energy calculation pushed the interaction to the quantitative level of energy. The modeling team not only pointed out which residues are important but also accurately calculated that HIS-217 is the largest favorable contributor (-4.22 kcal/mol), while HIS-85 has an unfavorable interaction (+0.521 kcal/mol). This provided the experimental team with an unprecedentedly precise modification plan: consolidate the interaction of HIS-217, or perform neutralization/reverse optimization mutation on HIS-85.
Experiments guide modeling: The experimental team carried out site-directed mutations according to the above guidance, and the results showed that the activity of the mutant I326A increased by 75%, while the L219 series mutations completely failed. These experimental results are the most authoritative judges:
- They verified the accuracy of the computational predictions (such as the importance of I326 hydrophobic interaction);
- They exposed the limitations of the current computational model (such as failure to fully predict the impact of L219 mutations).
These functional experimental data, in turn, guided the modeling team to optimize its calculation parameters and force fields, making the next prediction more reliable. For example, the quantitative data of mutant activity can be used to correct the correlation between calculated binding energy and actual enzyme activity, improving the predictive ability of the model.

Sublimation at the level of research paradigm: Closed-loop iteration and spiral ascent

We have successfully established seamless connection at these two levels and formed a research closed loop:

Macroscopic optimization provides the best context for microscopic research: The optimal reaction conditions (temperature, pH) determined by the response surface model are directly used as environmental parameters for calculations such as molecular dynamics simulation (for example, the temperature is set to 310K), ensuring that microscopic simulations are carried out under the most relevant physiological conditions and making the calculation results more biologically meaningful.
Microscopic mechanisms provide molecular explanations for macroscopic phenomena: The results of molecular simulations on enzyme-substrate binding modes, residue functions, dynamic stability, etc., explain at the atomic level why enzymes exhibit specific optimal temperatures, optimal pH, and thermal inactivation characteristics at the macroscopic level. For example, the sharp loss of enzyme activity at high temperatures can be intuitively manifested as the drastic unfolding and instability of the protein structure in MD simulations.

Part 1: Modeling for the selection of optimal conditions for enzymatic reactions

Single-factor optimization conditions

This part is based on data provided by the experimental team and analyzed and processed by the modeling team.

To study the optimal conditions for enzymatic reactions, we first conducted single-factor control experiments using the control variable method, measured and established the quantitative relationship between five key environmental factors (temperature, pH, metal ions, acid-base tolerance, and thermal stability) and enzyme activity, and determined the optimal range and level of each factor, providing a basis for subsequent analysis of multi-factor conditions.

In the pre-experiment, the level range of each factor was determined to cover the characteristic interval of enzyme activity changes. Each group of experiments was set with three replicates, and the enzyme activity level was taken as the average value, expressed as relative enzyme activity, i.e., normalized with the maximum enzyme activity in each factor experiment as 100%, to draw single-factor effect curves and identify the enzyme activity peaks under each factor.

Step 1: Temperature

Enzyme activity shows a strong temperature dependence, and its change law presents a typical bell-shaped curve. The optimal reaction temperature of the enzyme is 45°C~50°C, at which the relative enzyme activity is the highest. When the temperature deviates from this optimal value, the enzyme activity decreases significantly, and the activity loss in the high-temperature region (>50°C) is particularly severe.

Step 2: PH Value

Three buffer systems (C-P, Tris, Glycine) were used to cover a wide range of pH 5.0~11.0. The enzyme showed a typical preference for alkalinity, with an optimal pH of 8.5 (determined in the Tris buffer system). Meanwhile, the buffer system itself had a significant impact on activity, and Tris buffer was identified as the optimal reaction medium for the enzyme.

Step 3: Different Metal Ions

The regulatory effects of different ion species on enzyme activity vary significantly. Li⁺ exhibits the strongest activating effect, with the relative enzyme activity reaching as high as 104.0%; while Fe³⁺ and Zn²⁺ completely inhibit the enzyme activity.

Step 4: Acid-Base Tolerance

The enzyme solution was incubated under different pH conditions for 60 minutes, and the remaining activity was measured to evaluate its stability. As shown, the enzyme exhibits strong stability in neutral to weakly alkaline environments (pH 7.0-9.0), with the peak activity (100%) observed at pH 8.5. It completely loses activity under acidic conditions (pH ≤ 4.0), indicating that it cannot withstand strong acidity.

Step 5: Thermal Stability

This characteristic was evaluated by measuring the residual activity of the enzyme after pretreatment at different temperatures. The activity retention rate of the enzyme decreases sharply as the treatment temperature increases. 20°C is its most stable condition (with 100% residual activity). When the treatment temperature reaches 30°C and 40°C, the residual activity drops to approximately 75% and 10% respectively. When the treatment temperature exceeds 50°C, the enzyme is completely inactivated.

Step 6: Optimal Conditions Under Single Factor

Optimal conditions for each factor	Optimal range
Temperature: 45℃ (Enzyme activity: 100%)	40~50℃
PH：8.5（Enzyme activity：100%）	7.5~8.5
Metal ions：Li+(Enzyme activity：103.98%）	Li+
Acid-base tolerance：PH=8.5（Enzyme activity：100%）	8.0~8.5

Principal component analysis

Build

Step 1: Data standardization

First, standardize the original experimental data to eliminate the difference in dimensions among various factors, ensuring the accuracy and comparability of subsequent analyses.

Test

Step 2: Calculate the correlation coefficient matrix

After preliminarily analyzing the influence of four environmental factors (temperature, pH, metal ions, and acid-base tolerance) on enzyme activity, we constructed a correlation coefficient matrix to evaluate the strength of association between variables. The specific analysis process is as follows:

The calculation of the correlation coefficient matrix uses standardized data. By quantifying the linear correlation degree between variables, it identifies key factor combinations affecting enzyme activity.

The analysis results show that there is a significant positive correlation between temperature and pH (r=0.82, p<0.05), indicating that they have a synergistic effect in affecting enzyme activity; the correlation coefficients between metal ions and other environmental parameters are all low (|r|<0.3), which indicates that their influence on enzyme activity is relatively independent of other environmental factors.

Test

Step 3: Extract the number of principal components

By calculating the explained variance and cumulative contribution rate of principal components, the cumulative contribution rate of the first two principal components is 87.48% > 85%, so the first two principal components are extracted. The principal component expressions are as follows:

Y1 = 1.0111Temperature + 0.9245pH + 0.8981Acid-base tolerance - 0.2845Metal ions

Y2 = 0.0998Temperature + 0.0031pH + 0.2015Acid-base tolerance + 1.0072Metal ions

In the first principal component, temperature and pH have a greater impact, indicating the influence of physical factors in the reaction environment; in the second principal component, metal ions have a greater impact, indicating that the catalytic effect of metal ions has a greater impact on enzyme activity.

By calculating the explained variance and cumulative contribution rate of the principal components, we obtained:

The first principal component (PC1) explains 63.37% of the variance.
The second principal component (PC2) explains 24.11% of the variance.
Cumulative explained variance: 87.48%.

The cumulative contribution rate of the first two principal components is 87.48%, which exceeds the commonly used threshold of 85%, so they can fully represent the original data information. Based on the principal component loading matrix, we constructed the principal component expressions:

First principal component Y1:

Y1 = 1.0111×Temperature + 0.9245×pH + 0.8981×Acid-base tolerance − 0.2845×Metal ions

Loading analysis shows that temperature (1.0111), pH (0.9245), and acid-base tolerance (0.8981) all have large positive loadings on Y1, while metal ions have a small negative loading (-0.2845). This indicates that Y1 mainly comprehensively reflects the influence of the physical conditions of the reaction environment on enzyme activity.

Second principal component Y2:

Y2 = 0.0998×Temperature + 0.0031×pH + 0.2015×Acid-base tolerance + 1.0072×Metal ions

In Y2, metal ions (1.0072) show an extremely high positive loading, far higher than other factors, indicating that Y2 mainly represents the chemical factor of the catalytic effect of metal ions.

Learn

From the loading plot, it can be seen that pH, temperature, and acid-base tolerance environment have strong influences on the first principal component (Y1).

Through principal component analysis, we reduced the four environmental factors to two comprehensive indicators: environmental physical factor (Y1) and metal ion factor (Y2). This result confirms from a mathematical statistics perspective that enzyme activity is simultaneously regulated by the synergistic effects of physical environment and chemical factors.

It is worth noting that principal component analysis not only verifies the conclusions drawn from single-factor experiments but, more importantly, reveals the inherent structural relationships between factors: the three seemingly independent factors of temperature, pH, and acid-base tolerance actually share a common influence dimension statistically. This finding provides a key insight for us to construct a multi-factor response model --- in subsequent analyses, we should fully consider the interaction effects between factors rather than simply superimposing the results of single factors.

Based on the dominant factor structure clarified by principal component analysis, we will next carry out multi-factor simulation analysis to establish a prediction model that can more truly reflect the complex environment of enzymatic reactions.

Multi-factor Analysis (Response Surface Methodology)

With the existing single-factor data, we aim to use a small amount of single-factor optimization data to provide references for the subsequent formal multi-factor experimental design, thereby delimiting the scope of subsequent experiments and shortening the wet experiment process.

Design

Step 1: Data Interpolation

Based on a small amount of data on enzyme activity under different temperatures and pH values, we use the cubic spline interpolation method to create continuous functions for temperature and pH data, which can predict enzyme activity at any temperature and pH value.

Cubic spline interpolation does not use a single function to fit all data points; instead, it connects each interval between data points with an independent cubic polynomial, and adds strict continuity conditions at each data point to ensure that the final overall curve is smooth.

The following are the interpolation curves of the effects of temperature and pH on enzyme activity:

Build

Step 2: The establishment of hypothetical models:

Since the influence of each factor cannot be precisely quantified with existing data, we attempt to quantify it using different hypothetical models. Here, we try three types of hypothetical models: multiplicative model, additive model, and minimum model. We predict using these different hypothetical models to obtain various prediction results, and all these results will serve as references for subsequent experiments.

Multiplicative model: It is assumed that the influences of various factors are multiplicative (the most commonly used, and most consistent with biological systems).
E = (T/100) × (P/100) × (I/103.9808) × 100
Where: T = enzyme activity under specific temperature, P = enzyme activity under specific pH, I = enzyme activity under specific metal ion concentration.
Normalization factors: maximum value of temperature = 100, maximum value of pH = 100, maximum value of metal ion = 103.9808.
Additive model: It is assumed that the influences of various factors are additive.
E = [(T/100) + (P/100) + (I/103.9808)] / 3 × 100
Where: T = enzyme activity under specific temperature, P = enzyme activity under specific pH, I = enzyme activity under specific metal ion concentration.
Minimum model: It is assumed that enzyme activity is limited by the worst factor.
E = min(T, P, I)
Where: T = enzyme activity under specific temperature, P = enzyme activity under specific pH, I = enzyme activity under specific metal ion concentration.

Test

Step 3: Optimization solution

1. Define the optimization equation

The objective function: Maximize the enzyme activity function.

max E = f(temperature, pH, metal ions)

Constraints:

2. Algorithm Workflow

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bound constraints (L-BFGS-B) is an efficient optimization algorithm, particularly suitable for large-scale, constrained optimization problems. It combines the advantages of the BFGS quasi-Newton method with limited memory technology and can handle variable boundary constraints. Its specific application process in the optimization of enzyme activity is as follows:

a) Initialization

Start from the initial guess point: [45, 8.5, optimal metal ion index]
Set variable boundaries: temperature [0, 80], pH [5, 11], metal ion index [0, 10]

b) Iteration Process

Gradient Estimation: L-BFGS-B uses the finite difference method to estimate the gradient of the objective function
Calculation of the objective function gradient:
For a multivariate function f(x₁, x₂, ..., xₙ), its gradient is a vector. Each component of this vector is the first partial derivative of the function with respect to the corresponding independent variable.

To simplify the calculation, we use the finite difference method to approximate the gradient of the objective function.

Hessian approximation: Maintaining an approximation of the Hessian matrix using limited storage space
The L-BFGS-B algorithm updates the search direction by constructing an approximation of the objective function's Hessian matrix, so we first need to derive this approximation.
The Hessian matrix is a square matrix composed of the second-order partial derivatives of a multivariate function. It describes the local curvature of the function at a given point. For a scalar-valued function f(x_1, x_2, ..., x_n) with n variables, its Hessian matrix H(f) is an n \times n matrix, where the element in the i -th row and j -th column is the second-order partial derivative of the function with respect to the i -th and j -th variables:

sₖ = xₖ₊₁ - xₖ
yₖ = ∇f(xₖ₊₁) - ∇f(xₖ)
Search direction: Calculate the quasi-Newton direction
Δx = -Hₖ∇f(xₖ)
Line search: Perform a constrained line search along the search direction to ensure no violation of boundary constraints
Update parameters: Update parameters according to the found optimal step size
xₖ₊₁ = xₖ + αΔx

c) Convergence judgment

The algorithm stops when one of the following conditions is met:

The gradient norm is less than the tolerance
The parameter change is less than the tolerance
The function value change is less than the tolerance
The maximum number of iterations is reached

Learn

Step 4: Optimal result visualization:

Create a contour plot to intuitively show the effects of temperature and pH on enzyme activity:

The response surface contour plots for each metal ion used are as follows:

Step 5: Optimization Results Based on Different Hypothetical Models

Based on the three hypothetical models proposed above, the combinations of influencing factors for the final maximum enzyme activity are obtained as follows:

Model 1: multiplicative	Model 2: additive	Model 3: min
Optimal temperature: 47.81°C	Optimal temperature: 47.50°C	Optimal temperature: 45.00°C
Optimal pH: 8.39	Optimal pH: 8.39	Optimal pH: 8.50
Optimal metal ion: Li⁺	Optimal metal ion: Li⁺	Optimal metal ion: Li⁺
Predicted enzyme activity: 105.28%	Predicted enzyme activity: 101.76%	Predicted enzyme activity: 100.00%

Through systematic experiments and modeling, we have determined the optimal reaction conditions for the enzyme and obtained the above optimal combinations. The optimization results of these macroscopic conditions reflect the structural characteristics of the enzyme molecule itself. To gain an in-depth understanding of its intrinsic mechanism and achieve precise modification of the enzyme, we will next establish a structure-activity relationship model between the enzyme's molecular structure and its functional properties.

The results were submitted to the experimental group, which verified the three sets of data predicted by the modeling group. Finally, it was concluded that Model 3 of the modeling group is consistent with the actual experimental situation, forming an experiment-modeling closed loop.

Part 2: Biological Modeling Part

1. Homology Modeling of ADH

The modeling group assists the experimental group in modeling ADH, which is then submitted to the experimental group for subsequent experiments.

Design:

Perform homology modeling on the existing amidohydrolase (ADH) that can degrade ochratoxin (OTA).

Build:

In this study, Swiss-Model was used for homology modeling.

Test:

To evaluate the quality of the obtained model, we selected the predicted structure with a sequence homology of 78.82%. Then, we evaluated the results of homology modeling based on the GMQE value (Global Model Quality Estimation) and QMEAN value. The GMQE value ranges from 0 to 1, and the closer it is to 1, the better the modeling quality; the QMEAN value ranges from -4 to 0, and the closer it is to 0, the better the matching degree.

As shown in the figure, the GMQE value of Model 01 is 0.91, and the QMEAN value is 0.85 - 0.95, which is very close to 0.

Learn:

The results of this homology modeling are good. Visualization operations in PyMol show that there are only a few amino acid differences between this protein sequence and the ADH sequence, and it is initially judged that the quality of this homology modeling is good. The homologous protein with the code 8YAK was found in the PDB database, which is the LlADH protein.

2. Molecular Docking

The modeling group assists the experimental group in performing molecular docking between LlADH and OTA to find the best modification sites, which are then submitted to the experimental group for comparison and subsequent experiments.

Design:

Further explore the mechanism of action of the protease in detoxifying OTA in this study.

Build:

AutoDock software was used for molecular docking analysis. Molecular docking technology can study the interaction between small molecule ligands and receptor biomolecules and predict their binding mode and affinity. The pdb file of the ligand ochratoxin (OTA) was collected from the PDB database. After calibration in PyMol, it was saved as a protein-ligand complex. The receptor and ligand were kept stationary, and rigid docking was performed using docking to directly calculate the binding energy for preliminary screening. The best docking score was around -5 kcal/mol, and the result was good. However, visualization in PyMol showed that the ligand OTA was still far away from the protein, so Autodock Vina was prepared for semi-flexible docking.

Test:

Taking the central metal ion Zn core as the docking site, a virtual box with a size of x=30, y=30, z=30 was constructed, and molecular docking was started. The best binding energy was -7.9 kcal/mol, and the docking result was relatively good. The distance between the ligand and nearby amino acids was calculated through visualization in PyMol, and most of them were within the range of 4 Å.

Learn:

From the perspective of docking binding and structure, the structure obtained this time is good, and it is ready for molecular dynamics simulation. At the same time, the sites that may be used in this result are submitted to the experimental group for comparison.

The results of the experimental group show that: by comparing with the ADH3/OTA complex structure and combining with the analysis data of the modeling group, it is found that the substrate binding pocket of LlADH is mainly composed of amino acids S89, H164, V218, L219, H254, I326, V348, and 6 metal zinc-coordinating amino acids (H84, H86, H252, H272, K211, and N345). OTA forms hydrogen bond interactions with amino acids H164, L219, H252, H254, and β-metal. In addition, H86 forms a π-stacking interaction with the OTα part of OTA, and H254 and H272 form a T-stacking force with the L-β-phenylalanine part of OTA.

3. Molecular Dynamics Simulation Study

Continue to search for amino acid active sites to provide preliminary predictions for mutations, which can be used as a reference for the experimental group.

Design:

Molecular docking determines the best binding site and mode through energy matching, but it does not fully consider the conformational dynamic changes of the protein and ligand. Therefore, we need to clarify the interaction between LlADH and the receptor OTA.

Build & Test:

In this study, a 200-ns molecular dynamics simulation was conducted. The temperature was set to 310K, the Amber force field was used for the protein, and the GAFF force field was used for the ligand.

Learn 1: The LlADH-OTA Binding System Has Reached a State of Equilibrium with Stable Binding Degree

As shown in Photo. Learn 1, the Root Mean Square Deviation (RMSD) is used to measure the average deviation of the structure from the initial structure during the simulation, which can help determine whether the structure is stable. The figures show the RMSD curves of the protein and the protein-ligand complex respectively. The fluctuation is slightly large in the early stage of the simulation, but then becomes stable and converges around 3 Å. This usually indicates that the system has reached a state of equilibrium. An RMSD less than 3 Å usually means that the protein molecule has adjusted from the initial conformation to a more stable state, and the binding degree between the protein and the ligand is also relatively stable.

Learn 2: Identify Structural Amino Acids and Functional Amino Acids

As shown in Photo. Learn 2, the Root Mean Square Fluctuation (RMSF) is used to describe the positional fluctuation of atoms within a molecule or molecular system, and to evaluate the structural stability and flexibility of macromolecules such as proteins. It calculates the deviation of a certain atom from its average position during the entire simulation process, which can help us find functional amino acids and structural amino acids.

As shown in the figure, the amino acids at positions 56, 96, 173, 176, 220, 278, 313, and 347 have high peaks and large fluctuation degrees, indicating high flexibility. These amino acids and the amino acids near them may be active sites, providing preliminary predictions for mutations. Amino acids with low fluctuation degrees may contribute to the structural stability of the protein.

Learn 3: Confirm the Compactness of the Protein Structure

As shown in Photo. Learn 3, the Radius of Gyration (Rg) remains stable during the simulation, indicating that the protein does not undergo significant conformational changes during the reaction process. However, the overall Rg is around 2 Å, indicating that the protein structure is relatively compact.

Learn 4: Confirm the Compactness of the Protein Structure

As shown in Photo. Learn 4, the above figure shows hydrogen bonds. Generally speaking, the formation of a hydrogen bond requires that the distance between the hydrogen atom and the acceptor atom is less than 3.5 Å, and the angle between the hydrogen-acceptor-donor atoms is greater than 120°.

As shown in the figure, there is always a high number of hydrogen bonds during the simulation, indicating that the interaction force between the molecules of the protein-ligand complex is strong, and a more compact structure can be formed.

Learn 5: Confirm the Energy Region Maintaining Protein Stability

The free energy landscape visualizes the relative free energy (ΔG) of different conformations through contour lines and color gradients, helping us understand the energy minimum, transition state, and potential conformational transitions of the system.

Color bar and contour lines: ΔG (kJ/mol): The color gradient ranges from dark blue-purple (low free energy) to red (high free energy). Contour lines represent isofree energy levels, similar to contour lines in topographic maps. Dark blue regions are energy "basins" (minima), representing stable conformations; red regions are energy "barriers" or high-energy regions. The FEL is essentially a two-dimensional projection of the Gibbs Free Energy (ΔG) on the selected reaction coordinates (RMSD and radius of gyration). It is similar to a topographic map in geography: energy "basins" (minima) represent stable conformations, i.e., the native state of the protein, while energy "barriers" represent transition states, and high-energy "plateaus" correspond to unstable or rare conformations. The core formula of FEL is based on the Boltzmann distribution: ΔG = -kT ln(P), where P is the occurrence probability of the conformation in the simulation trajectory, k is the Boltzmann constant, and T is the temperature. FEL not only describes the static structure but also captures the dynamic process.

From the energy diagram in the figure, it can be seen that the dark blue region is the energy region with stable conformations in the system. This region indicates that the system has the lowest free energy under these conformations, showing high stability of the protein structure under these states. The center of the basin is roughly located at RMSD ≈ 0.15 - 0.20 nm and Rg ≈ 2.0 nm. The RMSD value means that this conformation is very close to the reference structure (which may be the native folded state), and Rg ≈ 2.0 nm indicates that the molecule is in a compact state without significant expansion.

4. Analysis of Intermolecular Interactions

Explain the selection of mutation sites from the perspective of intermolecular interactions, which can be used as theoretical support and supplement for the experimental group.

Design:

To further explore the interaction mechanism between LlADH and OTA and the main influencing forces, this study used the MM/PBSA method to calculate the binding free energy of the protein-ligand complex.

Build:

This free energy calculation combines the energy calculation of molecular mechanics (MM) with the implicit solvent model (Poisson-Boltzmann equation PB for electrostatic solvation, and surface area SA for non-polar contributions) to estimate the binding free energy of biomolecules (such as protein-ligand, protein-nucleic acid). Compared with strict but computationally expensive methods such as Free Energy Perturbation (FEP) or Thermodynamic Integration (TI), MM/PBSA has the advantages of computational efficiency and accuracy under limited sampling equilibrium.

Test:

After completing energy minimization and NVT and NPT equilibration in the molecular dynamics simulation stage, the calculation was performed using the gmx_MMPBSA script. The calculation results are shown in the figure below, from left to right: van der Waals force, electrostatic energy, polar solvation energy, non-polar solvation energy, molecular mechanics term, solvation energy term, and average binding free energy.

Learn:

Energy analysis shows that the average binding free energy of the LlADH-OTA complex is -42.64 kcal/mol, among which the van der Waals force contributes the most. This indicates that the main force affecting intermolecular interactions is the van der Waals force.

In subsequent modifications, we need to introduce or optimize electrostatic interactions (especially hydrogen bonds and salt bridges) on the basis of maintaining the existing strong van der Waals contacts to achieve a synergistic effect and further improve binding affinity and specificity. The following suggestions are put forward:

5. Residue Energy Decomposition

Provide the experimental group with the key binding sites of LlADH and OTA, along with detailed theoretical analysis.

It can be clearly seen from the energy decomposition diagram that the total binding free energy (blue bar) is mainly driven by the van der Waals interaction (orange) and electrostatic interaction (green) as the main binding forces, while the polar solvation energy (purple) usually makes an unfavorable contribution, which is consistent with theoretical expectations. The residue decomposition total energy diagram intuitively shows the total contribution of each residue to the binding free energy. A negative value indicates a favorable contribution to binding, while a positive value indicates an unfavorable contribution.

The right figure shows that HIS:217 is the residue with the largest contribution (-4.22 kcal/mol), which is an undisputed binding "hotspot". From the analysis of the left figure, its significant contribution to binding mainly comes from the strong van der Waals interaction (orange part). This indicates that a part of the ligand forms a hydrophobic interaction with the HIS:217 residue with high shape complementarity and tight contact. This is the cornerstone of the binding of the current complex.

ASP:88 is another important contributor. Different from HIS:217, it can be seen from the left figure that the contribution of ASP:88 comes from considerable electrostatic interaction (green) and van der Waals interaction (orange). This indicates that there may be specific interactions such as hydrogen bonds or salt bridges between the ligand and ASP:88, along with good shape complementarity. ASP:88 is a key determinant residue for binding specificity.

There are also residues such as LEU:128, GLY:216, VAL:217, ILE:321, and VAL:347 that make significant favorable contributions. The contributions of these residues mainly come from van der Waals interactions, indicating that they together form a hydrophobic binding pocket that stably "encapsulates" the ligand.

However, HIS:85 has a slight unfavorable effect on binding (+0.521 kcal/mol). From the left figure, this unfavorable contribution is likely due to the unfavorable polar solvation energy (purple part). The reason may be that HIS:85 itself is a polar residue, and during the binding process, it or the polar groups around it are desolvated, but new hydrogen bonds sufficient to compensate for the energy loss cannot be formed with the ligand, resulting in a net energy loss.

Through the above description of key sites, the experimental group should avoid the above-mentioned sites when designing mutation sites to prevent damage to enzyme activity and reduction of catalytic efficiency.

Based on the sites calculated by the modeling group and combined with the experimental data of the experimental group, the results show that:

The enzyme activity of the L219 series mutations (L219H, L219K, and L219R) almost completely disappears, which may be due to the polar side chains or large side chains of His, Lys, and Arg hindering the binding and entry of OTA;
The G131 series mutations (G131F and G131L) were not successfully purified;
The catalytic activity of the V348 series mutations (V348F, V348H, and V348S) almost disappears, indicating that the hydrophobic interaction of residue V348 is necessary for the LlADH-catalyzed hydrolysis of OTA;
The catalytic activity of the S89 series mutations (S89D, S89F, S89H, S89K, S89R, and S91R) is not improved, indicating that the charges carried by residues S89 and S91 are necessary for the LlADH-catalyzed hydrolysis of OTA;
The catalytic activity of Mutant I326A is 75% higher than that of LlADH, which may be due to the smaller side chain of Ala, which provides a wider space for the binding of OTA and LlADH.

That is, with the assistance of the modeling group, the experimental group explored LlADH and carried out rational design and modification, and obtained Mutant I326A whose catalytic activity against OTA is 0.75 times higher than that of the wild-type LlADH.

The modification suggestions for the later experiments are mainly in three aspects:

Try to optimize the interaction near HIS:85 to maximize the improvement of ligand affinity. It is suggested to introduce appropriate functional groups in the region of the ligand corresponding to HIS:85 and try to form hydrogen bonds with HIS:85 to offset its unfavorable polar solvation penalty. For example, introduce a hydrogen bond acceptor or donor on the ligand molecule to form a weak hydrogen bond network with the side chain or backbone of HIS:85.
Secondly, consolidate the electrostatic interaction with ASP:88: Check whether the geometric configuration of the hydrogen bond/salt bridge formed between the current ligand and ASP:88 is ideal, and enhance the electrostatic contribution.
Finally, modifications can be made on other contributing residues to expand van der Waals contacts. Observe whether there is still space around residues with large contributions such as GLY:216 and VAL:347. It may be possible to introduce small hydrophobic groups (such as methyl groups, fluorine atoms) on the ligand to increase van der Waals contacts with these residues.

Conclusion

This study has successfully constructed a research system integrating experiments and modeling in depth, and its core value lies in establishing a two-way interactive collaborative mechanism. At the methodological level, the work reflects the organic integration of multi-disciplinary methods: experimental biology provides real data, statistical modeling reveals patterns, computational structural biology explains mechanisms, and finally, protein engineering realizes functional verification and improvement.

The systematic nature of the research is reflected in four interrelated levels: first, single-factor experiments provide a data basis for multi-factor modeling; second, principal component analysis guides the design of response surface experiments; third, molecular simulation results provide targets for mutant design; fourth, mutation experiment results feed back to optimize the computational model. Each link reflects the iterative idea of "experiments drive modeling, and modeling guides experiments".

The innovation of the research lies in the establishment of an effective two-way verification mechanism: the binding mode predicted by molecular docking is verified by mutation experiments; the flexible regions identified by molecular dynamics simulation are tested by functional experiments; the energy contribution obtained by free energy calculation is confirmed by mutation effects. This "computation-experiment" cross-validation significantly improves the reliability and efficiency of the research.

At the theoretical level, the research not only reveals the interaction mechanism between ADH and OTA but also, more importantly, establishes a multi-scale correlation model between environmental factors, enzyme activity, structural characteristics, and energy changes. At the practical level, the obtained Mutant I326A and the proposed subsequent modification suggestions (such as optimizing the interaction near HIS-85) provide direct guidance for enzyme engineering applications.

The research paradigm combining experiments and modeling demonstrated in this study provides a replicable template for mechanism research and function optimization of complex biological systems. Future work can further deepen the iterative level, feed the mutation experiment data back to the machine learning model, realize more accurate prediction and design, and promote the rational design of enzymes to a higher level.

Climate prediction model

Due to the lack of experimental conditions, the following simulations are all hypothetical studies.

The expression levels of genes related to aflatoxin synthesis are mainly related to temperature and water activity. The former is related to storage conditions, climate, and other factors, while the latter is related to humidity, crop itself, and other factors. This study focuses on the impact of global warming caused by climate change on the risk of aflatoxin pollution, and therefore establishes a univariate model related to temperature for analysis.

Due to the significant rightward shift in the relationship between aflatoxin synthesis and temperature, a partial Gaussian distribution was chosen as the theoretical risk function, with 25 degrees Celsius as the maximum value. The following function

was obtained by selecting integers between 15 and 40 degrees Celsius and calculating their risk values to obtain the temperature weight of aflatoxin risk.

The temperature data is mainly based on TAS (temperature at 2 meters above ground) in the CMIP6 dataset, and the selected model is BCC-CSM2-MR. The path references are SSP245 and SSP585, where SSP245 represents moderate emissions and SSP585 represents high emissions.

The following is the result analysis:

The darker the color, the higher the risk.

Ssp245:

Regarding SSP585:

In this study, the risk values of various regions around the world were considered, and latitude and region were not weighted.

Compared with 2025, there is a noticeable orange color in high latitude regions, especially Siberia, Northeast China, and central Europe. Northeast China is the main grain producing area in China, while Europe is the main population settlement area. Therefore, the actual risk in 2070 is higher than that in 2025, but it has not been reflected in the global mean risk.