This study systematically elucidates the cross-scale mechanism of amide hydrolase (ADH)-catalyzed degradation of ochratoxin (OTA) by establishing a deep collaborative mechanism between experimental biology and computational modeling. The entire research system has constructed a closed-loop research paradigm of "experimental data-driven modeling → model-guided experimental design → experimental result feedback to optimize the model", realizing the organic connection from the optimization of macroscopic reaction conditions to the analysis of microscopic molecular mechanisms.
We have successfully established seamless connection at these two levels and formed a research closed loop:
This part is based on data provided by the experimental team and analyzed and processed by the modeling team.
To study the optimal conditions for enzymatic reactions, we first conducted single-factor control experiments using the control variable method, measured and established the quantitative relationship between five key environmental factors (temperature, pH, metal ions, acid-base tolerance, and thermal stability) and enzyme activity, and determined the optimal range and level of each factor, providing a basis for subsequent analysis of multi-factor conditions.
In the pre-experiment, the level range of each factor was determined to cover the characteristic interval of enzyme activity changes. Each group of experiments was set with three replicates, and the enzyme activity level was taken as the average value, expressed as relative enzyme activity, i.e., normalized with the maximum enzyme activity in each factor experiment as 100%, to draw single-factor effect curves and identify the enzyme activity peaks under each factor.
Enzyme activity shows a strong temperature dependence, and its change law presents a typical bell-shaped curve. The optimal reaction temperature of the enzyme is 45°C~50°C, at which the relative enzyme activity is the highest. When the temperature deviates from this optimal value, the enzyme activity decreases significantly, and the activity loss in the high-temperature region (>50°C) is particularly severe.
Three buffer systems (C-P, Tris, Glycine) were used to cover a wide range of pH 5.0~11.0. The enzyme showed a typical preference for alkalinity, with an optimal pH of 8.5 (determined in the Tris buffer system). Meanwhile, the buffer system itself had a significant impact on activity, and Tris buffer was identified as the optimal reaction medium for the enzyme.
The regulatory effects of different ion species on enzyme activity vary significantly. Li⁺ exhibits the strongest activating effect, with the relative enzyme activity reaching as high as 104.0%; while Fe³⁺ and Zn²⁺ completely inhibit the enzyme activity.
The enzyme solution was incubated under different pH conditions for 60 minutes, and the remaining activity was measured to evaluate its stability. As shown, the enzyme exhibits strong stability in neutral to weakly alkaline environments (pH 7.0-9.0), with the peak activity (100%) observed at pH 8.5. It completely loses activity under acidic conditions (pH ≤ 4.0), indicating that it cannot withstand strong acidity.
This characteristic was evaluated by measuring the residual activity of the enzyme after pretreatment at different temperatures. The activity retention rate of the enzyme decreases sharply as the treatment temperature increases. 20°C is its most stable condition (with 100% residual activity). When the treatment temperature reaches 30°C and 40°C, the residual activity drops to approximately 75% and 10% respectively. When the treatment temperature exceeds 50°C, the enzyme is completely inactivated.
Optimal conditions for each factor | Optimal range |
---|---|
Temperature: 45℃ (Enzyme activity: 100%) | 40~50℃ |
PH:8.5(Enzyme activity:100%) | 7.5~8.5 |
Metal ions:Li+(Enzyme activity:103.98%) | Li+ |
Acid-base tolerance:PH=8.5(Enzyme activity:100%) | 8.0~8.5 |
First, standardize the original experimental data to eliminate the difference in dimensions among various factors, ensuring the accuracy and comparability of subsequent analyses.
After preliminarily analyzing the influence of four environmental factors (temperature, pH, metal ions, and acid-base tolerance) on enzyme activity, we constructed a correlation coefficient matrix to evaluate the strength of association between variables. The specific analysis process is as follows:
The calculation of the correlation coefficient matrix uses standardized data. By quantifying the linear correlation degree between variables, it identifies key factor combinations affecting enzyme activity.
The analysis results show that there is a significant positive correlation between temperature and pH (r=0.82, p<0.05), indicating that they have a synergistic effect in affecting enzyme activity; the correlation coefficients between metal ions and other environmental parameters are all low (|r|<0.3), which indicates that their influence on enzyme activity is relatively independent of other environmental factors.
By calculating the explained variance and cumulative contribution rate of principal components, the cumulative contribution rate of the first two principal components is 87.48% > 85%, so the first two principal components are extracted. The principal component expressions are as follows:
Y1 = 1.0111Temperature + 0.9245pH + 0.8981Acid-base tolerance - 0.2845Metal ions
Y2 = 0.0998Temperature + 0.0031pH + 0.2015Acid-base tolerance + 1.0072Metal ions
In the first principal component, temperature and pH have a greater impact, indicating the influence of physical factors in the reaction environment; in the second principal component, metal ions have a greater impact, indicating that the catalytic effect of metal ions has a greater impact on enzyme activity.
By calculating the explained variance and cumulative contribution rate of the principal components, we obtained:
The cumulative contribution rate of the first two principal components is 87.48%, which exceeds the commonly used threshold of 85%, so they can fully represent the original data information. Based on the principal component loading matrix, we constructed the principal component expressions:
First principal component Y1:
Y1 = 1.0111×Temperature + 0.9245×pH + 0.8981×Acid-base tolerance − 0.2845×Metal ions
Loading analysis shows that temperature (1.0111), pH (0.9245), and acid-base tolerance (0.8981) all have large positive loadings on Y1, while metal ions have a small negative loading (-0.2845). This indicates that Y1 mainly comprehensively reflects the influence of the physical conditions of the reaction environment on enzyme activity.
Second principal component Y2:
Y2 = 0.0998×Temperature + 0.0031×pH + 0.2015×Acid-base tolerance + 1.0072×Metal ions
In Y2, metal ions (1.0072) show an extremely high positive loading, far higher than other factors, indicating that Y2 mainly represents the chemical factor of the catalytic effect of metal ions.
From the loading plot, it can be seen that pH, temperature, and acid-base tolerance environment have strong influences on the first principal component (Y1).
Through principal component analysis, we reduced the four environmental factors to two comprehensive indicators: environmental physical factor (Y1) and metal ion factor (Y2). This result confirms from a mathematical statistics perspective that enzyme activity is simultaneously regulated by the synergistic effects of physical environment and chemical factors.
It is worth noting that principal component analysis not only verifies the conclusions drawn from single-factor experiments but, more importantly, reveals the inherent structural relationships between factors: the three seemingly independent factors of temperature, pH, and acid-base tolerance actually share a common influence dimension statistically. This finding provides a key insight for us to construct a multi-factor response model --- in subsequent analyses, we should fully consider the interaction effects between factors rather than simply superimposing the results of single factors.
Based on the dominant factor structure clarified by principal component analysis, we will next carry out multi-factor simulation analysis to establish a prediction model that can more truly reflect the complex environment of enzymatic reactions.
With the existing single-factor data, we aim to use a small amount of single-factor optimization data to provide references for the subsequent formal multi-factor experimental design, thereby delimiting the scope of subsequent experiments and shortening the wet experiment process.
Based on a small amount of data on enzyme activity under different temperatures and pH values, we use the cubic spline interpolation method to create continuous functions for temperature and pH data, which can predict enzyme activity at any temperature and pH value.
Cubic spline interpolation does not use a single function to fit all data points; instead, it connects each interval between data points with an independent cubic polynomial, and adds strict continuity conditions at each data point to ensure that the final overall curve is smooth.
The following are the interpolation curves of the effects of temperature and pH on enzyme activity:
Since the influence of each factor cannot be precisely quantified with existing data, we attempt to quantify it using different hypothetical models. Here, we try three types of hypothetical models: multiplicative model, additive model, and minimum model. We predict using these different hypothetical models to obtain various prediction results, and all these results will serve as references for subsequent experiments.
1. Define the optimization equation
The objective function: Maximize the enzyme activity function.
max E = f(temperature, pH, metal ions)
Constraints:
2. Algorithm Workflow
The Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bound constraints (L-BFGS-B) is an efficient optimization algorithm, particularly suitable for large-scale, constrained optimization problems. It combines the advantages of the BFGS quasi-Newton method with limited memory technology and can handle variable boundary constraints. Its specific application process in the optimization of enzyme activity is as follows:
a) Initialization
b) Iteration Process
c) Convergence judgment
The algorithm stops when one of the following conditions is met:
Create a contour plot to intuitively show the effects of temperature and pH on enzyme activity:
The response surface contour plots for each metal ion used are as follows:
Based on the three hypothetical models proposed above, the combinations of influencing factors for the final maximum enzyme activity are obtained as follows:
Model 1: multiplicative | Model 2: additive | Model 3: min |
---|---|---|
Optimal temperature: 47.81°C | Optimal temperature: 47.50°C | Optimal temperature: 45.00°C |
Optimal pH: 8.39 | Optimal pH: 8.39 | Optimal pH: 8.50 |
Optimal metal ion: Li⁺ | Optimal metal ion: Li⁺ | Optimal metal ion: Li⁺ |
Predicted enzyme activity: 105.28% | Predicted enzyme activity: 101.76% | Predicted enzyme activity: 100.00% |
Through systematic experiments and modeling, we have determined the optimal reaction conditions for the enzyme and obtained the above optimal combinations. The optimization results of these macroscopic conditions reflect the structural characteristics of the enzyme molecule itself. To gain an in-depth understanding of its intrinsic mechanism and achieve precise modification of the enzyme, we will next establish a structure-activity relationship model between the enzyme's molecular structure and its functional properties.
The results were submitted to the experimental group, which verified the three sets of data predicted by the modeling group. Finally, it was concluded that Model 3 of the modeling group is consistent with the actual experimental situation, forming an experiment-modeling closed loop.
The modeling group assists the experimental group in modeling ADH, which is then submitted to the experimental group for subsequent experiments.
Perform homology modeling on the existing amidohydrolase (ADH) that can degrade ochratoxin (OTA).
In this study, Swiss-Model was used for homology modeling.
To evaluate the quality of the obtained model, we selected the predicted structure with a sequence homology of 78.82%. Then, we evaluated the results of homology modeling based on the GMQE value (Global Model Quality Estimation) and QMEAN value. The GMQE value ranges from 0 to 1, and the closer it is to 1, the better the modeling quality; the QMEAN value ranges from -4 to 0, and the closer it is to 0, the better the matching degree.
As shown in the figure, the GMQE value of Model 01 is 0.91, and the QMEAN value is 0.85 - 0.95, which is very close to 0.
The results of this homology modeling are good. Visualization operations in PyMol show that there are only a few amino acid differences between this protein sequence and the ADH sequence, and it is initially judged that the quality of this homology modeling is good. The homologous protein with the code 8YAK was found in the PDB database, which is the LlADH protein.
The modeling group assists the experimental group in performing molecular docking between LlADH and OTA to find the best modification sites, which are then submitted to the experimental group for comparison and subsequent experiments.
Further explore the mechanism of action of the protease in detoxifying OTA in this study.
AutoDock software was used for molecular docking analysis. Molecular docking technology can study the interaction between small molecule ligands and receptor biomolecules and predict their binding mode and affinity. The pdb file of the ligand ochratoxin (OTA) was collected from the PDB database. After calibration in PyMol, it was saved as a protein-ligand complex. The receptor and ligand were kept stationary, and rigid docking was performed using docking to directly calculate the binding energy for preliminary screening. The best docking score was around -5 kcal/mol, and the result was good. However, visualization in PyMol showed that the ligand OTA was still far away from the protein, so Autodock Vina was prepared for semi-flexible docking.
Taking the central metal ion Zn core as the docking site, a virtual box with a size of x=30, y=30, z=30 was constructed, and molecular docking was started. The best binding energy was -7.9 kcal/mol, and the docking result was relatively good. The distance between the ligand and nearby amino acids was calculated through visualization in PyMol, and most of them were within the range of 4 Å.
From the perspective of docking binding and structure, the structure obtained this time is good, and it is ready for molecular dynamics simulation. At the same time, the sites that may be used in this result are submitted to the experimental group for comparison.
The results of the experimental group show that: by comparing with the ADH3/OTA complex structure and combining with the analysis data of the modeling group, it is found that the substrate binding pocket of LlADH is mainly composed of amino acids S89, H164, V218, L219, H254, I326, V348, and 6 metal zinc-coordinating amino acids (H84, H86, H252, H272, K211, and N345). OTA forms hydrogen bond interactions with amino acids H164, L219, H252, H254, and β-metal. In addition, H86 forms a π-stacking interaction with the OTα part of OTA, and H254 and H272 form a T-stacking force with the L-β-phenylalanine part of OTA.
Continue to search for amino acid active sites to provide preliminary predictions for mutations, which can be used as a reference for the experimental group.
Molecular docking determines the best binding site and mode through energy matching, but it does not fully consider the conformational dynamic changes of the protein and ligand. Therefore, we need to clarify the interaction between LlADH and the receptor OTA.
In this study, a 200-ns molecular dynamics simulation was conducted. The temperature was set to 310K, the Amber force field was used for the protein, and the GAFF force field was used for the ligand.
As shown in Photo. Learn 1, the Root Mean Square Deviation (RMSD) is used to measure the average deviation of the structure from the initial structure during the simulation, which can help determine whether the structure is stable. The figures show the RMSD curves of the protein and the protein-ligand complex respectively. The fluctuation is slightly large in the early stage of the simulation, but then becomes stable and converges around 3 Å. This usually indicates that the system has reached a state of equilibrium. An RMSD less than 3 Å usually means that the protein molecule has adjusted from the initial conformation to a more stable state, and the binding degree between the protein and the ligand is also relatively stable.
As shown in Photo. Learn 2, the Root Mean Square Fluctuation (RMSF) is used to describe the positional fluctuation of atoms within a molecule or molecular system, and to evaluate the structural stability and flexibility of macromolecules such as proteins. It calculates the deviation of a certain atom from its average position during the entire simulation process, which can help us find functional amino acids and structural amino acids.
As shown in the figure, the amino acids at positions 56, 96, 173, 176, 220, 278, 313, and 347 have high peaks and large fluctuation degrees, indicating high flexibility. These amino acids and the amino acids near them may be active sites, providing preliminary predictions for mutations. Amino acids with low fluctuation degrees may contribute to the structural stability of the protein.
As shown in Photo. Learn 3, the Radius of Gyration (Rg) remains stable during the simulation, indicating that the protein does not undergo significant conformational changes during the reaction process. However, the overall Rg is around 2 Å, indicating that the protein structure is relatively compact.
As shown in Photo. Learn 4, the above figure shows hydrogen bonds. Generally speaking, the formation of a hydrogen bond requires that the distance between the hydrogen atom and the acceptor atom is less than 3.5 Å, and the angle between the hydrogen-acceptor-donor atoms is greater than 120°.
As shown in the figure, there is always a high number of hydrogen bonds during the simulation, indicating that the interaction force between the molecules of the protein-ligand complex is strong, and a more compact structure can be formed.
The free energy landscape visualizes the relative free energy (ΔG) of different conformations through contour lines and color gradients, helping us understand the energy minimum, transition state, and potential conformational transitions of the system.
Color bar and contour lines: ΔG (kJ/mol): The color gradient ranges from dark blue-purple (low free energy) to red (high free energy). Contour lines represent isofree energy levels, similar to contour lines in topographic maps. Dark blue regions are energy "basins" (minima), representing stable conformations; red regions are energy "barriers" or high-energy regions. The FEL is essentially a two-dimensional projection of the Gibbs Free Energy (ΔG) on the selected reaction coordinates (RMSD and radius of gyration). It is similar to a topographic map in geography: energy "basins" (minima) represent stable conformations, i.e., the native state of the protein, while energy "barriers" represent transition states, and high-energy "plateaus" correspond to unstable or rare conformations. The core formula of FEL is based on the Boltzmann distribution: ΔG = -kT ln(P), where P is the occurrence probability of the conformation in the simulation trajectory, k is the Boltzmann constant, and T is the temperature. FEL not only describes the static structure but also captures the dynamic process.
From the energy diagram in the figure, it can be seen that the dark blue region is the energy region with stable conformations in the system. This region indicates that the system has the lowest free energy under these conformations, showing high stability of the protein structure under these states. The center of the basin is roughly located at RMSD ≈ 0.15 - 0.20 nm and Rg ≈ 2.0 nm. The RMSD value means that this conformation is very close to the reference structure (which may be the native folded state), and Rg ≈ 2.0 nm indicates that the molecule is in a compact state without significant expansion.
Explain the selection of mutation sites from the perspective of intermolecular interactions, which can be used as theoretical support and supplement for the experimental group.
To further explore the interaction mechanism between LlADH and OTA and the main influencing forces, this study used the MM/PBSA method to calculate the binding free energy of the protein-ligand complex.
This free energy calculation combines the energy calculation of molecular mechanics (MM) with the implicit solvent model (Poisson-Boltzmann equation PB for electrostatic solvation, and surface area SA for non-polar contributions) to estimate the binding free energy of biomolecules (such as protein-ligand, protein-nucleic acid). Compared with strict but computationally expensive methods such as Free Energy Perturbation (FEP) or Thermodynamic Integration (TI), MM/PBSA has the advantages of computational efficiency and accuracy under limited sampling equilibrium.
After completing energy minimization and NVT and NPT equilibration in the molecular dynamics simulation stage, the calculation was performed using the gmx_MMPBSA script. The calculation results are shown in the figure below, from left to right: van der Waals force, electrostatic energy, polar solvation energy, non-polar solvation energy, molecular mechanics term, solvation energy term, and average binding free energy.
Energy analysis shows that the average binding free energy of the LlADH-OTA complex is -42.64 kcal/mol, among which the van der Waals force contributes the most. This indicates that the main force affecting intermolecular interactions is the van der Waals force.
In subsequent modifications, we need to introduce or optimize electrostatic interactions (especially hydrogen bonds and salt bridges) on the basis of maintaining the existing strong van der Waals contacts to achieve a synergistic effect and further improve binding affinity and specificity. The following suggestions are put forward:
Provide the experimental group with the key binding sites of LlADH and OTA, along with detailed theoretical analysis.
It can be clearly seen from the energy decomposition diagram that the total binding free energy (blue bar) is mainly driven by the van der Waals interaction (orange) and electrostatic interaction (green) as the main binding forces, while the polar solvation energy (purple) usually makes an unfavorable contribution, which is consistent with theoretical expectations. The residue decomposition total energy diagram intuitively shows the total contribution of each residue to the binding free energy. A negative value indicates a favorable contribution to binding, while a positive value indicates an unfavorable contribution.
The right figure shows that HIS:217 is the residue with the largest contribution (-4.22 kcal/mol), which is an undisputed binding "hotspot". From the analysis of the left figure, its significant contribution to binding mainly comes from the strong van der Waals interaction (orange part). This indicates that a part of the ligand forms a hydrophobic interaction with the HIS:217 residue with high shape complementarity and tight contact. This is the cornerstone of the binding of the current complex.
ASP:88 is another important contributor. Different from HIS:217, it can be seen from the left figure that the contribution of ASP:88 comes from considerable electrostatic interaction (green) and van der Waals interaction (orange). This indicates that there may be specific interactions such as hydrogen bonds or salt bridges between the ligand and ASP:88, along with good shape complementarity. ASP:88 is a key determinant residue for binding specificity.
There are also residues such as LEU:128, GLY:216, VAL:217, ILE:321, and VAL:347 that make significant favorable contributions. The contributions of these residues mainly come from van der Waals interactions, indicating that they together form a hydrophobic binding pocket that stably "encapsulates" the ligand.
However, HIS:85 has a slight unfavorable effect on binding (+0.521 kcal/mol). From the left figure, this unfavorable contribution is likely due to the unfavorable polar solvation energy (purple part). The reason may be that HIS:85 itself is a polar residue, and during the binding process, it or the polar groups around it are desolvated, but new hydrogen bonds sufficient to compensate for the energy loss cannot be formed with the ligand, resulting in a net energy loss.
Through the above description of key sites, the experimental group should avoid the above-mentioned sites when designing mutation sites to prevent damage to enzyme activity and reduction of catalytic efficiency.
Based on the sites calculated by the modeling group and combined with the experimental data of the experimental group, the results show that:
That is, with the assistance of the modeling group, the experimental group explored LlADH and carried out rational design and modification, and obtained Mutant I326A whose catalytic activity against OTA is 0.75 times higher than that of the wild-type LlADH.
This study has successfully constructed a research system integrating experiments and modeling in depth, and its core value lies in establishing a two-way interactive collaborative mechanism. At the methodological level, the work reflects the organic integration of multi-disciplinary methods: experimental biology provides real data, statistical modeling reveals patterns, computational structural biology explains mechanisms, and finally, protein engineering realizes functional verification and improvement.
The systematic nature of the research is reflected in four interrelated levels: first, single-factor experiments provide a data basis for multi-factor modeling; second, principal component analysis guides the design of response surface experiments; third, molecular simulation results provide targets for mutant design; fourth, mutation experiment results feed back to optimize the computational model. Each link reflects the iterative idea of "experiments drive modeling, and modeling guides experiments".
The innovation of the research lies in the establishment of an effective two-way verification mechanism: the binding mode predicted by molecular docking is verified by mutation experiments; the flexible regions identified by molecular dynamics simulation are tested by functional experiments; the energy contribution obtained by free energy calculation is confirmed by mutation effects. This "computation-experiment" cross-validation significantly improves the reliability and efficiency of the research.
At the theoretical level, the research not only reveals the interaction mechanism between ADH and OTA but also, more importantly, establishes a multi-scale correlation model between environmental factors, enzyme activity, structural characteristics, and energy changes. At the practical level, the obtained Mutant I326A and the proposed subsequent modification suggestions (such as optimizing the interaction near HIS-85) provide direct guidance for enzyme engineering applications.
The research paradigm combining experiments and modeling demonstrated in this study provides a replicable template for mechanism research and function optimization of complex biological systems. Future work can further deepen the iterative level, feed the mutation experiment data back to the machine learning model, realize more accurate prediction and design, and promote the rational design of enzymes to a higher level.
Due to the lack of experimental conditions, the following simulations are all hypothetical studies.
The expression levels of genes related to aflatoxin synthesis are mainly related to temperature and water activity. The former is related to storage conditions, climate, and other factors, while the latter is related to humidity, crop itself, and other factors. This study focuses on the impact of global warming caused by climate change on the risk of aflatoxin pollution, and therefore establishes a univariate model related to temperature for analysis.
Due to the significant rightward shift in the relationship between aflatoxin synthesis and temperature, a partial Gaussian distribution was chosen as the theoretical risk function, with 25 degrees Celsius as the maximum value. The following function
was obtained by selecting integers between 15 and 40 degrees Celsius and calculating their risk values to obtain the temperature weight of aflatoxin risk.
The temperature data is mainly based on TAS (temperature at 2 meters above ground) in the CMIP6 dataset, and the selected model is BCC-CSM2-MR. The path references are SSP245 and SSP585, where SSP245 represents moderate emissions and SSP585 represents high emissions.
The following is the result analysis:
The darker the color, the higher the risk.
Ssp245:
Regarding SSP585:
In this study, the risk values of various regions around the world were considered, and latitude and region were not weighted.
Compared with 2025, there is a noticeable orange color in high latitude regions, especially Siberia, Northeast China, and central Europe. Northeast China is the main grain producing area in China, while Europe is the main population settlement area. Therefore, the actual risk in 2070 is higher than that in 2025, but it has not been reflected in the global mean risk.