Model-Thermal-Stability-Modification

 Background and Exploration of Renovation Plan

In recent years, artificial intelligence has had an enormous influence on biology. Following the success of a series of AI-driven biological studies, structural biology has been profoundly changed, and with modern computing hardware and advanced algorithms, enzyme modification has likewise been transformed by deep learning. This part of our work, the modification of the Marine 05PET enzyme [1], combined an AI-assisted computational pipeline with subsequent wet-lab experiments, aiming to cut the time and reagent cost of conventional modification. After several rounds of work, the thermal stability modification was accomplished.

First, given the improvement in computing hardware in recent years, we attempted a full single-point mutation scan of the 05PET enzyme. However, because Google's official AlphaFold server allows only 30 protein predictions per account per day, we instead ran the ESMFold model [2] on a T4 GPU to predict the structures of the single-point mutants. Two obstacles nevertheless made this original plan impractical. First, the prediction accuracy of ESMFold was insufficient for the subsequent physicochemical calculations. Second, only approximately 400 protein structures could be predicted per day on a T4 GPU. We therefore changed our enzyme modification strategy.
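To make the throughput constraint concrete, the following sketch estimates how long a full single-point scan would take. It assumes a 257-residue sequence (the length implied by the 20*257 ThermoMPNN output reported later in this section) and 19 substitutions per position; the daily limits are the ones quoted above, and the helper name `days_needed` is only illustrative.

```python
import math

SEQ_LEN = 257        # assumption: length implied by the 20*257 ThermoMPNN output
SUBSTITUTIONS = 19   # non-wild-type amino acids per position


def days_needed(n_structures: int, per_day: int) -> int:
    """Whole days required to predict n_structures at a fixed daily throughput."""
    return math.ceil(n_structures / per_day)


n_mutants = SEQ_LEN * SUBSTITUTIONS
print(n_mutants)                     # 4883 single-point mutants
print(days_needed(n_mutants, 30))    # 163 days at the AlphaFold server quota
print(days_needed(n_mutants, 400))   # 13 days with ESMFold on one T4 GPU
```

Even at the higher ESMFold rate, the scan would occupy a GPU for roughly two weeks before any energy calculation could start.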

 Core Model Selection: ProteinMPNN and ThermoMPNN

ProteinMPNN [3] is a message-passing neural network that predicts the amino acid sequence of a protein from its structure. ThermoMPNN [4], derived from ProteinMPNN, uses protein structure information to predict the change in thermal stability (more precisely, the change in folding free energy, i.e. ddG) caused by a single-point mutation at each position, from which we can identify which positions to mutate and which substitutions to make. This largely addressed the problems of computational cost and structural prediction accuracy that we had encountered.

 Prediction, Screening, and Validation of the First Round of Single-point Mutations

We first ran model inference and biophysical calculations for single-point mutations, then predicted and computationally tested disulfide bonds, and finally submitted the candidates to wet-lab experiments for verification.

Through ThermoMPNN, we obtained a 20 × 257 matrix of predicted ddG values (20 amino acids at each of 257 positions). First, the 17 mutations with a predicted ddG below -1 kcal/mol were selected as the single-point mutations of primary interest:

PETase Mutation and Predicted ddG Data

Mutation predicted_ddG (kcal/mol) pos wtAA mutAA
D248R -2.5975 248 D R
D248K -2.1529 248 D K
D213R -2.054 213 D R
D213K -1.7337 213 D K
D248H -1.3534 248 D H
D248M -1.3419 248 D M
D242R -1.292 242 D R
D248Q -1.2844 248 D Q
T142L -1.2587 142 T L
T142I -1.2432 142 T I
D248N -1.2051 248 D N
D248L -1.2011 248 D L
D248A -1.1058 248 D A
D242I -1.0776 242 D I
S200R -1.0435 200 S R
T142M -1.0297 142 T M
D248I -1.0291 248 D I
Figure 1. Predicted ddG of full sequence mutation
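The screening step above can be sketched as a simple threshold over the prediction matrix. The layout is an assumption made for illustration: rows are taken to follow the alphabetical one-letter amino acid order and columns to follow sequence positions, which may differ from the actual ThermoMPNN output format. The function name `screen_mutations` and the two-residue toy input are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row order of the ddG matrix


def screen_mutations(ddg, wt_seq, cutoff=-1.0):
    """Return (mutation, ddG) pairs with ddG below cutoff, most negative first.

    ddg[i][j] is the predicted ddG (kcal/mol) for mutating position j + 1
    of wt_seq to AMINO_ACIDS[i].
    """
    hits = []
    for j, wt in enumerate(wt_seq):
        for i, mut in enumerate(AMINO_ACIDS):
            if mut == wt:
                continue  # skip the wild-type residue itself
            if ddg[i][j] < cutoff:
                hits.append((f"{wt}{j + 1}{mut}", ddg[i][j]))
    return sorted(hits, key=lambda hit: hit[1])


# Toy example: a two-residue "protein" with three non-zero predictions.
wt_seq = "DT"
ddg = [[0.0] * len(wt_seq) for _ in AMINO_ACIDS]
ddg[AMINO_ACIDS.index("R")][0] = -2.5975
ddg[AMINO_ACIDS.index("K")][0] = -0.5
ddg[AMINO_ACIDS.index("L")][1] = -1.2587

hits = screen_mutations(ddg, wt_seq)
print(hits)  # [('D1R', -2.5975), ('T2L', -1.2587)]
```

Raising `cutoff` to 0 widens the search in the same way as the second screening round described later.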

No algorithm is one hundred percent accurate, particularly in biological tasks. To obtain more accurate and reliable data, we therefore decided to confirm the predicted effects with conventional computation.

We used AlphaFold3 [5] to predict the structures of these mutants, converted the file format with OpenBabel, and finally ran the physics-based software FoldX [6] to perform the Repair operation and calculate the folding energy. To calculate the folding energy, which reflects stability, in batches, we built a high-throughput pipeline from automated, scripted steps. The following data were obtained:

Mutation and dG Data

Mutation dG (kcal/mol)
05PET -3.52
d213k -3.82
d213r -8.86
d242i -4.03
d242r -10.04
d248a -4.73
d248h -2.71
d248i -6.28
d248k -9.86
d248l -4.82
d248m -11.31
d248n -4.79
d248q -6.11
d248r -4.29
s200r -4.81
t142i -4.98
t142l -16.69
t142m -3.99
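The batch pipeline described above might be assembled along the following lines. This is only a sketch: the authors' scripts are not shown, `build_commands` and the file names are hypothetical, and the command-line forms (`obabel -immcif ... -opdb -O ...`, `--command=RepairPDB`, `--command=Stability`) are assumptions based on the usual Open Babel and FoldX 5 interfaces that should be checked against the installed versions. The function only builds the command lists; nothing is executed.

```python
from pathlib import Path


def build_commands(cif_path: str, foldx_bin: str = "foldx") -> list[list[str]]:
    """Build the conversion, repair, and stability commands for one model.

    Assumes the FoldX commands are run from the directory holding the PDB
    file, and that RepairPDB writes '<name>_Repair.pdb' next to its input.
    """
    cif = Path(cif_path)
    pdb = cif.with_suffix(".pdb")
    repaired = pdb.with_name(pdb.stem + "_Repair.pdb")
    return [
        # mmCIF (AlphaFold 3 output) -> PDB
        ["obabel", "-immcif", str(cif), "-opdb", "-O", str(pdb)],
        # FoldX Repair operation
        [foldx_bin, "--command=RepairPDB", f"--pdb={pdb.name}"],
        # Folding energy (dG) of the repaired structure
        [foldx_bin, "--command=Stability", f"--pdb={repaired.name}"],
    ]


commands = build_commands("d248r.cif")
for command in commands:
    print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` in a loop over all mutant models to process them in batches.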

Considering the data above, five mutations, d213r, d242r, d248m, s200r, and t142l, have dG values of relatively large magnitude. The next round, which tested multi-point mutations, therefore selected its targets from these five and combined them to obtain a multi-point mutant sequence with high stability.
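One way to compare the mutants is to subtract the wild-type folding energy from each mutant value and sort the differences, reading a more negative difference as more stabilizing. The sketch below uses the dG values from the table above; it illustrates the ranking only and is not the selection rule the team applied, which the text does not spell out. The function name `rank_by_stabilization` is hypothetical.

```python
WILD_TYPE_DG = -3.52  # 05PET, kcal/mol

MUTANT_DG = {
    "d213k": -3.82, "d213r": -8.86, "d242i": -4.03, "d242r": -10.04,
    "d248a": -4.73, "d248h": -2.71, "d248i": -6.28, "d248k": -9.86,
    "d248l": -4.82, "d248m": -11.31, "d248n": -4.79, "d248q": -6.11,
    "d248r": -4.29, "s200r": -4.81, "t142i": -4.98, "t142l": -16.69,
    "t142m": -3.99,
}


def rank_by_stabilization(dg, wild_type):
    """Return (mutation, dG_mutant - dG_wild_type) pairs, most negative first."""
    deltas = [(name, round(value - wild_type, 2)) for name, value in dg.items()]
    return sorted(deltas, key=lambda item: item[1])


ranking = rank_by_stabilization(MUTANT_DG, WILD_TYPE_DG)
for name, delta in ranking[:3]:
    print(f"{name}: {delta:+.2f} kcal/mol")
# t142l: -13.17 kcal/mol
# d248m: -7.79 kcal/mol
# d242r: -6.52 kcal/mol
```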

Combination Mutation and dG Data

Mutation dG (kcal/mol)
d213r, d242r, d248m, s200r, t142l -8.56
t142l, d242r -14.34
t142l, d248m -9.30
t142l, d248m, d242r -10.79
d248m, d242r -12.78

 Expanded screening and validation of second-round single-site mutations

To uncover more potential mutations, the procedure above was repeated with an expanded search range. The upper limit on the model-predicted ddG was raised to 0, and single-point mutations at the sites already verified above were excluded. This added 38 single-point mutations to the automated pipeline for structure prediction, file format conversion, and folding energy prediction:

Mutation and dG Data (Additional Mutants)

Mutation dG (kcal/mol)
a28f 0.47
a28l -0.49
a28m -2.03
d234a -4.46
d234g 1.63
d234k -7.37
d234r -4.16
d234s -10.96
d246n 1.40
d250i -4.22
d250r -0.34
d250t -2.57
d250v -5.00
d250y -5.77
d4p -0.08
e68r -4.49
e68t -3.94
g197i -2.46
g197v -2.56
h123y -8.60
i150s -0.31
i150t -4.29
p207f 5.34
q101m -1.18
q146f -11.93
q146i -7.31
q146l -6.66
q146m -5.89
q146y -7.94
q159d -3.88
q159w -11.64
q159y 0.14
s20i -2.41
s231r 10.08
s64r 0.72
v208f -4.85
v208l -12.28
v208y -4.77

Five new single-point mutations with strongly negative dG were identified:

Key Mutation and dG Data

Mutation dG (kcal/mol)
d234k -7.37
d234s -10.96
q146f -11.93
q159w -11.64
v208l -12.28

Their combination with the earlier single-point mutations was then verified further.

 Prediction and Verification of Disulfide Bond Modification

Meanwhile, we used traditional computational methods to design disulfide bonds that would further enhance the thermal stability of the 05PET enzyme. Disulfide by Design 2.0 [7] is an algorithm that predicts potential disulfide bonds from protein structures. We ran Disulfide by Design 2.0 with the structure of the 05PET enzyme as input. The algorithm output candidate disulfide bond mutations together with several parameters for each:

Protein Residue Interaction Data

Res1 Chain Res1 Seq # Res1 AA Res2 Chain Res2 Seq # Res2 AA Chi3 Energy Sum B-Factors
A 15 THR A 237 MET 112.97 4.3 27.38
A 24 SER A 27 ALA 97.43 3.06 26.94
A 26 SER A 226 GLY -62.55 6.44 24.51
A 28 ASP A 82 SER 119.58 2.56 25.56
A 29 GLY A 82 SER 115.87 6.03 25.71
A 31 PHE A 47 PRO -99.63 6.22 27.02
A 38 ALA A 42 CYS -96 3.81 25.96
A 38 ALA A 44 VAL -108.93 4.22 26.64
A 45 PHE A 81 ALA 124.98 4.98 22.81
A 50 LEU A 122 LYS 114.75 6.36 30.87
A 58 PRO A 127 ARG 107.43 1.39 23.59
A 58 PRO A 236 LEU 102.69 3.32 23.14
A 62 TRP A 89 ALA -101.32 2.67 22.16
A 63 GLY A 95 ALA -62.52 6.22 23.05
A 63 GLY A 139 GLY -108.95 4.61 22.18
A 64 ASN A 68 ALA 102.6 3.11 30.74
A 65 GLY A 133 HIS -109.19 4.7 26.37
A 65 GLY A 134 SER 71.55 4.11 28.94
A 68 ALA A 72 THR 98.85 3.95 31.64
A 70 PRO A 89 ALA -97.59 0.87 25.53
A 76 ILE A 221 ALA 95.18 2.2 23.3
A 90 ALA A 100 ASP 95.99 1.04 22.98
A 95 ALA A 139 GLY 88.84 3.58 22.93
A 98 GLY A 143 ALA -75.66 5.59 23.39
A 111 GLN A 119 TYR 92.44 3.86 27.31
A 112 ASN A 123 LEU 122.64 2.68 27.89
A 131 ALA A 153 ALA 123.66 5.02 18.94
A 144 GLY A 152 THR -114.55 4.94 22.12
A 151 VAL A 235 HIS 125.32 6.18 23.75
A 155 PHE A 217 PRO 117.23 4.2 23.72
A 169 GLN A 199 ARG 97.71 1.79 31.9
A 178 LEU A 206 TRP 124.24 6.3 21.58
A 180 THR A 189 PRO -115.23 4.39 28.99
A 186 ILE A 214 HIS -91.88 2.57 28.27
A 203 PRO A 265 GLY -77.42 3.35 38.17
A 208 GLU A 260 ASP 121.42 8.18 26.52
A 209 LEU A 212 ALA 103.99 1.15 28.1
A 212 ALA A 216 GLU 122.04 4.35 28.96
A 230 ALA A 242 ALA 104.56 0.8 23.22
A 233 ARG A 239 ASP -93.12 1.84 24.57
A 263 ARG A 266 ILE 117.48 7.92 39.71

We screened the candidates using the suggested parameter thresholds (Chi3 ∈ [80°, 100°], Energy < 5, ΣB > 30) and obtained two disulfide bonds, 58ALA-62THR and 159GLN-189ARG (listed in the table above as 68 ALA-72 THR and 169 GLN-199 ARG, whose residue numbering is shifted by 10). We then used AlphaFold 3 [5] for structure prediction to further verify that the two disulfide bonds can form.

Figure 2. First Disulfide Bond: 58ALA-62THR
Figure 3. Second Disulfide Bond: 159GLN-189ARG

Both disulfide bonds appeared simultaneously in a single structure prediction result, supporting the feasibility of the disulfide bond design.
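The threshold screen on the Disulfide by Design 2.0 output can be reproduced with a few lines of filtering. The sketch below uses an excerpt of the table (all six rows with Sum B-Factors above 30, plus two others for contrast) and keeps the table's own residue numbering, which is shifted by 10 relative to the pair names quoted in the text. The tuple layout and the function name `passes` are illustrative choices.

```python
# (res1_seq, res1_aa, res2_seq, res2_aa, chi3_deg, energy, sum_b_factors)
CANDIDATES = [
    (50, "LEU", 122, "LYS", 114.75, 6.36, 30.87),
    (58, "PRO", 127, "ARG", 107.43, 1.39, 23.59),
    (64, "ASN", 68, "ALA", 102.60, 3.11, 30.74),
    (68, "ALA", 72, "THR", 98.85, 3.95, 31.64),
    (90, "ALA", 100, "ASP", 95.99, 1.04, 22.98),
    (169, "GLN", 199, "ARG", 97.71, 1.79, 31.90),
    (203, "PRO", 265, "GLY", -77.42, 3.35, 38.17),
    (263, "ARG", 266, "ILE", 117.48, 7.92, 39.71),
]


def passes(row, chi3_range=(80.0, 100.0), max_energy=5.0, min_sum_b=30.0):
    """Apply the three thresholds: Chi3 in range, Energy < 5, Sum B > 30."""
    chi3, energy, sum_b = row[4], row[5], row[6]
    in_range = chi3_range[0] <= chi3 <= chi3_range[1]
    return in_range and energy < max_energy and sum_b > min_sum_b


selected = [f"{r[1]}{r[0]}-{r[3]}{r[2]}" for r in CANDIDATES if passes(r)]
print(selected)  # ['ALA68-THR72', 'GLN169-ARG199']
```

Applied to the full table, the same three conditions leave only these two pairs, since no other row with Sum B-Factors above 30 has a Chi3 angle between 80° and 100°.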

 Multiple mutations and disulfide bond stability

Finally, we ran combinations of multiple single-point mutations through the established pipeline and obtained the computed folding energies of the multi-point mutants:

Multi-Site Combination Mutation and dG Data

Mutation dG (kcal/mol)
q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C
q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234k_d248k -24.28
q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234s_d248k -13.43
q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234k_d248m -21.31
q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234s_d248m -17.15
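Labels of this form can be split back into individual substitutions before a mutant sequence is built, which also makes it easy to list positions that are targeted more than once. The sketch below assumes the naming convention visible in the table (single underscores between mutations, double underscores between groups); `parse_label` and `repeated_positions` are hypothetical helper names.

```python
import re
from collections import defaultdict

MUTATION = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_label(label):
    """Split a label such as 'q146f_d213r__A58C_T62C' into (wt, pos, mut) tuples."""
    mutations = []
    for token in label.split("_"):
        if not token:  # empty tokens come from the '__' group separators
            continue
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation token: {token!r}")
        wt, pos, mut = match.groups()
        mutations.append((wt.upper(), int(pos), mut.upper()))
    return mutations


def repeated_positions(mutations):
    """Map each position that occurs more than once to its mutation names."""
    by_pos = defaultdict(list)
    for wt, pos, mut in mutations:
        by_pos[pos].append(f"{wt}{pos}{mut}")
    return {pos: names for pos, names in by_pos.items() if len(names) > 1}


label = "q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234k_d248k"
muts = parse_label(label)
print(len(muts))                 # 12
print(repeated_positions(muts))  # {159: ['Q159W', 'Q159C']}
```

In the labels above, position 159 occurs in both q159w and Q159C, so a sequence builder needs a rule for which substitution takes precedence at that site.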

 References


1. Chen, J., Jia, Y., Sun, Y. et al. Global marine microbial diversity and its potential in bioprospecting. Nature 633, 371–379 (2024). https://doi.org/10.1038/s41586-024-07891-2

2. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A. 2021 Apr 13;118(15):e2016239118. doi: 10.1073/pnas.2016239118. PMID: 33876751; PMCID: PMC8053943.

3. J. Dauparas et al., Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49-56 (2022). DOI: 10.1126/science.add2187

4. H. Dieckhaus, M. Brocidiacono, N.Z. Randolph, & B. Kuhlman, Transfer learning to leverage larger datasets for improved prediction of protein stability changes, Proc. Natl. Acad. Sci. U.S.A. 121 (6) e2314853121, https://doi.org/10.1073/pnas.2314853121 (2024).

5. Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). https://doi.org/10.1038/s41586-024-07487-w

6. Buß O, Rudat J, Ochsenreither K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput Struct Biotechnol J. 2018 Feb 3;16:25-33. doi: 10.1016/j.csbj.2018.01.002. PMID: 30275935; PMCID: PMC6158775.

7. Craig, D.B., Dombkowski, A.A. Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics 14, 346 (2013). https://doi.org/10.1186/1471-2105-14-346

 
