Background and Exploration of the Modification Plan
In recent years, artificial intelligence has had an enormous influence on biology. Following the success of a series of AI-driven biological studies, structural biology has been profoundly changed, and with modern computing hardware and advanced algorithms, enzyme engineering has likewise been transformed by deep learning. In this part of our work, we modified the marine 05PET enzyme [1] by combining an AI-assisted computational pipeline with follow-up wet-lab experiments, aiming to cut the time and reagent cost of conventional enzyme modification. After several rounds of effort, the thermal stability modification was ultimately accomplished.
First, given the improvement in computing hardware in recent years, we attempted a full single-point mutation scan of the 05PET enzyme. However, because Google's official AlphaFold Server allows only 30 structure predictions per account per day, we instead ran inference with the ESMFold model [2] on a T4 GPU to predict the structures of the single-point mutants. Two obstacles nevertheless made this original plan impractical. First, the prediction accuracy of ESMFold was insufficient for the subsequent physicochemical calculations. Second, only approximately 400 protein structures could be predicted per day on a T4 GPU. We therefore changed our enzyme modification strategy.
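To make the throughput limit concrete, the short sketch below estimates how long a full single-point scan would take at the two daily rates quoted above. It assumes a 257-residue sequence (the length implied by the 20 × 257 ThermoMPNN output reported later in this section) and 19 alternative amino acids per position; the function name `scan_days` is only illustrative.

```python
import math


def scan_days(n_positions: int, per_day: int, n_alternatives: int = 19):
    """Return (number of single-point mutants, days needed at `per_day` structures/day)."""
    n_mutants = n_positions * n_alternatives
    return n_mutants, math.ceil(n_mutants / per_day)


# Assumed sequence length of 257, taken from the 20 x 257 model output.
print(scan_days(257, per_day=400))  # ESMFold on one T4 GPU -> (4883, 13)
print(scan_days(257, per_day=30))   # AlphaFold Server quota -> (4883, 163)
```

Under these assumptions the scan needs about two weeks of GPU time for structure prediction alone, before any energy calculation is run.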
Core Model Selection: ProteinMPNN and ThermoMPNN
ProteinMPNN [3] is a message-passing neural network that predicts an amino-acid sequence from a protein structure. ThermoMPNN [4], derived from ProteinMPNN, uses protein structure information to predict the change in thermal stability (more precisely, the change in folding free energy, i.e. ddG) caused by a single-point mutation at each position, from which we can identify the positions to mutate and the substitutions to make. This largely resolved the problems of computational cost and structural prediction accuracy that we had encountered.
Prediction, Screening, and Validation of the First Round of Single-point Mutations
We first ran model inference and biophysical calculations for single-point mutations, then prediction and calculations for disulfide bonds, and finally submitted the candidates to wet-lab experiments for verification.
ThermoMPNN returned a 20 × 257 matrix of predicted ddG values (20 amino acids at each of 257 positions). We first screened for mutations with a predicted ddG below -1 kcal/mol, which gave 17 single-point mutations of primary concern:
PETase Mutation and Predicted ddG Data
| Mutation | predicted_ddG (kcal/mol) | pos | wtAA | mutAA |
|---|---|---|---|---|
| D248R | -2.5975 | 248 | D | R |
| D248K | -2.1529 | 248 | D | K |
| D213R | -2.054 | 213 | D | R |
| D213K | -1.7337 | 213 | D | K |
| D248H | -1.3534 | 248 | D | H |
| D248M | -1.3419 | 248 | D | M |
| D242R | -1.292 | 242 | D | R |
| D248Q | -1.2844 | 248 | D | Q |
| T142L | -1.2587 | 142 | T | L |
| T142I | -1.2432 | 142 | T | I |
| D248N | -1.2051 | 248 | D | N |
| D248L | -1.2011 | 248 | D | L |
| D248A | -1.1058 | 248 | D | A |
| D242I | -1.0776 | 242 | D | I |
| S200R | -1.0435 | 200 | S | R |
| T142M | -1.0297 | 142 | T | M |
| D248I | -1.0291 | 248 | D | I |
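The threshold screen can be sketched in a few lines. The example below reuses the 17 rows of the table above; the `Prediction` type and the `screen` function are illustrative names, not part of ThermoMPNN, and the real input would be the full 20 × 257 output.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    pos: int
    wt: str
    mut: str
    ddg: float  # predicted ddG in kcal/mol (negative = stabilizing)

    @property
    def name(self) -> str:
        return f"{self.wt}{self.pos}{self.mut}"


def screen(predictions, threshold=-1.0):
    """Keep mutations with ddG strictly below `threshold`, most stabilizing first."""
    return sorted((p for p in predictions if p.ddg < threshold), key=lambda p: p.ddg)


ROWS = [
    (248, "D", "R", -2.5975), (248, "D", "K", -2.1529), (213, "D", "R", -2.054),
    (213, "D", "K", -1.7337), (248, "D", "H", -1.3534), (248, "D", "M", -1.3419),
    (242, "D", "R", -1.292), (248, "D", "Q", -1.2844), (142, "T", "L", -1.2587),
    (142, "T", "I", -1.2432), (248, "D", "N", -1.2051), (248, "D", "L", -1.2011),
    (248, "D", "A", -1.1058), (242, "D", "I", -1.0776), (200, "S", "R", -1.0435),
    (142, "T", "M", -1.0297), (248, "D", "I", -1.0291),
]
TABLE = [Prediction(*row) for row in ROWS]

print(len(screen(TABLE)))                        # 17
print([p.name for p in screen(TABLE, -2.0)])     # ['D248R', 'D248K', 'D213R']
```

Tightening the threshold to -2 kcal/mol would leave only three candidates, which shows how sensitive the size of the shortlist is to the cutoff.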
No algorithm is one hundred percent accurate, particularly on biological tasks. To obtain more accurate and reliable data, we therefore checked the predicted effects with conventional physics-based calculations.
We used AlphaFold 3 [5] to predict the structures of these mutants, converted the file format with OpenBabel, and then ran the physicochemical software FoldX [6] to perform the Repair operation and calculate the folding energy. To calculate the folding energy, which reflects stability, in batches, we built a scripted high-throughput pipeline. The following data were obtained:
Mutation and dG Data
| Mutation | dG (kcal/mol) |
|---|---|
| 05PET | -3.52 |
| d213k | -3.82 |
| d213r | -8.86 |
| d242i | -4.03 |
| d242r | -10.04 |
| d248a | -4.73 |
| d248h | -2.71 |
| d248i | -6.28 |
| d248k | -9.86 |
| d248l | -4.82 |
| d248m | -11.31 |
| d248n | -4.79 |
| d248q | -6.11 |
| d248r | -4.29 |
| s200r | -4.81 |
| t142i | -4.98 |
| t142l | -16.69 |
| t142m | -3.99 |
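The batch calculation behind values like these could be scripted roughly as follows. This is a sketch under assumptions, not the team's actual pipeline: it assumes the predicted structure is an mmCIF file, that Open Babel is available as `obabel`, and that FoldX is available as `foldx` with its `RepairPDB` and `Stability` commands. The function only builds the command lines; running them (for example with `subprocess.run` in the directory that holds the PDB file) is left to the caller.

```python
from pathlib import Path


def pipeline_commands(cif_path, foldx="foldx", obabel="obabel"):
    """Build the three command lines for one predicted structure.

    Assumed steps: mmCIF -> PDB conversion, FoldX RepairPDB, FoldX Stability.
    FoldX writes the repaired structure as <name>_Repair.pdb.
    """
    cif = Path(cif_path)
    pdb = cif.with_suffix(".pdb")
    repaired = pdb.with_name(pdb.stem + "_Repair.pdb")
    return [
        [obabel, "-immcif", str(cif), "-opdb", "-O", str(pdb)],
        [foldx, "--command=RepairPDB", f"--pdb={pdb.name}"],
        [foldx, "--command=Stability", f"--pdb={repaired.name}"],
    ]


for command in pipeline_commands("d248m.cif"):
    print(" ".join(command))
```

Looping this over one structure file per mutant gives the high-throughput behaviour described above; the file name `d248m.cif` is a placeholder.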
In the data above, the five mutations d213r, d242r, d248m, s200r, and t142l have the most negative dG at their respective sites. The next round therefore drew its targets from these five and combined them, with the goal of obtaining a highly stable multi-point mutation sequence.
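One reading that is consistent with this selection is "keep the mutant with the most negative dG at each mutated position". The sketch below applies that rule to the single-point dG table; the rule itself is an inference from the data rather than a stated criterion, and `best_per_site` is an illustrative name.

```python
import re

# Folding energies (kcal/mol) from the single-point dG table.
DG = {
    "d213k": -3.82, "d213r": -8.86, "d242i": -4.03, "d242r": -10.04,
    "d248a": -4.73, "d248h": -2.71, "d248i": -6.28, "d248k": -9.86,
    "d248l": -4.82, "d248m": -11.31, "d248n": -4.79, "d248q": -6.11,
    "d248r": -4.29, "s200r": -4.81, "t142i": -4.98, "t142l": -16.69,
    "t142m": -3.99,
}


def best_per_site(dg):
    """Return, for every mutated position, the mutant with the lowest dG."""
    best = {}
    for name, value in dg.items():
        pos = int(re.fullmatch(r"[a-z](\d+)[a-z]", name).group(1))
        if pos not in best or value < dg[best[pos]]:
            best[pos] = name
    return sorted(best.values())


print(best_per_site(DG))  # ['d213r', 'd242r', 'd248m', 's200r', 't142l']
```

All five selected mutants also have a lower dG than the 05PET reference value of -3.52 kcal/mol.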
Combination Mutation and dG Data
| Mutation | dG (kcal/mol) |
|---|---|
| d213r, d242r, d248m, s200r, t142l | -8.56 |
| t142l, d242r | -14.34 |
| t142l, d248m | -9.30 |
| t142l, d248m, d242r | -10.79 |
| d248m, d242r | -12.78 |
Expanded Screening and Validation of Second-round Single-site Mutations
To find more potential mutations, we repeated the procedure above with an expanded search range. The upper limit on the ddG predicted by the model was raised to 0 kcal/mol, and single-point mutations at the sites already verified above were excluded. This added 38 single-point mutations to our automated pipeline for structure prediction, file format conversion, and folding energy calculation:
Mutation and dG Data (Additional Mutants)
| Mutation | dG (kcal/mol) |
|---|---|
| a28f | 0.47 |
| a28l | -0.49 |
| a28m | -2.03 |
| d234a | -4.46 |
| d234g | 1.63 |
| d234k | -7.37 |
| d234r | -4.16 |
| d234s | -10.96 |
| d246n | 1.40 |
| d250i | -4.22 |
| d250r | -0.34 |
| d250t | -2.57 |
| d250v | -5.00 |
| d250y | -5.77 |
| d4p | -0.08 |
| e68r | -4.49 |
| e68t | -3.94 |
| g197i | -2.46 |
| g197v | -2.56 |
| h123y | -8.60 |
| i150s | -0.31 |
| i150t | -4.29 |
| p207f | 5.34 |
| q101m | -1.18 |
| q146f | -11.93 |
| q146i | -7.31 |
| q146l | -6.66 |
| q146m | -5.89 |
| q146y | -7.94 |
| q159d | -3.88 |
| q159w | -11.64 |
| q159y | 0.14 |
| s20i | -2.41 |
| s231r | 10.08 |
| s64r | 0.72 |
| v208f | -4.85 |
| v208l | -12.28 |
| v208y | -4.77 |
Five new, more favorable single-point mutations were found:
Key Mutation and dG Data
| Mutation | dG (kcal/mol) |
|---|---|
| d234k | -7.37 |
| d234s | -10.96 |
| q146f | -11.93 |
| q159w | -11.64 |
| v208l | -12.28 |
Their combination with the previous single-point mutations was then verified further.
Prediction and Verification of Disulfide Bond Modification
Meanwhile, we used traditional computational methods to design disulfide bonds that would further enhance the thermal stability of the 05PET enzyme. Disulfide by Design 2.0 [7] is an algorithm that predicts potential disulfide bonds from protein structures. We ran Disulfide by Design 2.0 with the structure of the 05PET enzyme as input. The algorithm output the possible disulfide bond mutations together with several parameters for each.
Protein Residue Interaction Data
| Res1 Chain | Res1 Seq # | Res1 AA | Res2 Chain | Res2 Seq # | Res2 AA | Chi3 | Energy | Sum B-Factors |
|---|---|---|---|---|---|---|---|---|
| A | 15 | THR | A | 237 | MET | 112.97 | 4.3 | 27.38 |
| A | 24 | SER | A | 27 | ALA | 97.43 | 3.06 | 26.94 |
| A | 26 | SER | A | 226 | GLY | -62.55 | 6.44 | 24.51 |
| A | 28 | ASP | A | 82 | SER | 119.58 | 2.56 | 25.56 |
| A | 29 | GLY | A | 82 | SER | 115.87 | 6.03 | 25.71 |
| A | 31 | PHE | A | 47 | PRO | -99.63 | 6.22 | 27.02 |
| A | 38 | ALA | A | 42 | CYS | -96 | 3.81 | 25.96 |
| A | 38 | ALA | A | 44 | VAL | -108.93 | 4.22 | 26.64 |
| A | 45 | PHE | A | 81 | ALA | 124.98 | 4.98 | 22.81 |
| A | 50 | LEU | A | 122 | LYS | 114.75 | 6.36 | 30.87 |
| A | 58 | PRO | A | 127 | ARG | 107.43 | 1.39 | 23.59 |
| A | 58 | PRO | A | 236 | LEU | 102.69 | 3.32 | 23.14 |
| A | 62 | TRP | A | 89 | ALA | -101.32 | 2.67 | 22.16 |
| A | 63 | GLY | A | 95 | ALA | -62.52 | 6.22 | 23.05 |
| A | 63 | GLY | A | 139 | GLY | -108.95 | 4.61 | 22.18 |
| A | 64 | ASN | A | 68 | ALA | 102.6 | 3.11 | 30.74 |
| A | 65 | GLY | A | 133 | HIS | -109.19 | 4.7 | 26.37 |
| A | 65 | GLY | A | 134 | SER | 71.55 | 4.11 | 28.94 |
| A | 68 | ALA | A | 72 | THR | 98.85 | 3.95 | 31.64 |
| A | 70 | PRO | A | 89 | ALA | -97.59 | 0.87 | 25.53 |
| A | 76 | ILE | A | 221 | ALA | 95.18 | 2.2 | 23.3 |
| A | 90 | ALA | A | 100 | ASP | 95.99 | 1.04 | 22.98 |
| A | 95 | ALA | A | 139 | GLY | 88.84 | 3.58 | 22.93 |
| A | 98 | GLY | A | 143 | ALA | -75.66 | 5.59 | 23.39 |
| A | 111 | GLN | A | 119 | TYR | 92.44 | 3.86 | 27.31 |
| A | 112 | ASN | A | 123 | LEU | 122.64 | 2.68 | 27.89 |
| A | 131 | ALA | A | 153 | ALA | 123.66 | 5.02 | 18.94 |
| A | 144 | GLY | A | 152 | THR | -114.55 | 4.94 | 22.12 |
| A | 151 | VAL | A | 235 | HIS | 125.32 | 6.18 | 23.75 |
| A | 155 | PHE | A | 217 | PRO | 117.23 | 4.2 | 23.72 |
| A | 169 | GLN | A | 199 | ARG | 97.71 | 1.79 | 31.9 |
| A | 178 | LEU | A | 206 | TRP | 124.24 | 6.3 | 21.58 |
| A | 180 | THR | A | 189 | PRO | -115.23 | 4.39 | 28.99 |
| A | 186 | ILE | A | 214 | HIS | -91.88 | 2.57 | 28.27 |
| A | 203 | PRO | A | 265 | GLY | -77.42 | 3.35 | 38.17 |
| A | 208 | GLU | A | 260 | ASP | 121.42 | 8.18 | 26.52 |
| A | 209 | LEU | A | 212 | ALA | 103.99 | 1.15 | 28.1 |
| A | 212 | ALA | A | 216 | GLU | 122.04 | 4.35 | 28.96 |
| A | 230 | ALA | A | 242 | ALA | 104.56 | 0.8 | 23.22 |
| A | 233 | ARG | A | 239 | ASP | -93.12 | 1.84 | 24.57 |
| A | 263 | ARG | A | 266 | ILE | 117.48 | 7.92 | 39.71 |
We screened the candidates using the suggested parameter thresholds (Chi3 ∈ [80°, 100°], Energy < 5, ΣB > 30) and obtained two disulfide bonds, 58ALA-62THR and 159GLN-189ARG. These correspond to the table rows 68 ALA / 72 THR and 169 GLN / 199 ARG; the residue numbers in the table appear to be shifted by +10 relative to the numbering used for the mutations. We then used AlphaFold 3 [5] for structure prediction to further verify that the two disulfide bonds can form.
Both disulfide bonds appeared simultaneously in a single predicted structure, supporting the feasibility of the disulfide bond design.
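The threshold screen can be reproduced with a short filter. The sketch below uses a subset of the rows from the Disulfide by Design table (the full table gives the same two hits) and assumes that the table's residue numbers are 10 higher than the numbering used for the mutations, an offset inferred from matching residue names such as 169 GLN and Q159. The names `Candidate`, `passes`, and `label` are illustrative.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    res1: int
    aa1: str
    res2: int
    aa2: str
    chi3: float    # degrees
    energy: float
    sum_b: float   # sum of B-factors


def passes(c: Candidate) -> bool:
    """Thresholds from the text: Chi3 in [80, 100], Energy < 5, sum of B-factors > 30."""
    return 80 <= c.chi3 <= 100 and c.energy < 5 and c.sum_b > 30


def label(c: Candidate, offset: int = 10) -> str:
    """Name the pair in mutation numbering (assumed offset of 10, see lead-in)."""
    return f"{c.res1 - offset}{c.aa1}-{c.res2 - offset}{c.aa2}"


# Subset of the table rows, copied verbatim.
CANDIDATES = [
    Candidate(24, "SER", 27, "ALA", 97.43, 3.06, 26.94),
    Candidate(50, "LEU", 122, "LYS", 114.75, 6.36, 30.87),
    Candidate(64, "ASN", 68, "ALA", 102.6, 3.11, 30.74),
    Candidate(68, "ALA", 72, "THR", 98.85, 3.95, 31.64),
    Candidate(90, "ALA", 100, "ASP", 95.99, 1.04, 22.98),
    Candidate(169, "GLN", 199, "ARG", 97.71, 1.79, 31.9),
    Candidate(203, "PRO", 265, "GLY", -77.42, 3.35, 38.17),
    Candidate(263, "ARG", 266, "ILE", 117.48, 7.92, 39.71),
]

print([label(c) for c in CANDIDATES if passes(c)])
# ['58ALA-62THR', '159GLN-189ARG']
```

Most rows fail on the B-factor criterion: only six pairs in the full table have ΣB above 30, and four of those fall outside the Chi3 window.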
Multiple mutations and disulfide bond stability
We then ran combinations of multiple single-point mutations, together with the disulfide bond mutations, through the established pipeline and obtained the calculated folding energies of the multi-point mutants:
Multi-Site Combination Mutation and dG Data
| Mutation | dG (kcal/mol) |
|---|---|
| q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C | |
| q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234k_d248k | -24.28 |
| q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234s_d248k | -13.43 |
| q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234k_d248m | -21.31 |
| q146f_q159w_v208l_d213r_d242r_t142l__A58C_T62C_Q159C_R189C__d234s_d248m | -17.15 |
References
1. Chen, J., Jia, Y., Sun, Y. et al. Global marine microbial diversity and its potential in bioprospecting. Nature 633, 371–379 (2024). https://doi.org/10.1038/s41586-024-07891-2
2. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J. & Fergus, R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U.S.A. 118 (15), e2016239118 (2021). https://doi.org/10.1073/pnas.2016239118
3. Dauparas, J. et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022). https://doi.org/10.1126/science.add2187
4. Dieckhaus, H., Brocidiacono, M., Randolph, N.Z. & Kuhlman, B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proc. Natl. Acad. Sci. U.S.A. 121 (6), e2314853121 (2024). https://doi.org/10.1073/pnas.2314853121
5. Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). https://doi.org/10.1038/s41586-024-07487-w
6. Buß, O., Rudat, J. & Ochsenreither, K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput. Struct. Biotechnol. J. 16, 25–33 (2018). https://doi.org/10.1016/j.csbj.2018.01.002
7. Craig, D.B. & Dombkowski, A.A. Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics 14, 346 (2013). https://doi.org/10.1186/1471-2105-14-346