Overview
1. Prediction of protein 3D structures
2. Response Surface Method
References
Computational modeling can advance our project's experimental phase. On one hand, we aim to predict the three-dimensional structures of proteins from amino acid sequences, providing theoretical data to support the study of enzymatic activity. On the other hand, we employ Response Surface Methodology (RSM) to construct multidimensional models for predicting the yield and conversion rate of erythritol fermentation. Furthermore, the models help identify the optimal glucose-to-glycerol ratio that maximizes erythritol production efficiency, thereby providing critical support for the optimization of our experimental design.
1. Prediction of protein 3D structures
1.1 Homology modeling
Model Goal:
Homology modeling was used to predict the three-dimensional structures of six proteins: phosphoketolase (PK), erythritol-4-phosphate dehydrogenase (EPDH), phosphatase (PTase), glycerol kinase (encoded by GUT1), glycerol-3-phosphate dehydrogenase (encoded by GUT2), and triosephosphate isomerase (encoded by TPI1). Knowing in advance whether these proteins are likely to fold and be expressed correctly reduces experimental cost and time and provides supporting data for the wet-lab work.
Principle of homology modeling:
The three-dimensional structure of a protein is formed when the peptide chain folds under hydrophobic interactions between side chains, hydrogen bonds, ionic bonds, and disulfide bonds (Yan and Sun, 1999). This structure has a significant impact on enzyme activity: the active center of an enzyme requires a specific spatial arrangement, and in particular the hydrophobic core must be shielded by polar residues so that water molecules cannot reach it. Disulfide bonds also make the protein structure more rigid (Xu and Cui, 2023). Predicting the three-dimensional structure of a protein therefore provides a foundation for understanding its function and biosynthesis mechanisms.
Homology modeling is a typical approach to protein structure prediction. First, the target protein sequence is aligned with a template sequence to assess their sequence identity (Kiefer et al., 2009). If the sequence identity exceeds 40%, homology modeling can be used to infer the three-dimensional structure. Homologous proteins have similar spatial structures and the same or similar functions, so if the structures of some homologous proteins are known, the structures of homologues whose sequences are known but whose structures are not can be predicted (Bienert et al., 2017; Waterhouse et al., 2018).
Commonly used tools include SWISS-MODEL and InterProScan. GMQE (Global Model Quality Estimation), QMEAN, and the Ramachandran plot serve as indicators of model quality. The GMQE score reflects the reliability of the predicted structure; it lies between 0 and 1, and the closer it is to 1, the better the model. A QMEAN score near 0 indicates good agreement between the model and experimental structures of similar size, whereas a score of -4.0 or below indicates a low-quality model; values between -4 and 0 are therefore regarded as acceptable, and the closer to 0, the better (Kiefer et al., 2009). The Ramachandran plot is a scatter plot of the backbone dihedral angles φ and ψ of each amino acid residue and shows whether the conformation of the protein is reasonable.
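The sequence-identity criterion described above can be illustrated with a minimal MATLAB sketch. It uses one common definition (identical residues divided by aligned, gap-free columns); SWISS-MODEL may compute the value differently, and the two aligned strings are toy examples rather than our proteins.

```matlab
% Sketch: percent identity between two aligned sequences of equal length.
% The aligned strings below are hypothetical and for illustration only.
targetAln   = 'MKT-AYIAKQR';
templateAln = 'MKTLAYLAK-R';
aligned  = (targetAln ~= '-') & (templateAln ~= '-');   % columns without gaps
matches  = aligned & (targetAln == templateAln);        % identical residues
identity = 100 * sum(matches) / sum(aligned);           % percent identity
fprintf('Sequence identity: %.2f%%\n', identity);
usable = identity > 40;   % 40% threshold stated in the text
```

Here 8 of the 9 gap-free columns match, so the toy pair would pass the 40% threshold.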
Material:
Software: SWISS-MODEL
Gene Name: PK (phosphoketolase)
Origin: Clostridium acetobutylicum
Amino acid sequence:MQSIIGKHKDEGKITPEYLKKIDAYWRAANFISVGQLYLLDNPLLREPLKPEHLKRKVVGHWGTIPGQNFIYAHLNRVIKKYDLDMIYVSGPGHGGQVMVSNSYLDGTYSEVYPNVSRDLNGLKKLCKQFSFPGGISSHMAPETPGSINEGGELGYSLAHSFGAVFDNPDLITACVVGDGEAETGPLATSWQANKFLNPVTDGAVLPILHLNGYKISNPTVLSRIPKDELEKFFEGNGWKPYFVEGEDPEAMHKLMAETLDIVTEEILNIQKNARENNDCSRPKWPMIVLRTPKGWTGPKFVDGVPNEGSFRAHQVPLAVDRYHTENLDQLEEWLKSYKPEELFDENYRLIPELEELTPKGNKRMAANLHANGGLLLRELRTPDFRDYAVDVPTPGSTVKQDMIELGKYVRDVVKLNEDTRNFRIFGPDETMSNRLWAVFEGTKRQWLSEIKEPNDEFLSNDGRIVDSMLSEHLCEGWLEGYLLTGRHGFFASYEAFLRIVDSMITQHGKWLKVTSQLPWRKDIASLNLIATSNVWQQDHNGYTHQDPGLLGHIVDKKPEIVRAYLPADANTLLAVFDKCLHTKHKINLLVTSKHPRQQWLTMDQAVKHVEQGISIWDWASNDKGQEPDVVIASCGDTPTLEALAAVTILHEHLPELKVRFVNVVDMMKLLPENEHPHGLSDKDYNALFTTDKPVIFAFHGFAHLINQLTYHRENRNLHVHGYMEEGTITTPFDMRVQNKLDRFNLVKDVVENLPQLGNRGAHLVQLMNDKLVEHNQYIREVGEDLPEITNWQWHV
Gene Name: EPDH (erythritol-4-phosphate dehydrogenase)
Origin: Brucella melitensis
Amino acid sequence:MAEPETCDLFVIGGGINGAGVARDAAGRGLKVVLAEKDDLAQGTSSRSGKLVHGGLRYLEYYEFRLVREALIEREVLLNAAPHIIWPMRFVLPHSPQDRPAWLVRLGLFLYDHLGGRKKLPGTRTLDLKRDPEGTPILDQYTKGFEYSDCWVDDARLVALNAVGAAEKGATILTRTPVVSARRENGGWIVETRNSDTGETHTFRARCIVNCAGPWVTDVIHNVAASTSSRNVRLVKGSHIIVPKFWSGANAYLVQNHDKRVIFINPYEGDKALIGTTDIAYEGRAEDVAADEKEIDYLITAVNRYFKEKLRREDVLHSFSGVRPLFDDGKGNPSAVTRDYVFDLDETGGAPLLNVFGGKITTFRELAERGVHRLKHIFPQMGGDWTHDAPLPGGEIANADYETFANTLRDTYPWMPRTLVHHYGRLYGARTKDVVAGAQNLEGLGRHFGGDFHEAEVRYLVAREWAKTAEDILYRRTKHYLHLTEAERAAFVEWFDNANLVA
Gene Name: PTase (phosphatase)
Origin: Oenococcus oeni
Amino acid sequence:MELITSVEFGRMINAAAQILTKNAQHINKLNVFPVPDGDTGTNMSLTMQSGAQYERDSTETSIAALSAAMSKGLLMGARGNSGVILSQIMRGFTKFVANFDTLDAKQFANALKAGAESAYKSVMKPTEGTILTVIRESAAAAGDAADQSDDLVDVAKATWDASKEALAKTPDLLPVLKEVGVVDSGGQGLVFVFQSWYEVLSGKTTQEDLSTPPDMAQFDEKTDEFDAQVSLDPKDIKYGYCTTILFETGKGSTYDREWNYDKFYSYLSKKGDSLLVIADDGLVKTHVHTEDPGAILTEATHYGSIKWVKIDNMRDQQQAVIDRVAKEQASQPKKPIETAVITVASGHGVSELFKSMGVTDVITGGQTMNPSTKDLLNAITSSKAKNAIIIPNNANIFMAASQAADMSKIPVEIVKSKTIQQGLTAMLGFNPDADVKENASEMTAQLSTVKSAEVTKAVRDTSIDGKSIKRGEYIGIVDGKIQANGRRLRDVAINSVKAMLDDDSEIVTIIYGSQSNQKESEQLTKAISKLDNNLETEIHEGDQPLYPFLISVE
Gene Name: GUT1 (glycerol kinase)
Origin: Saccharomyces cerevisiae (strain ATCC 204508 / S288c)
Amino acid sequence:MFPSLFRLVVFSKRYIFRSSQRLYTSLKQEQSRMSKIMEDLRSDYVPLIASIDVGTTSSRCILFNRWGQDVSKHQIEYSTSASKGKIGVSGLRRPSTAPARETPNAGDIKTSGKPIFSAEGYAIQETKFLKIEELDLDFHNEPTLKFPKPGWVECHPQKLLVNVVQCLASSLLSLQTINSERVANGLPPYKVICMGIANMRETTILWSRRTGKPIVNYGIVWNDTRTIKIVRDKWQNTSVDRQLQLRQKTGLPLLSTYFSCSKLRWFLDNEPLCTKAYEENDLMFGTVDTWLIYQLTKQKAFVSDVTNASRTGFMNLSTLKYDNELLEFWGIDKNLIHMPEIVSSSQYYGDFGIPDWIMEKLHDSPKTVLRDLVKRNLPIQGCLGDQSASMVGQLAYKPGAAKCTYGTGCFLLYNTGTKKLISQHGALTTLAFWFPHLQEYGGQKPELSKPHFALEGSVAVAGAVVQWLRDNLRLIDKSEDVGPIASTVPDSGGVVFVPAFSGLFAPYWDPDARATIMGMSQFTTASHIARAAVEGVCFQARAILKAMSSDAFGEGSKDRDFLEEISDVTYEKSPLSVLAVDGGMSRSNEVMQIQADILGPCVKVRRSPTAECTALGAAIAANMAFKDVNERPLWKDLHDVKKWVFYNGMEKNEQISPEAHPNLKIFRSESDDAERRKHWKYWEVAVERSKGWLKDIEGEHEQVLENFQ
Gene Name: GUT2 (glycerol-3-phosphate dehydrogenase)
Origin: Saccharomyces cerevisiae (strain ATCC 204508 / S288c)
Amino acid sequence:MFSVTRRRAAGAAAAMATATGTLYWMTSQGDRPLVHNDPSYMVQFPTAAPPQVSRRDLLDRLAKTHQFDVLIIGGGATGTGCALDAATRGLNVALVEKGDFASGTSSKSTKMIHGGVRYLEKAFWEFSKAQLDLVIEALNERKHLINTAPHLCTVLPILIPIYSTWQVPYIYMGCKFYDFFAGSQNLKKSYLLSKSATVEKAPMLTTDNLKASLVYHDGSFNDSRLNATLAITAVENGATVLNYVEVQKLIKDPTSGKVIGAEARDVETNELVRINAKCVVNATGPYSDAILQMDRNPSGLPDSPLNDNSKIKSTFNQIAVMDPKMVIPSIGVHIVLPSFYCPKDMGLLDVRTSDGRVMFFLPWQGKVLAGTTDIPLKQVPENPMPTEADIQDILKELQHYIEFPVKREDVLSAWAGVRPLVRDPRTIPADGKKGSATQGVVRSHFLFTSDNGLITIAGGKWTTYRQMAEETVDKVVEVGGFHNLKPCHTRDIKLAGAEEWTQNYVALLAQNYHLSSKMSNYLVQNYGTRSSIICEFFKESMENKLPLSLADKENNVIYSSEENNLVNFDTFRYPFTIGELKYSMQYEYCRTPLDFLLRRTRFAFLDAKEALNAVHATVKVMGDEFNWSEKKRQWELEKTVNFIKTFGV
Gene Name: TPI1 (triosephosphate isomerase)
Origin: Saccharomyces cerevisiae (strain ATCC 204508 / S288c)
Amino acid sequence:MARTFFVGGNFKLNGSKQSIKEIVERLNTASIPENVEVVICPPATYLDYSVSLVKKPQVTVGAQNAYLKASGAFTGENSVDQIKDVGAKWVILGHSERRSYFHEDDKFIADKTKFALGQGVGVILCIGETLEEKKAGKTLDVVERQLNAVLEEVKDWTNVVVAYEPVWAIGTGLAATPEDAQDIHASIRKFLASKLGDKAASELRILYGGSANGSNAVTFKDKADVDGFLVGGASLKPEFVDIINSRN
Process:
1. Open the SWISS-MODEL website (https://swissmodel.expasy.org/), click "Start Modeling", and upload the target protein sequence in FASTA format.
2. Find templates: The system automatically searches for known protein structures with high similarity to the target sequence to use as templates. If no template is specified, the system automatically selects the one with the highest similarity.
3. Select templates: One or more templates can be chosen for modeling. Generally, the template most similar to the target sequence is chosen, because it gives a more accurate final model.
4. Model building: After the template is selected, the system automatically starts homology modeling.
5. Quality evaluation: After the model is built, it can be downloaded and assessed using indicators such as GMQE and QMEAN.
6. Adjust the model: If the model quality is unsatisfactory, adjust and optimize it manually using "Project mode" to obtain a more reliable result.
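Step 1 expects the sequence in FASTA format. The following is a minimal sketch of how such a record could be assembled in MATLAB; the record name is hypothetical, and the sequence is only the first 70 residues of the triosephosphate isomerase sequence listed under Material, wrapped at 60 residues per line.

```matlab
% Sketch: wrap a protein sequence into FASTA text (60 residues per line).
header = 'TPI_example_fragment';               % hypothetical record name
seq = ['MARTFFVGGNFKLNGSKQSIKEIVERLNTA' ...
       'SIPENVEVVICPPATYLDYSVSLVKKPQVT' ...
       'VGAQNAYLKA'];                           % first 70 residues only
width = 60;                                     % residues per line
fastaLines = {['>' header]};
for k = 1:width:numel(seq)
    fastaLines{end+1} = seq(k:min(k+width-1, numel(seq))); %#ok<AGROW>
end
fastaText = strjoin(fastaLines, sprintf('\n')); % join lines with newlines
disp(fastaText)
```

The resulting text can be pasted into the sequence field or saved to a file before uploading.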
Result
According to Figure 1, the GMQE value of the PK model is 0.86, which is relatively close to 1. The QMEAN value is within the range of -4 to 0, and the sequence identity (57.11%) is higher than 40%. The amino acids of PK are concentrated in the light green area of the amino acid distribution plot and cluster at the peak, indicating that the three-dimensional structure is highly reliable. Figure 2 shows the three-dimensional structure of PK.
Figure 1. The evaluation result of PK
Figure 2. Three-dimensional structure of PK
In Figure 3, the GMQE value of the EPDH model is 0.84. The QMEAN value is within the range of -4 to 0, and the sequence identity (50.10%) is higher than 40%. The amino acids of EPDH are concentrated in the light green area and cluster at the peak, indicating that the three-dimensional structure of EPDH is highly reliable. Figure 4 shows the three-dimensional structure of EPDH.
Figure 3. Evaluation results and amino acid distribution results of EPDH
Figure 4. Three-dimensional structure of EPDH
For PTase, the GMQE value is 0.81, which is relatively close to 1. The QMEAN value is within the range of -4 to 0, and the sequence identity (47.64%) is higher than 40%. In Figure 5, the amino acids of PTase are concentrated in the light green area and cluster at the peak, indicating that the three-dimensional structure of PTase is highly reliable. Figure 6 shows the three-dimensional structure of PTase.
Figure 5. Evaluation results of PTase
Figure 6. Three-dimensional structure of PTase
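The acceptance criteria used for PK, EPDH and PTase can be written down as a short check. This is only a sketch: the 40% identity threshold comes from the text, whereas the GMQE cutoff of 0.7 is our assumption for "relatively close to 1" and is not a SWISS-MODEL rule. QMEAN is omitted because the text gives only its range, not the individual values.

```matlab
% Sketch: apply the criteria described above to the reported scores.
names    = {'PK', 'EPDH', 'PTase'};
gmqe     = [0.86, 0.84, 0.81];       % GMQE values reported in the text
identity = [57.11, 50.10, 47.64];    % sequence identity (%) reported in the text
identityCutoff = 40;                 % threshold stated in the text
gmqeCutoff     = 0.7;                % assumed cutoff, not from the source
passes = (identity > identityCutoff) & (gmqe >= gmqeCutoff);
for k = 1:numel(names)
    fprintf('%-6s GMQE = %.2f, identity = %.2f%%, pass = %d\n', ...
            names{k}, gmqe(k), identity(k), passes(k));
end
```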
In Figures 7, 8 and 9, the sequence identity for GUT1, GUT2 and TPI1 is 100%, indicating that three-dimensional structures of these proteins have already been determined and are available as templates.
Figure 7. Evaluation results of GUT1
Figure 8. Evaluation results of GUT2
Figure 9. Evaluation results of TPI1
Figure 10 A, B and C show the final three-dimensional structures of GUT1, GUT2 and TPI1, respectively.
Figure 10. Three-dimensional structures of (A) GUT1, (B) GUT2 and (C) TPI1.
Analysis, Discussion and Limitations
When the sequence identity between the target protein and the template protein is higher than 60%, SWISS-MODEL can generate high-quality three-dimensional models that may match or even surpass AlphaFold2 predictions. However, SWISS-MODEL can only model proteins that have homologues with known structures; for novel proteins without known homologues, the prediction accuracy may be relatively low. In addition, SWISS-MODEL does not automatically fill in missing amino acids, which may affect the completeness and accuracy of the model.
To further optimize the three-dimensional structures of phosphoketolase (PK), erythritol-4-phosphate dehydrogenase (EPDH), and phosphatase (PTase), we selected AlphaFold2 to predict the protein structures. By comparing the predicted structures with the existing models, we identified and selected the superior three-dimensional model.
1.2 The AlphaFold2 method
Principle:
In the confidence heatmap (predicted aligned error), the X-axis represents the "Scored residue" and the Y-axis the "Aligned residue". Each cell gives the expected position error of the scored residue when the predicted and true structures are aligned on the aligned residue. The graph is a green heatmap: a darker green indicates a smaller predicted error at that position, while a lighter green indicates a larger predicted error. The color bar below the graph shows the range of the predicted position error, approximately 0 to 30. In short, the darker the color, the more credible the prediction (Liu et al., 2020).
ipTM (interface predicted template modeling score) and pTM (predicted template modeling score) are two key indicators in the AlphaFold3 prediction model, used here to evaluate the possibility of protein-protein interactions. pTM measures the expected similarity between the predicted structure and the real structure, and ipTM does the same for the interfaces between chains. The higher the score, the more reliable the prediction. When the pTM+ipTM score is higher than 0.5, the proteins are considered to have the possibility of interaction; a value higher than 0.75 indicates a high possibility of interaction (Liu et al., 2020).
Figure 11. The protein structure of EPDH.
In Figure 11, the predicted structure of EPDH is dark blue overall, and the confidence heatmap is mostly dark green with very few light-colored areas. The pTM+ipTM value is 0.55, which is higher than 0.5 but lower than 0.75, so the predicted structure of EPDH is rated as only moderately reliable.
Figure 12. The protein structure of PTase.
In Figure 12, the predicted structure of PTase is mostly dark blue and light blue, indicating a relatively high level of confidence. In the confidence heatmap, dark and light areas each account for about half, indicating an average prediction error. The pTM+ipTM value is 0.55, which is lower than 0.75 but higher than 0.5, so the possibility of interaction is moderate.
Figure 13. The predicted protein structure of PK.
In Figure 13, the predicted structure of PK is dark blue overall, indicating a relatively high level of confidence. The confidence heatmap is mostly dark green with very few light-colored areas, indicating that the prediction error is relatively small. The pTM+ipTM value is 0.96, which is higher than 0.75, so the possibility of interaction is very high.
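The interpretation of the pTM+ipTM scores can be summarized in a few lines of MATLAB. This sketch only applies the 0.5 and 0.75 thresholds quoted in the principle paragraph to the three reported values; the labels "high", "moderate" and "low" are our shorthand.

```matlab
% Sketch: classify the reported pTM+ipTM scores with the thresholds from the text.
names  = {'EPDH', 'PTase', 'PK'};
scores = [0.55, 0.55, 0.96];         % pTM+ipTM values reported above
labels = cell(size(scores));
for k = 1:numel(scores)
    if scores(k) > 0.75
        labels{k} = 'high';
    elseif scores(k) > 0.5
        labels{k} = 'moderate';
    else
        labels{k} = 'low';
    end
    fprintf('%-6s pTM+ipTM = %.2f -> %s\n', names{k}, scores(k), labels{k});
end
```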
2. Response Surface Method
Experimental data acquisition:
We prepared six fermentation media with varying substrate compositions and ratios (fermentation conditions: 37°C, pH 7-8), as follows:
Glucose:Glycerol = 10:10 (g/L),
Glucose:Glycerol = 20:10 (g/L),
Glucose:Glycerol = 17:10 (g/L),
Glucose:Glycerol = 10:20 (g/L),
Glucose:Glycerol = 20:0 (g/L),
Glucose:Glycerol = 0:20 (g/L).
Samples were taken at different time points (5 h, 16 h, 24 h, 48 h) to measure the concentrations of glucose, glycerol, and erythritol. The concentrations of products and substrates were determined by high-performance liquid chromatography (HPLC) and are listed in Table 1.
Table 1. Raw concentrations (g/L) of glucose, glycerol and erythritol determined by HPLC, three replicates per measurement

| Glucose:Glycerol | Time | Glucose 1 | Glucose 2 | Glucose 3 | Glycerol 1 | Glycerol 2 | Glycerol 3 | Erythritol 1 | Erythritol 2 | Erythritol 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10:10 | 5h | 8.96 | 9.19 | 9.2 | 9.92 | 10.61 | 9.98 | 0 | 0 | 0 |
| 10:10 | 16h | 7.44 | 7.71 | 7.81 | 9.78 | 9.97 | 9.96 | 1.73 | 1.61 | 1.65 |
| 10:10 | 24h | 4.64 | 4.03 | 4.13 | 9.57 | 10 | 10.01 | 1.85 | 2.5 | 2.6 |
| 10:10 | 48h | 0 | 0.11 | 0.12 | 3.86 | 3.21 | 3.2 | 2.21 | 2.72 | 2.91 |
| 20:10 | 5h | 20.15 | 20.47 | 20.01 | 8.61 | 8.5 | 8.9 | 0 | 0 | 0 |
| 20:10 | 16h | 18.99 | 19.31 | 17.43 | 7.55 | 7.74 | 7.94 | 1.7 | 1.6 | 1.8 |
| 20:10 | 24h | 14.56 | 14.31 | 14.28 | 8.46 | 8.94 | 8.74 | 2.23 | 2.44 | 2.74 |
| 20:10 | 48h | 1.14 | 1.59 | 1.7 | 3.49 | 3.92 | 3.52 | 2.69 | 2.78 | 2.98 |
| 10:20 | 5h | 8.65 | 8.91 | 9.2 | 17.86 | 18.14 | 18.7 | 0 | 0 | 0 |
| 10:20 | 16h | 7.26 | 7.28 | 7.58 | 15.74 | 18.14 | 16.99 | 1.63 | 1.62 | 1.8 |
| 10:20 | 24h | 4.76 | 4.02 | 5 | 17.71 | 18.31 | 18.39 | 2.34 | 2.5 | 2.9 |
| 10:20 | 48h | 0 | 0 | 0 | 7.9 | 8.77 | 8.01 | 2.96 | 2.96 | 3.4 |
| 20:0 | 5h | 19.41 | 19.87 | 19.78 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20:0 | 16h | 16.12 | 16.83 | 15.43 | 0 | 0 | 0 | 1.66 | 1.77 | 1.89 |
| 20:0 | 24h | 12.46 | 12.14 | 11.36 | 0 | 0 | 0 | 1.88 | 2.29 | 2.58 |
| 20:0 | 48h | 6.5 | 7.15 | 6.01 | 0 | 0 | 0 | 2.23 | 2.31 | 2.45 |
| 0:20 | 5h | 0 | 0 | 0 | 19.05 | 19.49 | 19.89 | 0 | 0 | 0 |
| 0:20 | 16h | 0 | 0 | 0 | 16.9 | 16.4 | 16.94 | 1.71 | 1.66 | 1.45 |
| 0:20 | 24h | 0 | 0 | 0 | 10.71 | 10.44 | 10.89 | 1.82 | 2.6 | 2.05 |
| 0:20 | 48h | 0 | 0 | 0 | 6.56 | 7.18 | 8.1 | 2.04 | 2.94 | 2.7 |
Software:
MATLAB
Process:
We thank Professor Li for providing the code.
Code:
clc
clear
close all
% Read the HPLC data from the workbook (file name kept from the original script).
[a,ax,ay] = xlsread('Exel 2');
x0 = 1:5;            % index of the five glucose:glycerol ratios
y0 = [5 16 24 48];   % sampling times (h)
% For each ratio and time point, average the four data columns of that ratio.
num = [];
for ii = 1:5
    for jj = 1:4
        num = [num ; x0(ii) y0(jj) mean(a(jj,4*(ii-1)+1:4*ii))];
    end
end
% Arrange the averaged data on a 4-by-5 (time-by-ratio) grid.
x = reshape(num(:,1),4,5);
y = reshape(num(:,2),4,5);
z = reshape(num(:,3),4,5);
% Interpolate onto a 100-by-100 grid using spline interpolation.
n = 99;
x1 = 1:4/n:5;
y1 = 5:43/n:48;
[X,Y] = meshgrid(x1,y1);
Z = interp2(x,y,z,X,Y,'spline');
% Plot the interpolated surface with contour lines underneath.
figure
surfc(X,Y,Z)
shading interp
colormap jet
% Locate the maximum and minimum of the surface
% (the first occurrence is used if an extreme value is not unique).
[maxx,maxy] = find(Z==max(max(Z)));
[minx,miny] = find(Z==min(min(Z)));
max_z = Z(maxx(1),maxy(1))
max_x = X(maxx(1),maxy(1))
max_y = Y(maxx(1),maxy(1))
min_z = Z(minx(1),miny(1))
min_x = X(minx(1),miny(1))
min_y = Y(minx(1),miny(1))
% Mark and label the maximum (blue) and minimum (green).
hold on
plot3(max_x,max_y,max_z,'bo','MarkerFaceColor','b')
plot3(min_x,min_y,min_z,'go','MarkerFaceColor','g')
text(max_x+0.2,max_y,max_z+0.2,num2str(max_z))
text(min_x+0.2,min_y,min_z+0.2,num2str(min_z),'color','g')
% Axis labels and ticks.
xlabel('Glucose:Glycerol')
ylabel('Time (h)')
zlabel('Erythritol (g / L)')
set(gca,'XTick',1:5)
set(gca,'XTickLabel',{'0:20','10:20','10:10','20:10','20:0'})
set(gca,'fontname','Times New Roman')
set(gca,'ZTick',0:0.5:3)
set(gca,'ZTickLabel',{'0.00','0.50','1.00','1.50','2.00','2.50','3.00'})
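For readers without the workbook, the interpolation step can be tried on the erythritol values of Table 1. This is a sketch under two assumptions: the ratios are ordered along the x-axis as in the tick labels of the script (0:20, 10:20, 10:10, 20:10, 20:0), and each point is the mean of the three replicates. The original script averages four workbook columns whose layout is not given here, so the numbers may differ from Figure 14.

```matlab
% Sketch: spline-interpolated erythritol surface from the Table 1 data.
t = [5 16 24 48];                              % sampling times (h)
% Rows: time points; columns: three replicates for each of the five ratios,
% in the assumed order 0:20, 10:20, 10:10, 20:10, 20:0.
ery = [0    0    0    0    0    0    0    0    0    0    0    0    0    0    0;
       1.71 1.66 1.45 1.63 1.62 1.80 1.73 1.61 1.65 1.70 1.60 1.80 1.66 1.77 1.89;
       1.82 2.60 2.05 2.34 2.50 2.90 1.85 2.50 2.60 2.23 2.44 2.74 1.88 2.29 2.58;
       2.04 2.94 2.70 2.96 2.96 3.40 2.21 2.72 2.91 2.69 2.78 2.98 2.23 2.31 2.45];
z = squeeze(mean(reshape(ery, 4, 3, 5), 2));   % 4-by-5 matrix of replicate means
[X, Y] = meshgrid(linspace(1, 5, 100), linspace(5, 48, 100));
Z = interp2(1:5, t, z, X, Y, 'spline');        % spline interpolation on the fine grid
[zMax, idx] = max(Z(:));                       % highest point of the surface
fprintf('Interpolated maximum %.4f g/L at x = %.4f, t = %.4f h\n', ...
        zMax, X(idx), Y(idx));
```

Because a spline can overshoot between sparse sampling points, the interpolated maximum may lie above every measured mean, so it is best confirmed experimentally.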
Result
In Figure 14, the Y-axis is labeled "Time (h)", with a range from 0 to 40 hours. The X-axis is labeled "Glucose:Glycerol" and shows the different ratios (0:20, 10:10, 20:0, etc.). The Z-axis is labeled "Erythritol (g/L)", with values ranging from 0 to 3.4036, and the erythritol yield is color-coded. Figure 14 shows that the erythritol yield is highest at Max(X, Y, Z) = (1.7676, 39.7474, 3.403643). At this maximum, the ratio of glucose concentration to glycerol concentration was 17:10. After the initial measurement was completed, the samples were subjected to a secondary fermentation process, and the highest values were then measured by high-performance liquid chromatography (HPLC).
Figure 14. 3D modeling of erythritol production. Maximum value: Max(X, Y, Z) = (1.7676, 39.7474, 3.403643)
Table 2 shows that the experimental group with the highest erythritol yield was Glucose:Glycerol = 17:10 g/L, followed by Glucose:Glycerol = 10:20 g/L. The group with the lowest erythritol yield was Glucose:Glycerol = 20:0 g/L. The highest erythritol conversion rate was 20.99%, in the Glucose:Glycerol = 0:20 g/L group, while the lowest was 11.85%, in the Glucose:Glycerol = 20:10 g/L group.
Table 2. Substrate consumption, erythritol concentration and conversion rate at different ratios of glucose and glycerol
| Glucose:Glycerol (g/L) | ΔGlucose (g/L) | ΔGlycerol (g/L) | Erythritol (g/L) | Percent conversion |
|---|---|---|---|---|
| 20:0 | 13.13 | 0.00 | 2.33 | 17.74% |
| 20:10 | 18.73 | 5.03 | 2.82 | 11.85% |
| 17:10 | 16.37 | 6.63 | 3.42 | 14.86% |
| 10:10 | 9.04 | 6.75 | 2.61 | 16.55% |
| 10:20 | 8.92 | 10.01 | 3.11 | 16.41% |
| 0:20 | 0.00 | 12.20 | 2.56 | 20.99% |
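The conversion rates in Table 2 can be recomputed from Table 1. The sketch below does this for the 10:10 group. It assumes that substrate consumption is the difference between the 5 h and 48 h means and that conversion is erythritol at 48 h divided by the total glucose and glycerol consumed. These definitions are not stated in the text; they are inferred because they reproduce the tabulated values.

```matlab
% Sketch: recompute the Table 2 row for Glucose:Glycerol = 10:10 from Table 1.
% Assumed definition: conversion = erythritol / (glucose + glycerol consumed).
glucose5h   = mean([8.96 9.19 9.20]);    % g/L at 5 h
glucose48h  = mean([0 0.11 0.12]);       % g/L at 48 h
glycerol5h  = mean([9.92 10.61 9.98]);
glycerol48h = mean([3.86 3.21 3.20]);
erythritol  = mean([2.21 2.72 2.91]);    % erythritol at 48 h (g/L)
dGlucose    = glucose5h - glucose48h;    % glucose consumed
dGlycerol   = glycerol5h - glycerol48h;  % glycerol consumed
conversion  = 100 * erythritol / (dGlucose + dGlycerol);
fprintf('dGlucose = %.2f, dGlycerol = %.2f, erythritol = %.2f, conversion = %.2f%%\n', ...
        dGlucose, dGlycerol, erythritol, conversion);
```

Under these assumptions the sketch gives 9.04, 6.75, 2.61 and 16.55%, matching the 10:10 row of Table 2.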
Analysis and Discussion
This model has some limitations. First, the 3D surface is built from too few data points to reflect the overall pattern. The results are therefore sensitive to deviations in individual measurements, which lowers prediction accuracy, makes true differences hard to detect statistically, and reduces credibility. Second, the conversion rate of erythritol is not included. Expressing the final predicted value as a conversion rate (%) would make the results easier to interpret.
Model Improvement and Discussion
To incorporate the conversion rate into the model, we used both erythritol yield and conversion rate as dependent variables. Figure 15 shows our optimized model, with the optimal point located at (43, 11.7071; 20, 3.1675) and a conversion rate of 0.77678. To ensure the model's reliability, it will need to be validated against wet-lab experimental data, and further experimental measurements will be required to refine it.
Figure 15. Modeling of erythritol production.
References
Bienert S, Waterhouse A, De Beer T A P, et al. The SWISS-MODEL Repository—new features and functionality[J]. Nucleic Acids Research, 2017, 45(D1): D313-D319.
Kiefer F, Arnold K, Kunzli M, et al. The SWISS-MODEL Repository and associated resources[J]. Nucleic Acids Research, 2009, 37(suppl_1): D387-D392.
Liu J, Zhang X, Huang K, et al. Grain Protein Function Prediction Based on CNN and Residual Attention Mechanism with AlphaFold2 Structure Data[J]. Applied Sciences, 2020, 15(4): 1890.
Waterhouse A, Bertoni M, Bienert S, et al. SWISS-MODEL: homology modelling of protein structures and complexes[J]. Nucleic Acids Research, 2018, 46(W1): W296-W303.
Xu J, Cui Q. Methods for Three-Dimensional Structure Analysis of Proteins[J]. Journal of Henan Normal University (Natural Science Edition), 2023, 51(4).
Yan L, Sun Z. Protein Molecular Structure[M]. Tsinghua University Press, 1999.