This project establishes a three-track parallel computational design strategy to systematically
explore the design space of antifreeze proteins (AFPs). Two tracks start from natural templates
(point mutation optimization and inverse folding), and the third is de novo design based on
physical rules. We aim not only to optimize the performance of existing AFPs, but also to create
new antifreeze proteins with novel structures and enhanced functions.
Design Architecture
Our design platform includes three complementary design dimensions,
each subject to rigorous validation
criteria.
Track 1: Point Mutation Optimization Based
on Natural Templates
We selected four natural antifreeze proteins with significant antifreeze activity (PDB codes
3WP9, 3ULT, 4NU2 and 6A8K) as design templates and used the ESM-1v model to choose mutation
sites. We identified residues that are highly conserved during evolution (which maintain the
stability of the structural core) and highly variable surface residues (potential targets for
functional optimization). On this basis, we prioritized directed mutations in the variable
regions to explore functional enhancement while maintaining the overall fold of the protein.
Track 2: Inverse Folding Optimization Based
on Natural Templates
Built on a multi-objective optimization framework, this design strategy combines deep learning
with molecular simulation to systematically improve antifreeze protein performance. The entire
process follows the Design-Build-Test-Learn (DBTL) engineering cycle to raise the success rate
of the final candidate molecules.
The first stage is the initial design of multiple templates
We continued to work with the four template proteins above. Using a multi-parameter scanning
strategy with ProteinMPNN, we generated a series of candidate sequences for each template,
systematically exploring the AFP sequence space.
The second stage is multi-dimensional performance evaluation
Candidate sequences were screened at high throughput with four calculations run in parallel:
AlphaFold2 structure verification, FoldX stability prediction, Protein-Sol solubility analysis,
and functional evaluation with an activity predictor.
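The four-way screen described above can be sketched as a simple filter over per-candidate scores. This is only an illustration: the `Candidate` fields, the threshold values, and the ranking by ΔΔG are assumptions made for the example, not the cutoffs used in this project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    mean_plddt: float       # AlphaFold2 mean pLDDT (0-100)
    ddg: float              # FoldX ddG relative to the template (negative = stabilizing)
    solubility: float       # scaled solubility score (0-1)
    afp_probability: float  # predicted antifreeze probability (0-1)


# Hypothetical thresholds, chosen only for illustration.
MIN_PLDDT = 80.0
MAX_DDG = 0.0
MIN_SOLUBILITY = 0.45
MIN_AFP_PROBABILITY = 0.8


def passes(c: Candidate) -> bool:
    """A candidate is kept only if it clears all four criteria."""
    return (
        c.mean_plddt >= MIN_PLDDT
        and c.ddg < MAX_DDG
        and c.solubility >= MIN_SOLUBILITY
        and c.afp_probability >= MIN_AFP_PROBABILITY
    )


def screen(candidates: List[Candidate]) -> List[Candidate]:
    """Return the passing candidates, most stabilizing (lowest ddG) first."""
    return sorted((c for c in candidates if passes(c)), key=lambda c: c.ddg)


# Toy input with made-up scores.
demo = [
    Candidate("a", 92.0, -1.2, 0.60, 0.90),
    Candidate("b", 95.0, 0.5, 0.70, 0.95),   # fails: destabilizing
    Candidate("c", 85.0, -2.0, 0.50, 0.85),
    Candidate("d", 70.0, -3.0, 0.80, 0.90),  # fails: low pLDDT
]
selected = screen(demo)
```

Requiring every criterion at once keeps the filter strict; a weighted score would be an alternative if too few candidates survive.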
The third stage is dynamic performance verification
Molecular dynamics simulation is used to evaluate the dynamic behavior of the top candidate
molecules and to inform the final selection. See our Molecular Dynamics Simulation page for
details.
The fourth stage is the iterative optimization cycle
Based on the simulation results, we establish an adaptive optimization loop. Candidate
molecules that perform as well as or better than natural antifreeze proteins are exported
directly for wet-lab verification. Variants that do not meet the standard enter the next round
of design as parent sequences, and performance is improved in a directed-evolution manner
through a gradual cooling strategy.
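One possible reading of this loop is sketched below. It assumes that "gradual cooling" means lowering a sampling temperature from round to round, which the text does not state explicitly. The `design` and `evaluate` callables are hypothetical stand-ins for the sequence generator and the scoring pipeline, and the geometric schedule is just one common choice.

```python
from typing import Callable, List


def cooling_schedule(t_start: float, t_end: float, rounds: int) -> List[float]:
    """Geometric decay from t_start down to t_end over `rounds` rounds."""
    if rounds < 2:
        return [t_start]
    ratio = (t_end / t_start) ** (1.0 / (rounds - 1))
    return [t_start * ratio ** i for i in range(rounds)]


def optimize(
    parents: List[str],
    design: Callable[[str, float], List[str]],  # hypothetical: parent, temperature -> children
    evaluate: Callable[[str], float],           # hypothetical: sequence -> score
    baseline: float,                            # score of the natural reference protein
    temperatures: List[float],
) -> List[str]:
    """Export candidates that reach the baseline; recycle the rest as parents."""
    exported: List[str] = []
    for temperature in temperatures:
        children = [child for p in parents for child in design(p, temperature)]
        scored = [(evaluate(child), child) for child in children]
        exported.extend(child for score, child in scored if score >= baseline)
        failed = sorted((pair for pair in scored if pair[0] < baseline), reverse=True)
        if not failed:
            break
        # The best failing variants become the parents of the next, cooler round.
        parents = [child for _, child in failed[: len(parents)]]
    return exported


# Toy run: "design" appends one residue, and the score is the sequence length.
temps = cooling_schedule(0.5, 0.1, 5)
result = optimize(["AA"], lambda p, t: [p + "A"], len, 5, temps)
```

Lower temperatures make sampling more conservative, so later rounds refine the surviving parents rather than exploring widely.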
Figure 1. Inverse folding optimization
process
Track 3: De Novo Design Based on Inverse
Folding Optimization
We have developed three innovative de novo design strategies to tailor
the structure to the functional needs
of antifreeze proteins.
Strategy 1 Helical periodic functional array
Type I antifreeze proteins consist mainly of α-helix and are rich in alanine [1], and alanine
is the strongest helix former [2]. We therefore build a standard α-helix from a poly-alanine
sequence and place threonine at 11-residue intervals [3][4]. This interval corresponds closely
to three turns of the α-helix (3.6 residues per turn), so all antifreeze sites point to the
same side of the helix and form a continuous ice-binding interface. We checked this
orientation visually in PyMOL.
Figure 2. Using PyMOL to ensure that the
antifreeze sites are oriented on the same
side of the helix
(the figure shows Cα)
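The same-side argument for Strategy 1 can be checked with a little helical-wheel arithmetic. The sketch below assumes an ideal α-helix with 100° of rotation per residue (3.6 residues per turn). The helix length of 37 and the position of the first threonine are illustrative choices, not the actual Denovo1 or Denovo2 sequences.

```python
DEG_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues per turn


def helical_angle(offset: int) -> float:
    """Angular position around the helix axis, in degrees, for a residue offset."""
    return (offset * DEG_PER_RESIDUE) % 360.0


def build_helix_sequence(length: int, spacing: int, first: int = 1,
                         scaffold: str = "A", site: str = "T") -> str:
    """Poly-alanine scaffold with a functional residue every `spacing` positions."""
    residues = [scaffold] * length
    for i in range(first, length, spacing):
        residues[i] = site
    return "".join(residues)


# Illustrative example: 37 residues, threonine every 11 residues.
sequence = build_helix_sequence(length=37, spacing=11, first=1)
sites = [i for i, aa in enumerate(sequence) if aa == "T"]
angles = [helical_angle(i - sites[0]) for i in sites]
```

Under these assumptions each successive threonine is rotated by only 20° relative to the previous one (11 residues is about 3.06 turns), so the sites stay on one face of the helix.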
Strategy 2 Ice lattice matching functional array
The literature shows that the strongest ice binding is achieved when the spacing of the
antifreeze sites is 1.5 times the ice lattice spacing (4.5 Å), i.e., 6.75 Å [5]. We therefore
again use the alanine helix backbone as the basis and place threonine at 3-residue intervals.
The design brings the spacing of the antifreeze sites to approximately 6.75 Å to maximize
complementarity with the ice crystal interface and optimize the adsorption energy.
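As a rough geometric reference for this spacing, the Cα-Cα distance between two residues of an ideal α-helix can be computed from textbook parameters. The values below (1.5 Å rise per residue, 2.3 Å Cα radius, 100° twist) are idealized assumptions. The sketch considers Cα atoms only, so it does not capture side-chain positions or any manual adjustment made in PyMOL.

```python
import math

RISE = 1.5     # Å along the helix axis per residue (ideal alpha-helix)
RADIUS = 2.3   # Å, distance of Cα atoms from the helix axis (ideal alpha-helix)
TWIST = 100.0  # degrees of rotation per residue (3.6 residues per turn)

ICE_LATTICE = 4.5           # Å, lattice spacing quoted in the text
TARGET = 1.5 * ICE_LATTICE  # Å, target site spacing quoted in the text


def ca_distance(n: int) -> float:
    """Cα-Cα distance between residues i and i+n on an ideal alpha-helix."""
    theta = math.radians(n * TWIST)
    chord = 2.0 * RADIUS * abs(math.sin(theta / 2.0))  # separation across the axis
    return math.hypot(chord, n * RISE)                 # combined with the axial rise


for n in (3, 4, 5):
    d = ca_distance(n)
    print(f"i -> i+{n}: {d:.2f} Å (difference from target: {d - TARGET:+.2f} Å)")
```

Comparing these idealized distances with the 6.75 Å target shows how far a given residue interval is from the lattice match before any adjustment of the spacing.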
Figure 3.1 Regulation of antifreeze site
spacing using PyMOL (the figure shows Cα)
Figure 3.2 Adjusted final spacing (the
figure shows Cα)
Figure 4. Simplified workflow of
de novo design strategies 1 and 2
Strategy 3 AI-generated new backbone
RFdiffusion2 was used to generate new backbones from scratch in order to explore the design
space beyond natural proteins [6]. Denovo5 and Denovo6 were generated with this method.
Unified optimization process.
All backbone structures generated by the de novo design strategies then entered the unified
inverse folding optimization pipeline. Sequence design was performed with ProteinMPNN: key
functional residues were fixed, while the remaining sites were optimized for stability and
solubility.
Results and Analysis
Track 1: For each of the four template proteins, we selected sites with an ESM-1v score above
a threshold of 1 and obtained simulated mutants carrying no more than three mutated sites.
These mutants were successfully constructed in E. coli and yeast chassis strains.
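The selection rule for Track 1 can be sketched as follows. The score table is entirely made up, and keeping only the best-scoring substitution per site is an assumption of this example. Only the threshold of 1 and the limit of three sites per mutant come from the text.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical ESM-1v scores keyed by (position, wild-type residue, mutant residue).
demo_scores: Dict[Tuple[int, str, str], float] = {
    (12, "S", "A"): 1.8,
    (12, "S", "T"): 1.2,
    (45, "K", "E"): 0.4,
    (60, "G", "A"): 2.1,
    (77, "N", "D"): 1.05,
    (90, "Q", "L"): 1.5,
}


def select_mutations(scores: Dict[Tuple[int, str, str], float],
                     threshold: float = 1.0) -> List[str]:
    """Keep the best substitution at each site whose score exceeds the threshold."""
    best: Dict[int, Tuple[str, float]] = {}
    for (pos, wt, mut), score in scores.items():
        if score > threshold and (pos not in best or score > best[pos][1]):
            best[pos] = (f"{wt}{pos}{mut}", score)
    return [best[pos][0] for pos in sorted(best)]


def enumerate_mutants(mutations: List[str], max_sites: int = 3) -> List[Tuple[str, ...]]:
    """All combinations of the selected mutations with at most `max_sites` sites."""
    return [combo
            for k in range(1, max_sites + 1)
            for combo in combinations(mutations, k)]


mutations = select_mutations(demo_scores)
mutants = enumerate_mutants(mutations)
```

With four selected sites this yields 4 single, 6 double and 4 triple mutants. In practice the list would be narrowed further before construction.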
Figure 5. Simulation results of ESM-1v
mutation
Track 2 & 3: Through the inverse folding optimization process, we generated at least 300
candidate sequences for each template and selected 10 high-quality antifreeze protein design
variants that passed validation in multiple dimensions; these were then tested in wet-lab
experiments. Among them, 3WP9-IF, 3ULT-IF, 4NU2-IF and 6A8K-IF were obtained with the inverse
folding optimization pipeline, Denovo1 and Denovo2 with the helical periodic functional array,
Denovo3 and Denovo4 with the ice lattice matching functional array, and Denovo5 and Denovo6
were generated by AI. The sequences of the 10 output proteins and some candidate sequences
will be provided in the supplementary file.
Blastp sequence alignment
We first aligned the protein sequences obtained by ProteinMPNN inverse folding against their
original templates using Blastp.
(Panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF)
Figure 6. Sequence comparison
The sequence identity between each of the four variants and its original protein was below
55%, and the sequence similarity was below 70%. The variants obtained by inverse folding
therefore differ substantially from the original sequences and show strong novelty.
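For reference, percent identity in a pairwise alignment is the number of identical columns divided by the alignment length, which is the convention BLAST uses. A minimal sketch follows, with toy aligned sequences rather than the project's real variants. Similarity additionally needs a substitution matrix and is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns / alignment length, in percent ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


# Toy alignment, for illustration only.
template = "MKT-AYIAKQ"
variant = "MRTLAYLA-Q"
identity = percent_identity(template, variant)
```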
Structural reliability analysis of AlphaFold2
We evaluated the folding reliability of the 10 design variants using two AlphaFold2
confidence measures: the per-residue pLDDT score and the predicted aligned error (PAE).
(Panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF, Denovo1, Denovo2, Denovo3, Denovo4, Denovo5, Denovo6)
Figure 7.1 pLDDT
For both the inverse folding variants and the de novo design variants, most regions have
confidence scores above 90. The local low-confidence areas are mostly flexible regions, and
each variant shows high confidence overall. This suggests that the predicted structures are
likely to be close to the real folds, which lays a good foundation for stability testing and
molecular dynamics simulation.
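The "mostly above 90" statement can be quantified per variant. AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output, so a small parser is enough. In the sketch below the PDB text is synthetic and the scores are made up. The 90 cutoff follows the text.

```python
from typing import List, Tuple


def ca_plddt(pdb_text: str) -> List[float]:
    """Read pLDDT from the B-factor field (columns 61-66) of each Cα ATOM record."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def summarize(values: List[float], cutoff: float = 90.0) -> Tuple[float, float]:
    """Return (mean pLDDT, fraction of residues above the cutoff)."""
    mean = sum(values) / len(values)
    fraction = sum(v > cutoff for v in values) / len(values)
    return mean, fraction


# Synthetic four-residue example in fixed-column PDB format (scores are made up).
demo_scores = [96.5, 94.2, 91.0, 72.3]
demo_pdb = "\n".join(
    f"ATOM  {i:5d}  CA  ALA A{i:4d}    {0.0:8.3f}{0.0:8.3f}{1.5 * i:8.3f}{1.0:6.2f}{s:6.2f}"
    for i, s in enumerate(demo_scores, start=1)
)
mean_plddt, fraction_high = summarize(ca_plddt(demo_pdb))
```

Reporting the fraction of residues above the cutoff alongside the mean makes it easier to spot variants whose average is pulled down by a single flexible segment.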
(Panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF, Denovo1, Denovo2, Denovo3, Denovo4, Denovo5, Denovo6)
Figure 7.2 PAE
We analyzed the predicted aligned error (PAE) plots provided by AlphaFold Server. For all
variants other than 3ULT-IF and Denovo5, the PAE plots show dark green squares with clear
outlines along the diagonal, indicating that each variant has one or more stable domains
within which the relative positions of residues are well defined. The PAE plots of 3ULT-IF
and Denovo5 show some light green patches, suggesting that parts of their sequences may be
disordered and may not fold stably. Their antifreeze activity can still be tested in wet-lab
experiments.
Comprehensive Performance Analysis
We quantitatively evaluated and compared key performance indicators
for all variants.
Table 1 Key computational evaluation
metrics of the output proteins
Stability improvement: the ΔΔG of all variants is less than 0, and the inverse folding
strategy shows a clear advantage in stability optimization.
Significant functional activity: all variants showed a high predicted antifreeze probability,
and the predicted antifreeze activity of the rational design variants was greater than 0.8,
which supports the effectiveness of design based on physical rules.
Good solubility: all variants have high predicted solubility, which provides a good basis for
antifreeze function.
GROMACS molecular dynamics simulation.
The simulation results are not shown here; they are explained in detail on a separate page.
Comparison of Design Strategy
Effectiveness
The point mutation optimization and inverse folding optimization strategies show the most
reliable stability improvement. This result is in line with expectations, because both
strategies optimize within the framework of native proteins and preserve as much as possible
of the evolutionarily tested stable structural core. The four inverse folding variants also
showed high solubility and a high predicted antifreeze probability. Crude protein extracts of
mutants from both strategies were verified in wet-lab experiments, showing that these
strategies optimize robustly while maintaining function.
The rational design strategy performed well in terms of predicted antifreeze activity. In
particular, the variant Denovo3, based on the ice lattice matching principle, obtained a
predicted antifreeze probability of 0.960. This result supports the effectiveness of
arranging functional sites according to physical rules, and its strong performance on the
core functional metric makes it the first choice for function-oriented applications.
The AI-generated design strategy stands out for structural novelty. The resulting variants
have non-natural folds and provide new structural templates for the antifreeze protein
family, which has important basic research value. The Denovo5 and Denovo6 variants score well
on the computed metrics and are worth further in-depth study.
Conclusion
The three-track design strategy proved highly successful. Point mutation and inverse folding
optimization reliably yield stable variants, rational de novo design shows great functional
potential, and AI generation offers structural innovation. All output molecules have been
computationally verified, providing a solid foundation for wet-lab experiments.