Overview

The harsh conditions of extreme cold impede humanity's pursuit of beauty. Antifreeze proteins (AFPs) lower the freezing point and exhibit thermal hysteresis activity, offering novel insights into frost resistance in extreme environments. However, the types of naturally occurring antifreeze proteins are limited, and the precise mechanisms of frost resistance among different proteins remain unclear. We plan to design novel synthetic antifreeze proteins with enhanced frost resistance activity.

We consider this project a close fit for the Design-Build-Test-Learn (DBTL) cycle. We begin with rational protein design using computational models. After scoring antifreeze protein candidates with predictive models, we construct them in diverse expression systems through synthetic biology and wet-lab experiments. Validation uses molecular dynamics simulations and ice crystal formation assays, aiming to identify key active sites in phenotypically superior proteins. This approach holds promise for future iGEM projects in antifreeze protein design.

The scope of our in silico work spans de novo protein design to molecular dynamics exploration, incorporating machine learning, neural networks, and large language model technologies. This systematic approach delivers multidimensional solutions to the challenge of antifreeze protein design.

Figure 1. Tianjin 2025 Model Framework Diagram
Society

This project establishes a three-track parallel computational design strategy, combining point mutation optimization and inverse folding based on natural templates, as well as de novo design based on physical rules, to systematically explore the design space of antifreeze proteins. We aim not only to optimize the performance of existing AFPs, but also to create new antifreeze proteins with novel structures and enhanced functions.

Design Architecture

Our design platform includes three complementary design dimensions, each subject to rigorous validation criteria.

Track 1: Point Mutation Optimization Based on Natural Templates

We selected four natural antifreeze proteins with significant antifreeze activity (PDB IDs 3WP9, 3ULT, 4NU2 and 6A8K) as design templates and chose mutation sites using the ESM-1v model. We identified key residues that are highly conserved during evolution (and maintain the stability of the structural core) and highly variable surface residues (potential targets for functional optimization). On this basis, we prioritized directed mutations in the variable regions to explore functional enhancement while maintaining the overall fold of the protein.

Track 2: Inverse Folding Optimization Based on Natural Templates

Based on the multi-objective optimization framework, the design strategy organically combines deep learning and molecular simulation technology to achieve systematic improvement of antifreeze protein performance. The entire process follows DBTL's engineering principles to ensure a high success rate for the final candidate molecule.

The first stage is the initial design of multiple templates
We continued to modify the above four proteins, and through the multi-parameter scanning strategy of ProteinMPNN, we generated a series of candidate sequences for each template, realizing the systematic exploration of the AFP sequence space.

The second stage is multi-dimensional performance evaluation
Candidate sequences were screened in high throughput through four parallel calculations: AlphaFold2 structure verification, FoldX stability prediction, Protein-Sol solubility analysis, and functional evaluation with our AFP predictor.
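The four parallel evaluations can be pictured as a simple filter over per-candidate metrics. The sketch below is only an illustration: the `Candidate` fields, the cutoff values and the example numbers are all assumptions, not the thresholds or data used in our pipeline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mean_plddt: float       # AlphaFold2 mean pLDDT (0-100)
    ddg: float              # FoldX ddG relative to the template (negative = more stable)
    solubility: float       # Protein-Sol scaled solubility (0-1)
    afp_probability: float  # AFP predictor output (0-1)


def passes_screen(c, min_plddt=80.0, max_ddg=0.0, min_solubility=0.45, min_afp_prob=0.5):
    """Return True only if the candidate clears all four (assumed) cutoffs."""
    return (
        c.mean_plddt >= min_plddt
        and c.ddg < max_ddg
        and c.solubility >= min_solubility
        and c.afp_probability >= min_afp_prob
    )


def screen(candidates):
    """Keep candidates that pass every filter, ranked by predicted AFP probability."""
    kept = [c for c in candidates if passes_screen(c)]
    return sorted(kept, key=lambda c: c.afp_probability, reverse=True)


# Made-up example values, for illustration only.
candidates = [
    Candidate("cand_A", 92.1, -1.8, 0.62, 0.91),
    Candidate("cand_B", 88.4, 0.7, 0.70, 0.95),   # rejected: ddG is not below 0
    Candidate("cand_C", 71.0, -2.3, 0.55, 0.88),  # rejected: pLDDT below cutoff
    Candidate("cand_D", 90.3, -0.4, 0.51, 0.97),
]
shortlist = screen(candidates)
print([c.name for c in shortlist])
```

Requiring every criterion to pass, rather than using a weighted sum, keeps a single weak property (for example poor stability) from being masked by a strong one.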

The third stage is dynamic performance verification
Molecular dynamics simulation is used to evaluate the dynamics of the best candidate molecules and to inform the final selection. See our Molecular Dynamics Simulation page for details.

The fourth stage is the iterative optimization cycle
Based on the simulation results, we establish an adaptive optimization loop: candidate molecules that perform as well as or better than natural antifreeze proteins are exported directly for wet-lab verification. Variants that do not meet the standard enter the next round of design as parent sequences, and directed evolution of performance is achieved through a gradual cooling strategy.

Figure 1. Inverse folding optimization process
Track 3: De Novo Design Based on Inverse Folding Optimization

We have developed three innovative de novo design strategies to tailor the structure to the functional needs of antifreeze proteins.

Strategy 1: Helical periodic functional array
Type I antifreeze proteins consist mainly of α-helix and are rich in alanine [1], and alanine is the strongest helix former [2]. We therefore built a standard α-helix from an alanine sequence and implanted threonine at 11-residue intervals [3][4]. This interval corresponds closely to three turns of the α-helix (3.6 residues per turn), so that all antifreeze sites are oriented on the same side of the helix and form a continuous ice-binding interface, which we visualized using PyMOL.
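The periodicity argument can be checked with a few lines of arithmetic. The sketch below assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue); the helper names, the 37-residue length and the choice of index 0 for the first threonine are hypothetical and serve only as illustration.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues per turn


def helix_angle(index):
    """Angular position (degrees) of a residue around the helix axis."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def build_sequence(length, thr_positions):
    """Poly-alanine sequence with threonine at the given zero-based positions."""
    return "".join("T" if i in thr_positions else "A" for i in range(length))


# Hypothetical layout: first Thr at index 0, then every 11 residues.
thr_positions = [0, 11, 22, 33]
sequence = build_sequence(37, set(thr_positions))
angles = [helix_angle(i) for i in thr_positions]
turns_per_repeat = 11 / 3.6

print(sequence)
print(angles)
print(round(turns_per_repeat, 2))
```

Under these assumptions each 11-residue repeat advances the threonine by only about 20° around the axis, so consecutive sites stay on the same face of the helix with a slight drift.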

Figure 2. Using PyMOL to ensure that the antifreeze sites are oriented on the same side of the helix (the figure shows Cα)

Strategy 2: Ice lattice matching functional array
The literature shows that the strongest ice binding is achieved when the spacing of the antifreeze sites is 1.5 times the ice lattice spacing (4.5 Å), i.e. 6.75 Å [5]. We therefore again used the alanine helix backbone as the basis and implanted threonine at 3-residue intervals. The design brings the spacing of the antifreeze sites to approximately 6.75 Å to maximize complementarity with the ice crystal interface and optimize the adsorption energy.
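As a rough orientation aid, the target spacing and the Cα–Cα distances in an ideal helix can be computed directly. The geometric parameters below (1.5 Å rise per residue, 2.3 Å Cα radius, 100° twist) are textbook approximations that we assume here; the real spacing depends on which atoms are compared and on the final backbone, which is why the spacing is measured and adjusted in PyMOL rather than taken from this estimate.

```python
import math

RISE = 1.5     # Å per residue along the helix axis (ideal alpha-helix, assumed)
RADIUS = 2.3   # Å, approximate radius of the C-alpha trace (assumed)
TWIST = 100.0  # degrees of rotation per residue (assumed)


def ca_coords(i):
    """Coordinates of the i-th C-alpha atom on an ideal helix."""
    t = math.radians(i * TWIST)
    return (RADIUS * math.cos(t), RADIUS * math.sin(t), i * RISE)


def ca_distance(k):
    """Distance between C-alpha atoms of residues i and i + k."""
    return math.dist(ca_coords(0), ca_coords(k))


target = 1.5 * 4.5  # 1.5 times the 4.5 Å ice lattice spacing
print(f"target spacing: {target:.2f} Å")
for k in (3, 4, 7):
    print(f"i -> i+{k}: {ca_distance(k):.2f} Å")
```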

Figure 3.1 Regulation of antifreeze site spacing using PyMOL (the figure shows Cα)
Figure 3.2 Adjusted final spacing (the figure shows Cα)
Figure 4. Simplified process of designing strategies 1 and 2 from scratch

Strategy 3: AI-generated backbone
RFdiffusion2 was used to generate new backbones from scratch to explore the design space beyond natural proteins [6]; Denovo5 and Denovo6 were generated using this method.

Unified optimization process

All backbone structures generated by the de novo design strategies then entered the unified inverse folding optimization pipeline. Sequence design was performed with ProteinMPNN; key functional residues were fixed during optimization while the remaining sites were optimized for stability and solubility.

Results and Analysis

Track 1: We selected sites with an ESM-1v score above 1, obtained mutants carrying no more than three mutated sites for each of the four template proteins, and successfully constructed them in different chassis strains (E. coli and yeast).
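The selection rule for Track 1 can be sketched as follows. This is a minimal illustration under assumptions: the mutation labels and scores are made up, the helper names are hypothetical, and we assume one mutation (the best scoring) is kept per site before combining up to three sites.

```python
from itertools import combinations


def site_of(mutation):
    """Extract the residue number from a label such as 'A12T'."""
    return int(mutation[1:-1])


def select_mutations(scores, threshold=1.0):
    """Keep the best-scoring mutation at each site whose score exceeds the threshold."""
    best = {}
    for mutation, score in scores.items():
        if score <= threshold:
            continue
        site = site_of(mutation)
        if site not in best or score > best[site][1]:
            best[site] = (mutation, score)
    return [best[site][0] for site in sorted(best)]


def enumerate_mutants(mutations, max_sites=3):
    """All combinations of the selected mutations with at most max_sites sites."""
    mutants = []
    for n in range(1, max_sites + 1):
        mutants.extend(combinations(mutations, n))
    return mutants


# Made-up scores, for illustration only.
scores = {"A12T": 1.4, "A12S": 1.1, "K30E": 0.6, "G45D": 2.0, "S58N": 1.2, "L70P": -3.1}
selected = select_mutations(scores)
mutants = enumerate_mutants(selected)
print(selected)
print(len(mutants))
```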

Figure 5. Simulation results of ESM-1v mutation

Track 2 & 3: Through the inverse folding optimization process, we generated at least 300 candidate sequences for each template and selected 10 high-quality antifreeze protein design variants that passed validation in multiple dimensions; these were then verified by wet-lab experiments. Among them, 3WP9-IF, 3ULT-IF, 4NU2-IF and 6A8K-IF were obtained by the inverse folding optimization pipeline, Denovo1 and Denovo2 by helical periodic functional arrays, Denovo3 and Denovo4 by ice lattice matching functional arrays, and Denovo5 and Denovo6 by AI generation (the sequences of the 10 output proteins and some candidate sequences will be placed in the supplementary file).

Blastp sequence alignment
We first aligned the protein sequences obtained by ProteinMPNN inverse folding against the original sequences using Blastp.

Figure 6. Sequence comparison (panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF)
The sequence identity between each of the four variants and its original protein was less than 55%, and the sequence similarity was less than 70%, indicating that the variants obtained by inverse folding diverge substantially from the original sequences and are strongly novel.

Structural Reliability Analysis (AlphaFold2)

We evaluated the folding reliability of the 10 design variants using the AlphaFold2 confidence metrics pLDDT and PAE.

Figure 7.1 pLDDT (panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF, Denovo1, Denovo2, Denovo3, Denovo4, Denovo5, Denovo6)

Most regions of the inverse folding variants and the de novo design variants have confidence scores (pLDDT) above 90, the local low-confidence areas are mostly flexible regions, and each variant shows high confidence overall. This suggests that the predicted structures are reliable, which lays a good foundation for stability testing and molecular dynamics simulation.

Figure 7.2 PAE (panels: 3WP9-IF, 3ULT-IF, 4NU2-IF, 6A8K-IF, Denovo1, Denovo2, Denovo3, Denovo4, Denovo5, Denovo6)

We analyzed the predicted aligned error (PAE) plots provided by AlphaFold Server. The PAE plots of all variants other than 3ULT-IF and Denovo5 show dark green squares with clear outlines distributed roughly along the diagonal, indicating that each variant has one or more stable domains in which the relative positions of residues are well defined. The PAE plots of 3ULT-IF and Denovo5 show some light green patches, indicating that their sequences contain some disordered regions that may not fold stably; their frost resistance can nevertheless still be tested in wet-lab experiments.

Comprehensive Performance Analysis

We quantitatively evaluated and compared key performance indicators for all variants.

Table 1. Key computational evaluation indicators of the output proteins

Stability improvement: ΔΔG of all variants is less than 0, and the inverse folding strategy has obvious advantages in stability optimization.

Significant functional activity: all variants showed high frost resistance probability, and the predicted frost resistance activity of rational design variants was greater than 0.8, which verified the effectiveness of physical rule design.

Good solubility: All variants have high solubility, which provides a good foundation for frost resistance.

Gromacs Molecular Dynamics Simulation

This part is not presented here; it is explained in detail in a separate module.

Comparison of Design Strategy Effectiveness

Both the point mutation optimization and the inverse folding optimization strategies show the most reliable stability improvement. This result is in line with expectations, as both strategies build on the framework of native proteins and so preserve as much as possible of the evolutionarily tested stable structural core. The four inverse folding variants also showed high solubility and frost resistance probability. Crude protein extracts of the mutants from both strategies were verified by wet-lab experiments, demonstrating the robust optimization ability of these strategies while maintaining function.

The rational design strategy performed well in terms of predicted frost resistance; in particular, the variant Denovo3, based on the ice lattice matching principle, obtained a predicted frost resistance probability of 0.960. This result supports the effectiveness of arranging functional sites according to physical rules, and its excellent performance in core functional indicators makes it the first choice for function-oriented applications.

The AI-generated design strategy stands out for structural novelty: the resulting variants have non-natural folds, providing new structural templates for the antifreeze protein family, which has important basic research value. The Denovo5 and Denovo6 variants have good indicators and merit further in-depth study.

Conclusion

The three-track design strategy proved highly successful. Point mutation and inverse folding optimization are reliable for stable variants, rational de novo design shows great functional potential, and AI generation offers structural innovation. All output molecules have been computationally verified, providing a solid foundation for wet experiments.

Education
Description

This antifreeze protein (AFP) prediction model presents an end-to-end computational framework for predicting protein function directly from sequence data. It employs a two-stage architecture combining a pre-trained protein language model for feature extraction with a bespoke deep learning classifier for accurate functional annotation.

We constructed a dataset comprising 920 AFPs and 9493 non-AFPs. A comprehensive search was conducted in the UniProtKB database until January 24, 2025 using specific keywords, resulting in the collection of 6589 AFPs. Next, the maximal pairwise sequence identity of the proteins in the manually inspected dataset was culled to ≤40% using CD-HIT, yielding a set of 920 unique AFPs.

Figure 1. AFP dataset curation pipeline overview

The negative dataset consists of 9493 seed proteins from Pfam protein families that are not associated with antifreeze proteins; this set is widely used for evaluating the performance of AFP prediction methods.

Balanced dataset: The dataset was divided into training and test sets. 644 AFPs and 644 non-AFPs were randomly selected as positive and negative samples to form the training dataset, and the remaining 276 AFPs and 8849 non-AFPs were designated as the test dataset.

Imbalanced dataset: To further validate the predictive performance and facilitate subsequent research, we introduced an imbalanced dataset, in which 70% of the AFPs and 70% of the non-AFPs were used for training and the remaining 30% of each for independent testing.
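The two splitting schemes can be sketched as below. The sequence identifiers are placeholders, the function names and the fixed seed are assumptions, and the counts for the imbalanced split are simply what a rounded 70/30 split of 920 and 9493 entries gives, not figures reported by our pipeline.

```python
import random


def split_balanced(positives, negatives, n_train, seed=0):
    """Equal numbers of positives and negatives for training; everything else for testing."""
    rng = random.Random(seed)
    pos, neg = positives[:], negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    train = [(s, 1) for s in pos[:n_train]] + [(s, 0) for s in neg[:n_train]]
    test = [(s, 1) for s in pos[n_train:]] + [(s, 0) for s in neg[n_train:]]
    return train, test


def split_stratified(positives, negatives, train_fraction=0.7, seed=0):
    """Split each class separately so both sets keep the original class ratio."""
    rng = random.Random(seed)
    pos, neg = positives[:], negatives[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_pos = round(len(pos) * train_fraction)
    n_neg = round(len(neg) * train_fraction)
    train = [(s, 1) for s in pos[:n_pos]] + [(s, 0) for s in neg[:n_neg]]
    test = [(s, 1) for s in pos[n_pos:]] + [(s, 0) for s in neg[n_neg:]]
    return train, test


# Placeholder identifiers standing in for the 920 AFPs and 9493 non-AFPs.
afps = [f"AFP_{i}" for i in range(920)]
non_afps = [f"NEG_{i}" for i in range(9493)]

train_b, test_b = split_balanced(afps, non_afps, n_train=644)
train_i, test_i = split_stratified(afps, non_afps)
print(len(train_b), len(test_b))
print(len(train_i), len(test_i))
```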

The model is built upon the ProtT5-XL-UniRef50 encoder, a transformer-based model pre-trained on the UniRef50 database, to generate high-dimensional feature representations (1024 dimensions per amino acid) from input protein sequences. These rich embeddings capture complex semantic and syntactic patterns within the protein sequences.
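Extracting these embeddings with the Hugging Face `transformers` library might look like the sketch below. It assumes the public `Rostlab/prot_t5_xl_uniref50` checkpoint and the usual ProtT5 preprocessing (rare residues mapped to X, residues separated by spaces); the function names are ours, and `embed` needs `torch`, `transformers` and `sentencepiece` installed and downloads a large model when first called.

```python
import re

MODEL_NAME = "Rostlab/prot_t5_xl_uniref50"  # assumed public checkpoint


def prepare_sequence(seq):
    """Map rare residues (U, Z, O, B) to X and separate residues with spaces."""
    return " ".join(re.sub(r"[UZOB]", "X", seq.upper()))


def embed(sequences):
    """Return one tensor of shape (sequence length, 1024) per input sequence."""
    # Imported lazily so the preprocessing helper works without these packages.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(MODEL_NAME).eval()
    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        add_special_tokens=True,
        padding="longest",
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        ).last_hidden_state
    # Keep only the residue positions, dropping the end token and any padding.
    return [hidden[i, : len(s)] for i, s in enumerate(sequences)]


print(prepare_sequence("MKUAZ"))
```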

Figure 2. The Transformer-model architecture
Figure 3. Feature extraction overview of ProtT5

The subsequent classifier, named igemTJModel, uses a hybrid neural network architecture to process these features. It integrates 1D convolutional layers (1D-CNN) to identify local amino acid motifs, bidirectional LSTM (Bi-LSTM) layers to capture long-range contextual dependencies across the sequence, and an attention mechanism that dynamically weighs the importance of specific residues and thereby aids model interpretability. Regularization strategies such as Layer Normalization, Dropout, and residual connections are incorporated to ensure training stability and prevent overfitting.
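To illustrate only the attention step, the sketch below pools per-residue feature vectors into a single vector using softmax weights. It is a simplified stand-in written in plain Python, not the igemTJModel code: the dot-product scoring, the `query` vector (which would be learned in a real model) and the toy features are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each residue vector by softmax(feature . query) and sum them."""
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Toy example: three residues with two-dimensional features.
features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
pooled, weights = attention_pool(features, query)
print(weights)
```

The returned weights are what makes this step inspectable: residues with larger weights contribute more to the pooled representation.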

Figure 4. The igemTJmodel architecture
Evaluation Metrics and Results
  1. Accuracy (ACC) — Measures the proportion of correct predictions among all predictions.
    $$ \text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} $$
  2. Matthews Correlation Coefficient (MCC) — A balanced measure that considers all four confusion matrix categories, reliable even when classes are of very different sizes.
    $$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
  3. Precision — Measures the proportion of correctly predicted positive observations among all predicted positives.
    $$\text{Precision} = \frac{TP}{TP + FP}$$
  4. Area Under the ROC Curve (AUC) — The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR). The AUC measures the model's ranking ability.
    $$\text{TPR} = \frac{TP}{TP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN}$$
    $$ \displaystyle AUC = \int_{0}^{1} TPR(FPR) \, dFPR $$
  5. Area Under the Precision-Recall Curve (AUPR / AP) — Especially useful for imbalanced datasets.
    $$Recall = \frac{TP}{TP + FN}, \quad Precision = \frac{TP}{TP + FP}$$
    $$ \displaystyle AUPR = \int_{0}^{1} Precision(Recall) \, dRecall $$
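The threshold-based metrics above follow directly from the four confusion matrix counts. In the sketch below the counts are invented for illustration (chosen only so that they add up to the 276 AFPs and 8849 non-AFPs of the balanced test set) and are not our evaluation results.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, MCC, precision, recall (TPR) and FPR from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "ACC": (tp + tn) / total,
        "MCC": mcc,
        "Precision": precision,
        "Recall": recall,
        "FPR": fpr,
    }


# Invented counts, for illustration only.
metrics = classification_metrics(tp=250, tn=8700, fp=149, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With classes this unbalanced, accuracy stays high even when precision is modest, which is why MCC and AUPR are reported alongside it.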
Figure 5. Evaluation results

Additionally, a user-friendly web application is built on Streamlit for real-time prediction and hypothesis testing by researchers. This tool provides a powerful, high-performance, and interpretable platform for accelerating AFP discovery and analysis.

For more details on installation and usage, please visit the README at https://gitlab.igem.org/2025/software-tools/tianjin. Here we demonstrate how to use the page interactions to predict antifreeze protein probability after locally installing and training our model.

Demonstration Video

Expert

Industrial
Construction and Cleaning of the Dataset

Figure 1. Data processing workflow for cold-tolerant proteins
We obtained 33,882 sequences and species data by searching for keywords such as "Antifreeze protein" in UniProt. We conducted an initial screening of the data, applying filters based on species and sequence length. The filtering process was divided into pre-filtering and precise filtering.

Pre-filtering within the group was performed using the k-mer algorithm. Taking two sequences as an example, after generating k-mers, the Jaccard similarity was calculated.
$$ K_1 = \{ kmer_{1,1}, kmer_{1,2}, \ldots, kmer_{1,m} \},\quad K_2 = \{ kmer_{2,1}, kmer_{2,2}, \ldots, kmer_{2,n} \} $$
$$ J(\text{Sequence}_1, \text{Sequence}_2) = J(K_1, K_2) = \frac{|K_1 \cap K_2|}{|K_1 \cup K_2|} $$
This simplifies the complex sequence alignment problem into a set operation. The k-mer method is first used for rapid screening; precise global sequence alignment, performed with the Bio.Align module (Biopython), is then applied only to sequences with high similarity.
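The k-mer pre-filter can be sketched in a few lines. The k-mer length, the similarity threshold and the toy sequences below are assumptions chosen for illustration; only pairs that pass this cheap test would go on to full alignment.

```python
def kmers(seq, k=3):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq1, seq2, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    k1, k2 = kmers(seq1, k), kmers(seq2, k)
    union = k1 | k2
    return len(k1 & k2) / len(union) if union else 0.0


def prefilter_pairs(sequences, k=3, threshold=0.5):
    """Return pairs of IDs similar enough to deserve a precise alignment."""
    ids = sorted(sequences)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if jaccard(sequences[a], sequences[b], k) >= threshold
    ]


# Toy sequences, for illustration only.
sequences = {"s1": "MKTAYI", "s2": "MKTAYV", "s3": "GGSGGS"}
pairs = prefilter_pairs(sequences)
print(pairs)
```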


After obtaining 7,178 data entries, we continued to clean the data in order to remove non-AFP proteins. Drawing on the AFP-LSE model, we applied the principles of its CKSAAP (composition of k-spaced amino acid pairs) algorithm and defined the amino acid set:
AA = {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V} (20 amino acids in total; other amino acids are not included in the statistics).

The CKSAAP feature vector V is a 400-dimensional vector (v₁, v₂, ..., v₄₀₀), where vᵢ represents the frequency of the i-th amino acid pair (e.g., A-A, A-R, ..., V-V) appearing with a spacing of 8 positions in the sequence.

We found that the full version of AFP-LSE requires the TensorFlow deep learning framework, which is its primary limitation. We therefore developed a lightweight model that is more user-friendly on Windows systems.

The mathematical principle of the lightweight model is based on the assumption that the CKSAAP feature distribution of AFPs has greater variance, enabling binary classification through a variance threshold.

$$ \sigma^2 = \frac{1}{400} \sum_{i=1}^{400} (v_i - \mu)^2 $$
Here μ is the mean of the 400 components of V.
For the results, we only retained data with a confidence level above 95%, resulting in 6,557 entries.
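The CKSAAP feature vector and its variance can be sketched as follows. Two points are assumptions on our part: "a spacing of 8 positions" is read as 8 intervening residues (positions i and i + 9), and frequencies are normalised by the number of pairs made of standard amino acids. The function names and the classification threshold are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(seq, gap=8):
    """400-dimensional frequency vector of residue pairs separated by `gap` residues."""
    counts = [0] * len(PAIRS)
    valid = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in PAIR_INDEX:  # pairs with non-standard residues are skipped
            counts[PAIR_INDEX[pair]] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(PAIRS)
    return [c / valid for c in counts]


def variance(v):
    """Population variance of the feature vector, as in the formula above."""
    mu = sum(v) / len(v)
    return sum((x - mu) ** 2 for x in v) / len(v)


def exceeds_variance_threshold(seq, threshold, gap=8):
    """Binary decision by variance threshold (the threshold value is hypothetical)."""
    return variance(cksaap(seq, gap)) > threshold


v = cksaap("A" * 20)
print(len(v), variance(v))
```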


Identification and Analysis of Repetitive Motifs

Next, we drew inspiration from HHrepID to screen for repetitive motifs. Protein sequences containing intrinsic repeat units exhibit significantly higher local self-similarity compared to random sequences.

The model quantifies this self-similarity using normalized scores from local sequence alignments. Given a protein sequence S (with length L), the detection process is as follows:

  1. Step 1: Candidate Motif Generation
    Starting from the N-terminus of the sequence, a series of candidate repetitive motifs Mk of increasing lengths are systematically generated.
    $$ M_k = S[1:k],\quad k \in [k_{\min}, k_{\max}] $$
  2. Step 2: Self-Alignment and Score Calculation
    For each candidate motif Mk, the Smith-Waterman local alignment algorithm is used to align it against the full length of sequence S, calculating an optimal alignment score A(S,Mk). The scoring function is as follows:
    $$ A(S, M_k) = \max_{\text{alignments}} \left( \sum \text{(match score)} - \sum \text{(mismatch penalty)} - \sum \text{(gap penalty)} \right) $$
  3. Step 3: Significance Threshold Determination
    To distinguish true repetitive signals from random background noise, an empirical significance score threshold θ is set. A candidate motif is identified as a valid repetitive motif if and only if its alignment score exceeds this threshold.
    $$ \text{Repeat Motif} = \{ M_k \mid A(S, M_k) > \theta \} $$
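The three steps can be sketched as below. Several details are assumptions for illustration: the match, mismatch and gap scores, the normalisation by the maximum possible score, the threshold θ = 0.8, and the choice to align each candidate motif against the remainder of the sequence so that the motif does not trivially match itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=1, gap=2):
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else -mismatch)
            curr[j] = max(0, diag, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


def find_repeat_motifs(seq, k_min=3, k_max=12, theta=0.8, match=2):
    """Return N-terminal prefixes whose normalised score against the rest exceeds theta."""
    hits = []
    for k in range(k_min, min(k_max, len(seq) // 2) + 1):
        motif, rest = seq[:k], seq[k:]
        # Normalise by the score of a perfect match of the whole motif.
        score = smith_waterman_score(motif, rest, match=match) / (match * k)
        if score > theta:
            hits.append((motif, score))
    return hits


# Toy sequence made of three copies of a six-residue unit.
hits = find_repeat_motifs("GTPSAK" * 3)
print(hits)
```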

It was specified that the repetitive motifs must consist of the 20 standard amino acids. A total of 5,164 repetitive motifs were successfully screened.

Phylogenetic Analysis and Selection of Characteristic Sequences

Next, we proceeded to construct an evolutionary tree. We first performed a phylogenetic analysis of the sequences using MEGA software. In initial attempts, we exported a distance matrix, hoping to screen for similar AFP proteins based on genetic distance. However, this method proved unsatisfactory, as the distance matrix failed to clearly reflect the evolutionary relationships between sequences, resulting in ambiguous classification boundaries and making it difficult to effectively distinguish between functionally similar subtypes.

Figure 2. Classification based on distance matrix

After discussion, the team concluded that relying solely on genetic distance for screening was overly dependent on pre-set thresholds and could not adequately capture the topological evolutionary information between sequences, potentially missing key phylogenetic signals. Consequently, we adjusted our strategy and shifted focus to the overall phylogenetic tree. By analyzing the clustering patterns of leaf nodes and the branching structure, we could more intuitively identify protein groups with a shared evolutionary history. Using this approach, we selected 19 representative subtrees from the complete phylogenetic tree for subsequent analysis, according to node stability and within-cluster homology.

Figure 3. Statistical Analysis of Amino Acid Frequency

We constructed and exported Newick (NWK) files for the 19 selected subtrees. After text processing, the files were converted to FASTA format and uploaded to WebLogo for visualization.

Figure 4. Partial Repeat Motif Visualisation via WebLogo

The WebLogo plots illustrate the amino acid conservation within the repetitive motifs. The screened short peptides will be further validated in subsequent experiments.

Government

Government: Policy Communication

To explore ways to translate the outcomes of our experiment into practical applications, we took the initiative to step out of the laboratory and engage in dialogues with staff from relevant government departments and professors from law schools. These discussions aimed to explore the compliance pathways and social value of our research in two key areas: “heating antifreeze” and “cosmetic repair”.

Dialogue with Government Heating Departments
The Potential of Antifreeze Proteins in Environmentally Friendly Antifreezing

From interviews with staff at government heating management departments, we learned that current mainstream antifreeze measures for heating in China still rely heavily on industrial salts (e.g., sodium chloride, calcium chloride) and ethylene glycol-based chemical antifreeze agents. According to the “2023 Statistical Yearbook on Urban and Rural Construction”, the total centralized heating area nationwide reached 14.324 billion m² by the end of 2023. Based on typical industry parameters, the estimated usage of ethylene glycol during a single heating cycle is approximately 43,000 tons.

However, ethylene glycol is a toxic compound. If ingested in large quantities by humans, its metabolites can cause crystal formation in the kidneys and impaired kidney function. Furthermore, if such substances leak into the environment through pipelines, they can lead to soil compaction and water pollution, posing significant ecological risks. Chloride-based antifreeze agents also present severe problems.

The U.S. Environmental Protection Agency (EPA) stated in its “Environmental Risk Assessment Report on Winter Road Deicing Agents” that chloride-based snow-melting agents are widely used due to their low cost and high ice-melting efficiency. According to the report, the cost of environmental corrosion caused by chlorides in the U.S. can account for 4% of the Gross National Product (GNP). Over 100,000 bridges nationwide suffer from structural damage, with repair costs ranging from 78 billion to 112 billion U.S. dollars. A survey conducted in Copenhagen, Denmark, revealed that 50% of 102 bridges exhibited severe steel bar corrosion, with the use of chloride-based snow-melting agents identified as the primary cause.

In contrast, antifreeze proteins produced via synthetic biology technology offer advantages such as “biodegradability” and “excellent environmental compatibility”, without causing long-term pollution. Although the current production cost of biological antifreeze proteins remains relatively high, they already demonstrate certain economic viability and large-scale production potential compared to naturally extracted products. As technology matures and production processes are optimized, their costs are expected to decrease further. In the long run, the use of biological antifreeze proteins will not only reduce environmental governance expenditures but also deliver favorable comprehensive economic benefits.

Figure 1. Online conversation with government department staff
Figure 2. Interview with Professor Mi of Tianjin University Law School
Dialogue with Professor Mi Wei (Law School)
Legal Boundaries for Experimental Innovation and Product Commercialization

We firmly believe that while continuously innovating and iterating experimental products, we must also uphold respect for legal regulations. When we approached Tianjin University Law School with our preliminary conceptualization of antifreeze protein technology, we sought more than just a compliance check—we aimed to find a solid social foundation for our project. Unexpectedly, this dialogue became a core driver for the comprehensive upgrading of our project.

In-depth exchanges with Professor Mi Wei made us realize that the commercialization of synthetic biology products is a journey built on three pillars: “safety”, “evidence”, and “compliance”. We clarified the positioning of antifreeze proteins as "new cosmetic ingredients," and their filing process must be centered on full-chain, traceable safety data. Appealing functional claims such as "repair" and "soothing" are contingent on meeting rigid evidence requirements, including human efficacy evaluation tests. Professor Mi pointed out that regulators are establishing a refined assessment framework for synthetic biology-derived ingredients, covering aspects such as strain traceability and the complexity of metabolic products at the source.

During the conversation, we recognized that an excellent business plan must not stop at describing market prospects—it must prioritize "regulatory compliance." Based on our discussions with the professor, we conducted a "compliance restructuring" of our business plan:

  • We transformed the vague category of "policy and regulatory risks" into specific, assessable, and plannable issues, such as "uncertainty in the filing cycle for new ingredients," "failure to obtain evidence for efficacy claims," and "environmental leakage of genetically engineered strains."
  • In the SWOT analysis, we added "establishing full-chain safety evidence" as a core Strength (S), and identified "responding to rapidly changing regulatory policies" as a Weakness (W) that must be addressed. Corresponding to this Weakness, the Opportunity-Weakness (OW) strategy explicitly outlined a concrete plan to "engage regulatory expert consultants."

Legal guidance ultimately fed back into the most fundamental aspect: experimental design. To address the "strain traceability" and "biosafety" requirements emphasized by Professor Mi, we implemented two key innovations in our wet experiments:

  1. Systematic Strain Traceability and Archive Establishment: We no longer viewed the EBY100 yeast strain merely as a tool, but instead established a comprehensive "identity file" for it. This file details the strain’s origin, genetic background, and all gene-editing operations, laying a solid foundation for the "strain information traceability" required in future new ingredient applications.
  2. Biosafety Experimental Testing: To proactively assess and mitigate potential risks, we took the initiative to design a "biocompatibility experimental verification" process. This preliminary safety test is not only intended to meet compliance requirements but also to identify and control biosafety risks in the early stages of product development, providing preliminary data support for subsequent toxicological evaluations.
Figure 3. Biocompatibility experimental data

These regulatory requirements have driven us to establish a full-chain safety assessment system during the experimental phase, covering strain traceability, impurity control, and sensitization testing. This ensures that our technology does not cross legal boundaries when it is commercialized.

Just like the secretion process of proteins, the Government module is not an end point, but a starting point for releasing the project’s value to society.

Conclusion: Policy Guides Innovation, Regulations Safeguard Commercialization

Through dialogues with government authorities and legal experts, we have gained a profound insight: the success of a synthetic biology project depends not only on technological breakthroughs but also on understanding policies and arranging compliance measures. We will continue to advance the project through four-dimensional collaboration—research, education, industry, and government—to enable antifreeze proteins to play a role in broader fields and fulfill our mission of "technology for good."

References

[1] Yeh Y, Feeney R E. Antifreeze proteins: structures and mechanisms of function[J]. Chemical reviews, 1996, 96(2): 601-618.
[2] O'Neil K T, DeGrado W F. A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids[J]. Science, 1990, 250(4981): 646-651.
[3] Zhang N, Du Y T, Yao P Q, et al. Synergistic effect of hyperactive antifreeze protein on inhibition of gas-hydrate growth by hydrophobic and hydrophilic groups[J]. The Journal of Physical Chemistry B, 2023, 127(49): 10469-10477.
[4] Shaoli CUI, Weijia Z, Xueguang S, et al. Revealing the Effect of Threonine on the Binding Ability of Antifreeze Proteins with Ice Crystals by Free-energy Calculations[J]. Chemical Journal of Chinese Universities, 2022, 43(3): 97-103.
[5] Zhang X, Yang J, Tian Y, et al. Precise de novo Design Principle of Antifreeze Peptides[J]. Journal of the American Chemical Society, 2025, 147(21): 17682-17688.
[6] Watson J L, Juergens D, Bennett N R, et al. De novo design of protein structure and function with RFdiffusion[J]. Nature, 2023, 620(7976): 1089-1100.