Engineering

Overview

To close the data gap that arises when deep learning is applied to antimicrobial peptides (AMPs), we iterated on our project in the following three areas:

SPADE Database Establishment

Building a comprehensive, standardized antimicrobial peptide database by integrating multiple data sources with rigorous quality control and unified formatting.

RAG (Retrieval-Augmented Generation) System Construction

Developing an intelligent retrieval system that combines semantic search with generative AI to provide context-aware access to AMP knowledge.

AOMM Model Training

Training a multi-task neural network model capable of predicting multiple antimicrobial peptide properties simultaneously for comprehensive peptide characterization.

SPADE Database Establishment

Building a database is a critical but complex task, and high-quality datasets are a prerequisite for training high-performance models. In this section, we show how we built and optimized the database step by step.

Cycle 1: Motivation for Establishing the Database

Design

Antimicrobial peptides (AMPs) have key advantages over antibiotics: they act on bacterial cell membranes (lowering the risk of resistance), target a broad range of microbes (bacteria, fungi, and viruses), and are less toxic to human cells, with minimal disruption of the gut microbiome. Traditional AMP discovery is laborious and low-yielding; deep learning (DL) addresses this by using large datasets to build predictive models. DL architectures such as RNNs, transformers, and CNNs can predict the activity of new AMPs, optimize their structure for better potency and stability, and reveal hidden interaction patterns (e.g., peptide charge vs. activity against Gram-negative bacteria), reducing research costs and speeding up development. Given our team members' data science background and the dependence of deep learning on data, we set out to survey the existing antimicrobial peptide databases and to standardize and unify their contents.

Build

After extensive searching, we pre-selected 6 antimicrobial peptide databases:

1. UniProt
UniProt Consortium. (2019). UniProt: a worldwide hub of protein knowledge. Nucleic acids research, 47(D1), D506-D515.

2. LAMP
Zhao, X., Wu, H., Lu, H., Li, G., & Huang, Q. (2013). LAMP: A Database Linking Antimicrobial Peptides. PloS one, 8(6), e66557. https://doi.org/10.1371/journal.pone.006655

3. APD3
Wang, G., Li, X., & Wang, Z. (2016). APD3: the antimicrobial peptide database as a tool for research and education. Nucleic acids research, 44(D1), D1087–D1093. https://doi.org/10.1093/nar/gkv1278

4. DBAASP
Pirtskhalava, M., Amstrong, A. A., Grigolava, M., Chubinidze, M., Alimbarashvili, E., Vishnepolsky, B., Gabrielian, A., Rosenthal, A., Hurt, D. E., & Tartakovsky, M. (2021). DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic acids research, 49(D1), D288–D297. https://doi.org/10.1093/nar/gkaa991

5. DRAMP
Shi, G., Kang, X., Dong, F., Liu, Y., Zhu, N., Hu, Y., Xu, H., Lao, X., & Zheng, H. (2022). DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides. Nucleic acids research, 50(D1), D488–D496. https://doi.org/10.1093/nar/gkab651

6. dbAMP
Yao, L., Guan, J., Xie, P., Chung, C. R., Zhao, Z., Dong, D., Guo, Y., Zhang, W., Deng, J., Pang, Y., Liu, Y., Peng, Y., Horng, J. T., Chiang, Y. C., & Lee, T. Y. (2025). dbAMP 3.0: updated resource of antimicrobial activity and structural annotation of peptides in the post-pandemic era. Nucleic acids research, 53(D1), D364–D376. https://doi.org/10.1093/nar/gkae1019

Test

For DBAASP, we downloaded all of its data directly through the API. For LAMP, we downloaded the XML file from GitHub, split it, and converted it into JSON files. The remaining databases, however, do not offer an offline download of their full data; they provide only files containing the single-letter sequences of the antimicrobial peptides, which is far from enough.
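The XML-to-JSON step for LAMP can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the sample XML, its tag names (`peptide`, `name`, `sequence`), and the IDs are hypothetical stand-ins, since the real LAMP export has its own schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; the real LAMP export uses its own tag names and IDs.
SAMPLE_XML = """
<peptides>
  <peptide id="EX000001">
    <name>Example peptide</name>
    <sequence>GIGKFLHSAK</sequence>
  </peptide>
  <peptide id="EX000002">
    <name>Another peptide</name>
    <sequence>KWKLFKKI</sequence>
  </peptide>
</peptides>
"""

def xml_to_records(xml_text, entry_tag="peptide"):
    """Split an XML dump into one dict per entry element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for entry in root.iter(entry_tag):
        record = dict(entry.attrib)        # attributes such as id
        for child in entry:                # direct children become fields
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records

records = xml_to_records(SAMPLE_XML)
print(json.dumps(records[0], indent=2))    # one JSON object per peptide
```

Each resulting dict can be written to its own JSON file or collected into a single list, depending on how the later merging step expects its input.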

Learn

We emailed the respective database maintainers for help but could not obtain the full data this way, so we had to find another method to acquire it.

Cycle 2: Data Obtaining

Design

Web crawlers offer significant advantages for data collection: they can efficiently retrieve large amounts of data from multiple sources, replacing time-consuming manual extraction and minimizing human error. They also enable regular updates to maintain data timeliness and completeness, which is crucial for research requiring consistent information. Because the remaining antimicrobial peptide data is scattered across various specialized databases, manual collection is inefficient and poses data inconsistency risks. To address this issue, we plan to build a web crawler specifically designed to extract and integrate data from different AMP databases to support our subsequent AMP-related research.

Build

Using Python's selenium library, we built a crawler that drives a headless Chrome browser (the code is shown below). We determined the crawling targets from the URL structure of each database.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# CHROME_DRIVER_PATH must be set to the local path of the chromedriver executable.
service = Service(executable_path=CHROME_DRIVER_PATH)
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
# Reduce the browser's automation markers
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

driver = webdriver.Chrome(service=service, options=options)
Test

We crawled the remaining databases and successfully obtained data from APD3 and DRAMP, then converted the resulting HTML files into JSON files. UniProt is so large that we could not find search filters guaranteeing that every downloaded entry was an antimicrobial peptide, so we had to abandon it. The crawler also could not access dbAMP effectively, so we discarded that database as well.

Learn

Because the data come from different databases, their structures differ, so we converted them to a unified structure. We then removed duplicates based on the sequence attribute, which left more than 39,000 entries. While merging, we found that some properties of the same antimicrobial peptide had different values in different databases.
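The deduplication and conflict detection described above can be sketched as follows. The field names (`sequence`, `name`, `net_charge`) and the "first value seen wins" rule are assumptions made for illustration, not the exact logic of our pipeline.

```python
def merge_by_sequence(records):
    """Merge records sharing a sequence; report fields whose values disagree."""
    merged = {}      # sequence -> merged record
    conflicts = {}   # sequence -> {field: set of differing values}
    for rec in records:
        seq = rec["sequence"].strip().upper()
        target = merged.setdefault(seq, {"sequence": seq})
        for field, value in rec.items():
            if field == "sequence" or value in (None, ""):
                continue
            if field not in target:
                target[field] = value          # first value seen is kept
            elif target[field] != value:       # same peptide, different value
                conflicts.setdefault(seq, {}).setdefault(
                    field, {target[field]}).add(value)
    return list(merged.values()), conflicts

# Hypothetical records from different sources after format unification
records = [
    {"sequence": "GIGKFLHSAK", "name": "Peptide A", "net_charge": 2},
    {"sequence": "gigkflhsak", "name": "Peptide A", "net_charge": 3},
    {"sequence": "KWKLFKKI", "name": "Peptide B", "net_charge": 4},
]
unique, conflicts = merge_by_sequence(records)
print(len(unique), conflicts)
```

The `conflicts` mapping is what a later cleaning step would need to resolve, for example by checking the cited reference or recomputing the property from the sequence.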

Cycle 3: Data unification and cleaning workflow

Design

To improve data quality, we needed to clean the data and resolve the conflicts uncovered in the previous cycle, where the same peptide carried different property values after the formats were unified. We first checked the values given in the cited references and treated them as the most credible. For antimicrobial peptides without references, we designed algorithms to compute their properties in a uniform way.

Build

We first screened out antimicrobial peptides with a large amount of missing data. Using the Biopython library and the Kyte-Doolittle (KD) hydropathy scale [1], we then built an algorithm that calculates molecular weight, isoelectric point, net charge, and hydrophobicity (the code is shown below). We used it to compute these properties for peptides that lack references whenever the source databases conflict.

Reference

[1] Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of molecular biology, 157(1), 105-132.

Peptide Property Calculation Algorithm

from Bio.SeqUtils.ProtParam import ProteinAnalysis
from Bio.SeqUtils import molecular_weight

# Amino acid classes
AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'
HYDROPHOBIC_AAS = {'A', 'V', 'I', 'L', 'M', 'F', 'C'}  # hydrophobic residues
POLAR_AAS = {'S', 'T', 'N', 'Q', 'C', 'H'}             # polar residues
BASIC_AAS = {'K', 'R', 'H'}                            # basic residues
ACIDIC_AAS = {'D', 'E'}                                # acidic residues

# Kyte-Doolittle (KD) hydropathy scale (dimensionless index)
KD_HYDROPATHY = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2
}

def calculate_peptide_properties(sequence):
    """Calculate the main physicochemical properties of a peptide."""
    # Validate the sequence
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("Empty sequence")
    invalid_aas = set(sequence) - set(AMINO_ACIDS)
    if invalid_aas:
        raise ValueError(f"Invalid amino acids found: {invalid_aas}")

    analysis = ProteinAnalysis(sequence)

    # Molecular mass
    mass = molecular_weight(sequence, seq_type='protein')

    # Isoelectric point (pI)
    pI = analysis.isoelectric_point()

    # Net charge at pH 7 (simple residue count; His is counted as +1)
    net_charge = 0
    for aa in sequence:
        if aa in BASIC_AAS:     # basic residues (positively charged when pH < pKa)
            net_charge += 1
        elif aa in ACIDIC_AAS:  # acidic residues (negatively charged when pH > pKa)
            net_charge -= 1
    # N-terminal amino group (+1) and C-terminal carboxyl group (-1) cancel at pH 7
    net_charge += 1 - 1

    # Mean hydrophobicity (KD)
    kd_scores = [KD_HYDROPATHY.get(aa, 0) for aa in sequence]
    mean_hydrophobicity = sum(kd_scores) / len(sequence)

    return {
        'Sequence': sequence,
        'Length': len(sequence),
        'Mass': round(mass, 2),
        'PI': round(pI, 2),
        'Net Charge': net_charge,
        'Hydrophobicity': round(mean_hydrophobicity, 2),
    }

Test

When we compared its output with known values for peptides already annotated in the source data, the algorithm reproduced them with very high accuracy.

Learn

After data unification, data quality improved significantly. The team also designed an identifier system that assigns a unique SPADE ID to each peptide in the database, using N and UN tags to distinguish natural and non-natural antimicrobial peptides (e.g., SPADE_N_03315 and SPADE_UN_22728), which ensures the uniqueness and traceability of the data. However, the JSON file format makes the data hard to inspect and is not user-friendly, so we also needed visualization techniques to present the data more intuitively.
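An identifier scheme of this kind can be sketched in a few lines. The ID pattern follows the IDs that appear in the query examples on this page (such as SPADE_N_03315 and SPADE_UN_22728); the sequential per-class counters and the function name `make_spade_id_factory` are assumptions for illustration.

```python
from itertools import count

def make_spade_id_factory(width=5):
    """Return a function that issues sequential SPADE-style IDs per class."""
    counters = {"N": count(1), "UN": count(1)}   # separate numbering per tag

    def next_id(is_natural):
        tag = "N" if is_natural else "UN"
        return f"SPADE_{tag}_{next(counters[tag]):0{width}d}"

    return next_id

next_id = make_spade_id_factory()
print(next_id(True))    # natural peptide
print(next_id(False))   # non-natural peptide
```

Keeping the counter inside a closure means IDs are never reused within one build of the database, which is the property that makes entries traceable.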

Cycle 4: Construction of online SPADE database website

Design

To make the database user-friendly, we set the following goals:

Goal 1

Provide a good human-computer interaction experience

Goal 2

Support switching between languages

Goal 3

Provide a system for efficiently screening antimicrobial peptides

Build

To meet the diverse needs of antimicrobial peptide research, the SPADE platform provides multi-dimensional information retrieval and display. Users are no longer limited to searching by a single sequence or name; instead, they can combine multiple criteria to locate precisely the antimicrobial peptides that meet a specific research objective. These screening criteria cover various aspects of antimicrobial peptide research, including but not limited to:

Search Criteria Features

1. Activity Type

Users can search for antimicrobial peptides with specific biological activities, such as antibacterial, antiviral, antifungal, anticancer, and anti-biofilm activities.

2. Target Organism

The platform allows users to filter by the target organisms that the antimicrobial peptides act on. For example, one can search for peptides with inhibitory activity against specific pathogens such as Staphylococcus aureus, Escherichia coli, or Candida albicans.

3. Physicochemical Properties

Users can filter by specific physicochemical properties such as molecular weight, isoelectric point, net charge, and hydrophobicity. This is crucial for designing peptides with specific cell membrane penetration or stability.

4. Structural Features

The platform also supports searches based on secondary structures (such as α-helix and β-sheet) or specific structural motifs.
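Combining several criteria amounts to applying every filter that the user has set and keeping only the records that pass all of them. The sketch below illustrates the idea on in-memory records; the record fields and the `matches` helper are hypothetical and do not reflect the website's actual backend.

```python
def matches(peptide, activity=None, target=None, charge_range=None, mass_range=None):
    """Return True when a peptide record satisfies every criterion that is set."""
    if activity and activity not in peptide["activities"]:
        return False
    if target and target not in peptide["targets"]:
        return False
    if charge_range and not charge_range[0] <= peptide["net_charge"] <= charge_range[1]:
        return False
    if mass_range and not mass_range[0] <= peptide["mass"] <= mass_range[1]:
        return False
    return True

# Hypothetical records; field names are illustrative only
peptides = [
    {"id": "P1", "activities": {"antibacterial"}, "targets": {"Escherichia coli"},
     "net_charge": 4, "mass": 1850.2},
    {"id": "P2", "activities": {"antifungal"}, "targets": {"Candida albicans"},
     "net_charge": 2, "mass": 2400.0},
    {"id": "P3", "activities": {"antibacterial", "anti-biofilm"},
     "targets": {"Staphylococcus aureus", "Escherichia coli"},
     "net_charge": 1, "mass": 3100.7},
]

hits = [p["id"] for p in peptides
        if matches(p, activity="antibacterial", target="Escherichia coli",
                   charge_range=(2, 6))]
print(hits)  # ['P1']
```

Criteria left as `None` are simply skipped, so the same function serves single-criterion and multi-criterion searches.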

Language Support

The database supports five languages:

1. English
2. Chinese
3. Japanese
4. Spanish
5. German

Test

Database Homepage

The homepage of the database is shown below:

Fig.1. Database homepage display

Peptide Entry Display

The first antimicrobial peptide in the database is shown below:

Fig.2. Antimicrobial peptide entry display

Learn

The SPADE database platform now has the search functions described above. However, for complex semantic search tasks, conventional search functions are inadequate.

RAG (Retrieval-Augmented Generation) System Construction

Neural networks, although they operate as black boxes, are well suited to complex semantic tasks, which gave us the idea for building an efficient retrieval system.

Cycle 5: RAG System Exploration

Design

Retrieval-Augmented Generation (RAG) offers notable advantages for complex semantic retrieval over databases and addresses key limitations of traditional methods. Conventional keyword-based searches rely on exact matches and struggle with vague, context-rich queries (e.g., "find product lines where post-purchase issues increased alongside shipping delays in Q3"); RAG instead integrates large language models (LLMs) with real-time database retrieval. We chose Qwen3-Embedding-4B, released in June 2025 and among the best-performing embedding models at that time, because it is well suited to complex semantic retrieval over the SPADE database.

Build

We first used a Python script to convert each antimicrobial peptide into a fixed string form (an example is shown below).

Peptide String Format Example

[Peptide Name] Variacin (Bacteriocin)
[Source] Micrococcus varians (Gram-positive bacteria)
[Family] Belongs to the lantibiotic family (Class I bacteriocin)
[Sequence] GSGVIPTISHECHMNSFQFVFTCCS
[Sequence Length] 25
[Protein Existence] Protein level
[Biological Activity] 1. Antimicrobial 2. Antibacterial 3. Anti-Gram+
[Target Organism] Gram-positive bacteria:Lactobacillus helveticus, L. bulgaricus, Lactobacillus lactis, Lactobacillus delbrueckii, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus sake (LSK), Lactobacillus curvatus, Leuconostoc mesenteroides, Streptococcus thermophilus, Lactococcus lactis (SL2), Enterococcus faecalis, Enterococcus faecium, Listeria innocua, Listeria monocytogenes, Listeria welhia. Note:Inhibitory activity tested with supernatant adjusted to pH 7.
[Linear/Cyclic] Linear
[Stereochemistry] L
[Formula] C118H175N31O36S4
[Mass] 2732.1
[PI] 5.98
[Net Charge] 1
[Hydrophobicity] 0.45
[Half Life] Mammalian: 30 hour; Yeast: >20 hour; E. coli: >10 hour
[Function] Has a broad host range of inhibition against Gram-positive food spoilage bacteria. Variacin is resistant to heat and pH conditions from 2 to 10.
[Biophysicochemical properties] Variacin is resistant to heat and pH conditions from 2 to 10.
[Literature]
 1. [Title]: Variacin, a new lanthionine-containing bacteriocin produced by Micrococcus varians comparison to lacticin 481 of Lactococcus lactis.; [Pubmed ID]: 8633879; [Reference]: Appl Environ Microbiol. 1996 May;62(5)1799-1802.; [Author]: Pridmore D, Rekhif N, Pittet AC, Suri B, Mollet B.; [URL]: http://www.ncbi.nlm.nih.gov/pubmed/?term=8633879
[Frequent Amino Acids] SCF
[Absent Amino Acids] ADKLORUWY
[Basic Residues] 2
[Acidic Residues] 1
[Hydrophobic Residues] 9
[Polar Residues] 15
[Positive Residues] 2
[Negative Residues] 1
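A conversion of this kind can be sketched as below. The field keys, the `FIELDS` order, and the rule for numbering list values are assumptions chosen to mimic the example above; the record uses only a few of the fields for brevity.

```python
def peptide_to_text(record, fields):
    """Render a peptide record as '[Label] value' lines in a fixed field order."""
    lines = []
    for key, label in fields:
        value = record.get(key)
        if value in (None, "", []):
            continue                      # skip fields with no data
        if isinstance(value, list):       # number list items: "1. a 2. b"
            value = " ".join(f"{i}. {v}" for i, v in enumerate(value, 1))
        lines.append(f"[{label}] {value}")
    return "\n".join(lines)

# Hypothetical key-to-label mapping; the order fixes the layout of the string
FIELDS = [("name", "Peptide Name"), ("sequence", "Sequence"),
          ("length", "Sequence Length"), ("activity", "Biological Activity"),
          ("net_charge", "Net Charge")]

record = {"name": "Variacin (Bacteriocin)",
          "sequence": "GSGVIPTISHECHMNSFQFVFTCCS",
          "length": 25,
          "activity": ["Antimicrobial", "Antibacterial", "Anti-Gram+"],
          "net_charge": 1}
text = peptide_to_text(record, FIELDS)
print(text)
```

A fixed field order keeps the strings comparable across peptides, which helps when they are embedded as documents.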

We embedded the text of each antimicrobial peptide with the Qwen3-Embedding-4B model and managed the resulting vectors with the faiss library. The user's query and the retrieved candidates are then passed to Qwen3-Reranker-4B, which outputs the best-matching antimicrobial peptides together with a matching score for each.
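The retrieval half of this pipeline is a nearest-neighbour search over embedding vectors. The sketch below shows the idea with a brute-force cosine similarity scan in plain Python; it is a stand-in for the faiss index, uses toy 3-dimensional vectors instead of real embeddings, and leaves out the reranking step entirely. The IDs are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=3):
    """Return the k (peptide_id, score) pairs most similar to the query."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy vectors standing in for real peptide embeddings
index = {
    "SPADE_N_00001": [0.9, 0.1, 0.0],
    "SPADE_N_00002": [0.0, 1.0, 0.0],
    "SPADE_UN_00001": [0.7, 0.7, 0.0],
}
candidates = retrieve([1.0, 0.0, 0.0], index, k=2)
print(candidates)
```

In a two-stage setup, a cheap vector search like this narrows tens of thousands of entries to a short candidate list, so that the more expensive reranker only has to score a handful of query-document pairs.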

Test

We tested the RAG system with a wide range of possible queries, and its outputs were highly accurate. Two examples follow.

Example 1: Multi-Condition Query

Query: "Find peptides effective against E. coli with low hemolytic activity"

Model Output:

Rank 1: SPADE_UN_22728 Score: 2.18
Rank 2: SPADE_UN_23160 Score: 1.57
Rank 3: SPADE_UN_23159 Score: 1.55

Example 2: Specific Property Query

Query: "Searching for antimicrobial peptides with a half-life greater than 10 hours in Escherichia coli"

Model Output:

Rank 1: SPADE_N_03315 Score: 2.66
Rank 2: SPADE_N_02277 Score: 1.92
Rank 3: SPADE_UN_00680 Score: 1.73

Note: The score measures how relevant each antimicrobial peptide is to the query.

Learn

Integrating a RAG system with the antimicrobial peptide (AMP) database brings impactful advantages for targeted retrieval. It excels at parsing complex, multi-condition queries—such as "peptides effective against E. coli with low hemolytic activity" or "AMPs with half-life >10 hours in Escherichia coli"—which traditional keyword searches often struggle to interpret accurately.

The RAG system delivers structured, ranked results (e.g., SPADE_UN_22728 as Rank 1 for the first query, SPADE_N_03315 for the second) paired with quantifiable correlation scores (e.g., 2.18, 2.66). This lets researchers quickly identify the most relevant AMPs without sifting through irrelevant data. By linking semantic understanding to database content, it streamlines AMP screening workflows, ensures data-driven selections, and boosts efficiency—critical for applications like antimicrobial drug development.

AOMM Model Training

With the high-quality SPADE antimicrobial peptide database and an efficient retrieval system in place, we next used the data in our database to train a high-performance neural network for antimicrobial peptides.

Cycle 6: Replicate Existing Work to Gain Experience

Design

We found a study on evolving antimicrobial peptides by analyzing neural network gradients¹. It uses four networks (CNN, LSTM, Transformer, and Attention) for two tasks: AMP classification and minimum inhibitory concentration (MIC) regression. The gradients of these four networks are then used to guide the evolution of antimicrobial peptides.

¹ Wang, B., Lin, P., Zhong, Y., Tan, X., Shen, Y., Huang, Y., ... & Wu, Y. (2025). Explainable deep learning and virtual evolution identifies antimicrobial peptides with activity against multidrug-resistant human pathogens. Nature Microbiology, 10(2), 332-347.

Build

Based on the code and data published on GitHub, we reproduced this work.

Test

After training the four networks, we tested them and reproduced the model performance reported in the article.

Learn

However, from our biological knowledge, we know that the quality of an antimicrobial peptide is measured not only by activity but also by half-life, toxicity, and other properties. Furthermore, the minimum inhibitory concentration of an antimicrobial peptide is specific to the experimental conditions and the target microorganism, yet this work averages over both and thereby ignores both kinds of specificity.

Cycle 7: Training Dataset

Design

Based on our analysis of the limitations of the above article, we held a meeting to discuss directions for improving the project. We decided to focus on the evaluation of antimicrobial peptides and to use neural networks to evaluate individual peptides comprehensively. First, we needed to build training data for the neural network from the SPADE database.

Fig. 3. Photo of the whiteboard during the project establishment meeting

Build

We retained all antimicrobial peptides in the SPADE database with lengths between 1 and 100 residues. For the AMP classification task, we selected non-antimicrobial peptides from the UniProt database by keyword matching and used them together with the antimicrobial peptides. For the bioactivity classification task, we extracted and cleaned the bioactivity labels from the SPADE database. For the half-life regression task, we used in vitro experimental data as labels (unit: min), all measured in mammalian cells. For the hemolysis regression task, because both lethality and concentration matter when analyzing the hemolytic activity of antimicrobial peptides, we fused these two indicators into a single hemolytic activity score as the regression target. For the MIC regression task, 5,824 AMPs carry mic_regression labels, and a single antimicrobial peptide may have several target organisms, each with its own MIC value. We clipped the original MIC values to a high threshold of 100,000 μg/ml and a low threshold of 0.001 μg/ml, and retained only microorganisms that appeared more than 100 times.
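The MIC preprocessing described above (clipping to fixed thresholds and keeping only frequently tested organisms) can be sketched as follows. The thresholds and the 100-occurrence cutoff come from the text; the row layout and function names are assumptions for illustration.

```python
from collections import Counter

MIC_LOW, MIC_HIGH = 0.001, 100000.0   # clipping thresholds in μg/ml (from the text)

def clip_mic(value):
    """Clip a MIC value into [MIC_LOW, MIC_HIGH]."""
    return min(max(value, MIC_LOW), MIC_HIGH)

def filter_frequent_organisms(rows, min_count=100):
    """Keep rows whose organism appears more than min_count times; clip their MICs."""
    counts = Counter(organism for _, organism, _ in rows)
    return [(seq, org, clip_mic(mic)) for seq, org, mic in rows
            if counts[org] > min_count]

# Hypothetical (sequence, organism, MIC) rows
rows = [("KWKLFKKI", "Escherichia coli", 5e6)] * 101 \
     + [("GIGKFLHSAK", "Rare organism", 0.00001)] * 3
kept = filter_frequent_organisms(rows)
print(len(kept), kept[0])
```

Because one peptide can appear in several rows with different organisms, filtering is done per row rather than per peptide.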

Test

At first, we trained the model directly on the raw data, and its accuracy was very low.

Learn

The range of the original data is too large, so the data should be normalized before training. With a high-quality training dataset in place, we also need a suitable neural network architecture to train. The network needs to complete five tasks:

1. AMP classification
2. Bioactivity classification
3. MIC regression
4. Half-life regression
5. Hemolysis regression
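For regression targets that span many orders of magnitude, such as MIC values, one common way to normalize is a log transform followed by standardization. The sketch below shows that approach as an assumption; the text above does not specify which normalization was actually used.

```python
import math
import statistics

def log_standardize(values):
    """Apply log10, then scale to zero mean and unit variance."""
    logs = [math.log10(v) for v in values]
    mean = statistics.fmean(logs)
    std = statistics.pstdev(logs)          # assumes the values are not all equal
    scaled = [(x - mean) / std for x in logs]
    return scaled, mean, std

def inverse(scaled_value, mean, std):
    """Map a normalized prediction back to the original unit."""
    return 10 ** (scaled_value * std + mean)

mics = [0.001, 0.1, 10.0, 1000.0, 100000.0]   # hypothetical clipped MIC values
scaled, mean, std = log_standardize(mics)
print([round(s, 3) for s in scaled])
```

Storing `mean` and `std` alongside the model makes it possible to convert predictions back to μg/ml with `inverse`.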

Cycle 8: Model Architecture Exploration

Design

A model that solves these AMP-oriented tasks needs an encoder-decoder style architecture, so the choice of encoder and decoder is critical. When we first designed the architecture, we chose ESM2, the protein language model trained by Facebook AI Research (now Meta AI), because of its excellent performance in protein-related work.

Build

We built a model with ESM2 150M as the encoder and simple MLPs (three linear layers on average, with ELU activations) as decoders. We set the decoder output dimension for each task as follows: amp_classification: 1, bioactivity_classification: 13, mic_regression: 1, half_life_regression: 1, and hemolysis_regression: 1. The network was then trained on the tasks sequentially in the order listed. To keep a single model compatible with all tasks, we used Elastic Weight Consolidation (EWC) to prevent the model from forgetting earlier tasks.
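For readers unfamiliar with EWC, its standard formulation (Kirkpatrick et al.) adds a quadratic penalty that keeps parameters close to the values learned on an earlier task A while training on a new task B. The notation below follows the usual presentation and is given only as background; the symbols are not taken from our training code.

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{B}(\theta)
  \;+\; \sum_{i} \frac{\lambda}{2}\, F_{i}\,\bigl(\theta_{i} - \theta^{*}_{A,i}\bigr)^{2}
```

Here $\mathcal{L}_{B}$ is the loss on the new task, $\theta^{*}_{A,i}$ is the value of parameter $i$ after training on the earlier task, $F_{i}$ is the corresponding diagonal entry of the Fisher information matrix (a measure of how important that parameter was for task A), and $\lambda$ controls how strongly old knowledge is protected.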

Test

We completed training and tested the model. On each task, the model's outputs fluctuated around a single value, which is clearly wrong. We then probed ESM2 650M with a masked-token prediction test and found that its ability to represent antimicrobial peptides was limited. An example is shown in the figure below: the masked amino acid should be V, but V was not among the top five amino acids predicted by the model.

Fig. 4. Example of the ESM2's ability to characterize antimicrobial peptides

Learn

On reflection, we found it unrealistic to integrate all tasks into a single set of parameters. We therefore decided to rebuild the encoder using the BERT architecture and pre-train it on our own dataset, and at the same time to increase the depth of the decoder network.

Cycle 9: AOMM Was Born

Design

We designed the network from scratch; the specific hyperparameters are given in the configuration below. We chose rotary position embedding as the positional encoding for the attention mechanism. The hidden dimension is set to a relatively large 768 so that the network can learn richer features from short sequences. The encoder depth is set to 18, and the MLP decoders have 4 linear layers on average.

{
  "_name_or_path": "muskwff/amp4multitask_124M",
  "architectures": [
    "AMPForMultiTask"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "mask_token_id": 32,
  "max_position_embeddings": 128,
  "model_type": "amp",
  "num_attention_heads": 24,
  "num_hidden_layers": 18,
  "output_hidden_states": true,
  "pad_token_id": 1,
  "position_embedding_type": "rotary",
  "rotary_emb_dim": 32,
  "token_dropout": true,
  "torch_dtype": "float32",
  "transformers_version": "4.52.4",
  "type_vocab_size": 2,
  "use_cache": true,
  "use_residual_connections": true,
  "vocab_list": null,
  "vocab_size": 33,
  "auto_map": {
    "AutoConfig": "modeling_amp.AMPConfig",
    "AutoModel": "modeling_amp.AMPMultiTasksPreTrainedModel"
  }
}

Build

We used masked language modeling as the pre-training task and then trained on the same downstream tasks as in the previous DBTL cycle. For a more stable training process, we set the output dimension of the AMP classification task to 2 and used cross-entropy as the loss function. We abandoned the EWC mechanism and, after pre-training, froze the first 5 layers of the encoder as parameters shared by all tasks.

Test

In the masked_lm task, the model significantly outperforms the ESM2 150M baseline, achieving a top-1 accuracy of 0.8736 and a top-5 accuracy of 0.9407, compared to ESM2's 0.3300 and 0.6392, respectively.
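Top-1 and top-5 accuracy for masked-token prediction count how often the true amino acid is among the k highest-scoring tokens at a masked position. A minimal sketch of the metric, using a made-up 4-token vocabulary and hypothetical scores rather than our evaluation code:

```python
def top_k_accuracy(score_rows, true_ids, k):
    """Fraction of masked positions whose true token is among the k highest scores."""
    hits = 0
    for scores, true_id in zip(score_rows, true_ids):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        hits += true_id in ranked[:k]
    return hits / len(true_ids)

# Hypothetical scores over a 4-token vocabulary for three masked positions
score_rows = [
    [0.10, 0.60, 0.20, 0.10],   # true token 1 is ranked first
    [0.40, 0.10, 0.30, 0.20],   # true token 2 is ranked second
    [0.50, 0.30, 0.15, 0.05],   # true token 3 is ranked last
]
true_ids = [1, 2, 3]
top1 = top_k_accuracy(score_rows, true_ids, k=1)
top2 = top_k_accuracy(score_rows, true_ids, k=2)
print(round(top1, 4), round(top2, 4))
```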

For AMP classification, the model attains an outstanding AUC score of 0.9951 and an F1 score of 0.9644, indicating strong discriminative ability between antimicrobial and non-antimicrobial peptides.

In the MIC regression task, the model achieves an overall Pearson correlation coefficient of 0.7000 across 18,668 samples. It also shows robust organism-specific predictive performance, with particularly high correlations for Bacillus subtilis (0.8109), Klebsiella pneumoniae (0.7687), and Enterococcus faecalis (0.7450).

For hemolysis regression, the model yields a very low mean absolute error (MAE) of 0.1061, reflecting accurate prediction of hemolytic activity.

Finally, in half-life regression, the model demonstrates nearly perfect predictive capability with a Pearson correlation of 0.9851, underscoring its strength in estimating peptide stability in mammalian systems.
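The two regression metrics reported above, the Pearson correlation coefficient and the mean absolute error (MAE), can be computed as in the following sketch. The values are hypothetical and only illustrate the definitions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]   # hypothetical labels
y_pred = [1.1, 1.9, 3.2, 3.8]   # hypothetical predictions
r = pearson(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(round(r, 4), round(mae, 4))
```

Pearson correlation captures whether predictions rise and fall with the labels, while MAE captures how far off they are in absolute terms, so the two are complementary.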

Learn

We used early stopping during training and automatically saved the best model for each task. Finally, the best models for all tasks were combined into a single 1.4 GB model file (with a .pth extension). The model file was uploaded to the Hugging Face model hub and can be accessed via the link: https://huggingface.co/muskwff/amp4multitask_124M.

Collaborative Experience & Contributions to the Community

Design

Identifying Regional Community Needs, Structuring an Academic Exchange Framework

To address the lack of deep offline connections among iGEMers in the Jiangsu-Zhejiang-Shanghai region, the core objective was defined as evolving from event participants into organizers. A workflow was planned for slide presentations in the internal venue and an Academic Marketplace in the external venue, so that the event had both academic value and community-building significance.

Build

Implementing the Dual-Scenario Exchange Platform, Activating Cross-Team Connections

1. Structured Academic Presentation System in the Internal Venue

More than 20 teams were organized to present their projects with slides, covering core aspects such as experimental design, technical route optimization, and strategies for resolving bottlenecks. Standardized academic presentation templates were provided to support in-depth communication of research ideas.

2. Academic Marketplace Scenario in the External Venue

A project poster exhibition area was set up, guiding participants to engage in discussions on specific technical issues like module design logic and sample processing protocols. This facilitated the transformation of online collaboration IDs into offline academic partners, strengthening direct connections between teams.

3. Implicit Resource Connection Channels

"Project Value and Needs Discussion" sessions were incorporated within the exchange activities, laying the groundwork for future resource complementarity and sustained academic support, and initially forming the operational basis for an academic community.

Test

Multi-dimensional Evaluation of Exchange Effectiveness, Validating Community-Building Value

Thematic discussions promoted a shift in participants' understanding of synthetic biology from theoretical concepts to practical research, deepening their comprehension of the discipline's research paradigms, while also expanding the academic influence of the competition within the region. The transformation of participant identity from "competitor" to "collaborator," along with the formation of preliminary collaborative links post-event, and the positive feedback received, confirmed the necessity of building such a platform.

Learn

Consolidating Community Operation Experience, Optimizing Long-Term Collaboration Mechanisms

Key Insight

It was recognized that a one-off event can establish a collaborative atmosphere based on shared interests and goals, which is crucial for community sustainability, but that the precision of needs matching requires further improvement.

Iteration Direction

Standardize the effective components of this event, such as "Project Value Discussion" and "Technical Problem Brainstorming," and establish regular offline/online exchange mechanisms to promote the transition from "temporary collaboration" to "long-term resource complementarity."

Value Extension

Focusing on the core mission of "Synthetic Biology Serving Society," future activities can be designed around specific themes addressing environmental or community needs, laying a foundation for subsequent exchanges among other teams within the community.
