Abstract

Antimicrobial peptides (AMPs) represent a critical class of natural molecules with broad-spectrum antimicrobial activity. The integration of deep learning technologies offers a powerful approach to accelerate the discovery and design of novel AMPs by leveraging pattern recognition in complex sequence data. Despite this potential, there remains a lack of comprehensive, large-scale, and systematically curated AMP databases tailored for training robust neural network models. To bridge this gap, we present SPADE, a Systematic Platform for Antimicrobial peptide Database with Evaluation, which integrates data from four AMP databases (APD3, DRAMP, DBAASP, and LAMP) after systematic processing. The database contains more than 39,000 natural and modified peptides.

Then, after data screening, cleaning, and unification of wet-lab conditions, we released the AMP-Oriented Six Tasks (AMPOS) dataset together with the AMP-oriented Multi-Property Prediction Task (AMPPT), which includes five subtasks: sequence mask prediction, AMP classification, half-life regression, minimum inhibitory concentration regression, and hemolytic activity score regression. More importantly, for this task we designed a novel AMP-Oriented Multi-task Model (AOMM), which demonstrates state-of-the-art performance on every AMPPT subtask.

In addition, we built a Retrieval-Augmented Generation (RAG) system for the SPADE database to handle researchers' complex AMP search needs.

Resources

SPADE is available at: https://xjtlu-spade.netlify.app/
AMPOS is available at: https://huggingface.co/datasets/muskwff/AMP_six_tasks
AOMM is available at: https://huggingface.co/muskwff/amp4multitask_124M
The code is available at: https://gitlab.igem.org/2025/software-tools/xjtlu-software

What is SPADE?

SPADE is structured as a three-layer research ecosystem:

Overview of SPADE workflow

1. Data Foundation

Integration of tens of thousands of AMP records from major public repositories.
Systematic cleaning and normalization, including unit harmonization for minimum inhibitory concentration (MIC) values and standardized activity annotations.
Automated pipelines for continuous literature tracking and real-time database updates.
Global access acceleration to ensure usability across different regions.
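
As an illustration of the unit harmonization step, the sketch below converts MIC values reported in µM or µg/mL to a single unit (µM). The function name, the accepted unit strings, and the example molecular weight are assumptions made for this example; it is not SPADE's actual pipeline.

```python
MICRO = "\u00b5"  # micro sign, as used in "µM" and "µg/mL"


def mic_to_micromolar(value, unit, molecular_weight_da):
    """Convert a MIC value to µM (hypothetical helper, not part of SPADE).

    1 µg/mL equals 1 mg/L, so dividing by the molecular weight in g/mol
    and multiplying by 1000 gives the concentration in µM.
    """
    # Lowercase the unit and map the Greek letter mu onto the micro sign.
    normalized = unit.strip().lower().replace("\u03bc", MICRO)
    if normalized in (MICRO + "m", "um"):
        return float(value)
    if normalized in (MICRO + "g/ml", "ug/ml", "mg/l"):
        return float(value) * 1000.0 / molecular_weight_da
    raise ValueError(f"Unsupported MIC unit: {unit!r}")


# Assumed example: a 2000 Da peptide with a MIC of 10 µg/mL.
print(mic_to_micromolar(10, "ug/mL", 2000.0))  # 5.0
```

Rejecting unknown units with an error, rather than guessing, keeps ambiguous records out of the harmonized data until they are reviewed.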

2. Predictive Model – AMP-Oriented Multi-Task Model (AOMM)

For training neural network models, we created the AMP-Oriented Six Tasks dataset (AMPOS) by organizing data from the SPADE database.
A multitask deep learning framework designed to generate a comprehensive in silico evaluation for novel sequences.
Provides quantitative MIC prediction, multiple functional activity probabilities (e.g., antibacterial, antifungal, anticancer), safety assessment through hemolysis risk scoring, and stability estimation based on predicted half-life.
Functions as a "pre-screening" tool that reduces experimental burden and guides laboratory prioritization.
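
The pre-screening idea can be sketched as a filter over predicted profiles. The class, field names, thresholds, score ranges, and toy sequences below are all assumptions for illustration; the values are not AOMM outputs, and AOMM's real interface may differ.

```python
from dataclasses import dataclass


@dataclass
class PredictedProfile:
    """Hypothetical container for one peptide's predicted properties."""
    sequence: str
    mic_um: float           # predicted MIC in µM (lower means more active)
    hemolysis_score: float  # assumed to lie in [0, 1] (lower means safer)
    half_life_h: float      # predicted half-life in hours (higher means more stable)


def prescreen(profiles, max_mic_um=16.0, max_hemolysis=0.2, min_half_life_h=1.0):
    """Keep candidates that pass every threshold, most active first."""
    kept = [
        p for p in profiles
        if p.mic_um <= max_mic_um
        and p.hemolysis_score <= max_hemolysis
        and p.half_life_h >= min_half_life_h
    ]
    return sorted(kept, key=lambda p: p.mic_um)


# Toy values invented for this example.
candidates = [
    PredictedProfile("KKLLKKLLKK", 8.0, 0.10, 2.5),
    PredictedProfile("GGWWRRKK", 4.0, 0.45, 3.0),     # fails the hemolysis threshold
    PredictedProfile("AAKKAAKKAA", 2.0, 0.05, 1.5),
    PredictedProfile("RRLLRRLLRR", 32.0, 0.02, 4.0),  # fails the MIC threshold
]
shortlist = prescreen(candidates)
print([p.sequence for p in shortlist])  # ['AAKKAAKKAA', 'KKLLKKLLKK']
```

Filtering on all properties before ranking reflects the goal of prioritizing peptides with balanced activity, safety, and stability rather than activity alone.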

3. Semantic Retrieval Engine – RAG System

A retrieval-augmented generation (RAG) architecture tailored for AMP research.
Supports natural-language queries and domain-specific semantic understanding. For example, the system can process queries such as: "Identify peptides effective against MRSA, with low hemolysis risk, and fewer than 20 amino acids."
Returns results with relevance scoring and direct links to experimentally validated literature data.
Ensures that retrieval outcomes are both contextually meaningful and scientifically reliable.
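
To make the retrieval step concrete, the sketch below handles the example query in two stages: hard constraints (target organism, length, hemolysis) are applied as filters, and the remaining records are ranked by a relevance score. The record fields, the toy records, and the keyword-overlap score are assumptions for illustration. A production RAG system would typically use embedding similarity and a language model instead, and this is not SPADE's actual implementation.

```python
def retrieve(records, target, max_length, max_hemolysis, query_terms, top_k=3):
    """Filter on hard constraints, then rank by keyword overlap (toy scoring)."""
    terms = {t.lower() for t in query_terms}
    hits = []
    for rec in records:
        if target not in rec["targets"]:
            continue
        if len(rec["sequence"]) >= max_length:  # "fewer than N amino acids"
            continue
        if rec["hemolysis_score"] > max_hemolysis:
            continue
        words = set(rec["description"].lower().split())
        score = len(terms & words) / len(terms) if terms else 0.0
        hits.append((score, rec["id"]))
    # Highest score first; ties broken by record id for stable output.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_k]


# Toy records invented for this example.
records = [
    {"id": "toy-001", "sequence": "KKLLKKLLKKLL", "targets": {"MRSA"},
     "hemolysis_score": 0.10, "description": "short helical peptide active against mrsa"},
    {"id": "toy-002", "sequence": "GGWWRRKKGGWWRRKKGGWWRRKK", "targets": {"MRSA"},
     "hemolysis_score": 0.05, "description": "long peptide with low hemolysis"},
    {"id": "toy-003", "sequence": "RRWWRRWW", "targets": {"MRSA", "E. coli"},
     "hemolysis_score": 0.60, "description": "potent but hemolytic peptide"},
    {"id": "toy-004", "sequence": "AKKAAKKAAKKA", "targets": {"MRSA"},
     "hemolysis_score": 0.15, "description": "cationic peptide with low hemolysis"},
]
results = retrieve(records, target="MRSA", max_length=20, max_hemolysis=0.2,
                   query_terms=["mrsa", "low", "hemolysis"])
print([rec_id for _score, rec_id in results])  # ['toy-004', 'toy-001']
```

Separating hard filters from soft ranking means a record that violates a stated constraint can never be returned, no matter how relevant its text appears.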

Why do we need SPADE?

The discovery and optimization of antimicrobial peptides have long been hindered by several critical challenges in data accessibility, standardization, and predictive modeling. Existing AMP databases are often fragmented, inconsistently annotated, and lack the computational readiness required for modern deep learning applications. Moreover, most current predictive models focus on a single property—such as antimicrobial activity—while overlooking other essential characteristics like toxicity, stability, and hemolytic activity. This narrow focus limits their practical utility in guiding experimental design and prioritization.

SPADE addresses these gaps by providing a unified, high-quality, and machine-learning-ready repository that integrates and standardizes AMP data from multiple authoritative sources. By harmonizing experimental conditions, unifying units of measurement, and enriching metadata, SPADE enables more accurate and reproducible model training. Furthermore, the accompanying multitask prediction framework, AOMM, offers a holistic in silico profiling tool that simultaneously evaluates multiple functional and safety properties of AMPs, a capability absent in prior work.

Through its three-layer architecture—data foundation, predictive modeling, and semantic retrieval—SPADE supports end-to-end AMP research, from data exploration and model training to intelligent querying and hypothesis generation. It empowers researchers to efficiently navigate the complex AMP landscape, prioritize candidate peptides with balanced properties, and accelerate the development of novel therapeutics with enhanced efficacy and safety profiles.

SPADE is not merely a database—it is a comprehensive ecosystem designed to bridge the gap between data-driven discovery and wet-lab validation, fostering a new era of rational and multi-property-aware antimicrobial peptide design.

Inspiration for the Project

After the formation of XJTLU-Software 2025, we read an article about using neural networks to guide the evolution of antimicrobial peptides and were very interested in it.^[1]

[1] Wang, B., Lin, P., Zhong, Y., Tan, X., Shen, Y., Huang, Y., ... & Wu, Y. (2025). Explainable deep learning and virtual evolution identifies antimicrobial peptides with activity against multidrug-resistant human pathogens. Nature Microbiology, 10(2), 332-347.

This article uses the AMP-CLIP model to accurately identify antimicrobial peptides and the AMP-READ model to predict the minimum inhibitory concentration of antimicrobial peptides. Based on the gradient interpretability of AMP-READ, the peptide sequence is iteratively optimized through "gradient descent + projection".

After analyzing the paper and reproducing its models, we found several limitations:

Activity is not the only criterion for evaluating an antimicrobial peptide. We also need antimicrobial peptides to have low toxicity and high stability.
The article averaged all MIC values reported for an antimicrobial peptide to obtain training data. However, the minimum inhibitory concentration of a peptide is specific to the target microorganism and to the physicochemical environment in which it acts, which the article did not fully take into account.
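
The second point can be illustrated with a small sketch that compares pooling all MIC values per peptide with grouping them by peptide and target organism. The measurements are toy values invented for this example, and the arithmetic mean is an assumption made for simplicity.

```python
from collections import defaultdict
from statistics import mean

# Toy measurements: (peptide, target organism, MIC in µM).
measurements = [
    ("toy-peptide", "S. aureus", 4.0),
    ("toy-peptide", "S. aureus", 8.0),
    ("toy-peptide", "E. coli", 64.0),
]


def pooled_mic(rows):
    """Average every MIC per peptide, ignoring the target organism."""
    by_peptide = defaultdict(list)
    for peptide, _organism, mic in rows:
        by_peptide[peptide].append(mic)
    return {peptide: mean(values) for peptide, values in by_peptide.items()}


def per_organism_mic(rows):
    """Average MIC per (peptide, organism) pair, keeping target specificity."""
    grouped = defaultdict(list)
    for peptide, organism, mic in rows:
        grouped[(peptide, organism)].append(mic)
    return {key: mean(values) for key, values in grouped.items()}


print(pooled_mic(measurements))
print(per_organism_mic(measurements))
```

In this toy case the pooled value (about 25.3 µM) matches neither organism: it understates activity against S. aureus (6 µM) and overstates it against E. coli (64 µM).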

This article shows us the great potential of deep learning in antimicrobial peptide research, but it also exposes the shortcomings of current methods. Standing on the shoulders of our predecessors, we set out to address these shortcomings by collecting data and designing a model that evaluates an antimicrobial peptide more comprehensively, laying the foundation for the evolution of antimicrobial peptides.

Our Engagement with the World

Throughout the project cycle, our team believed that SPADE should not only serve as a technical tool, but also as a bridge connecting science with society.

Community and Collaboration

We actively reached out beyond our campus, bringing our work into conversations with students, researchers, and the wider public. During the iGEM Jiangsu–Zhejiang–Shanghai Exchange Conference, we presented our progress and listened closely to other teams' challenges, sharing real struggles such as "How do you explain synthetic biology to someone who has never heard of it?" or "What do you do when your data doesn't make sense?" These open conversations inspired us to rethink how our platform could be more intuitive and collaborative.

Conferences and Visibility

At the CCiC event in Beijing, we introduced SPADE to a national audience. Meeting experts and peers face-to-face gave us valuable feedback: some encouraged us to expand the system beyond AMPs to cover other peptides, while others were curious about the real-time update function. These questions helped us refine our priorities. In addition, we won the 3rd prize at the 2025 CCiC International Conference, which gave us the opportunity to present our project to a larger audience.

Online Outreach

Beyond academic circles, we wanted to make synthetic biology and AMPs more approachable. By opening accounts on platforms such as Xiaohongshu, Bilibili, Instagram, and Xiaoyuzhou, we regularly shared behind-the-scenes updates and stories about our project. Our aim was simple:

Show people what antimicrobial peptides are and why they matter.
Make synthetic biology less intimidating and more relatable.
Reach potential end-users who might one day benefit from AMP-based solutions.

Many of these posts sparked unexpected conversations with followers who had never heard of synthetic biology before but were eager to learn more. These interactions taught us that engagement is not just about broadcasting information, but about building genuine two-way dialogue where curiosity flows both ways. Through these experiences, we realized that our project carries meaning not only in the scientific domain, but also in helping people see how synthetic biology could one day impact their everyday lives.

Outlook and Future Perspectives

Looking ahead, we envision SPADE as more than a finished product — it is the starting point of a long-term scientific roadmap. One of our next goals is to construct an evolutionary tree of antimicrobial peptides, using the standardized data in SPADE to trace their origins, diversification, and functional patterns across species. By combining this evolutionary perspective with our predictive models, we aim to identify not only promising new sequences but also the evolutionary logic that makes certain peptides more effective or safer. This direction opens the possibility of designing synthetic peptides that inherit beneficial traits from natural lineages while avoiding harmful ones.

To bring these computational insights closer to reality, we plan to collaborate with experimental laboratories that can validate the most promising predictions. Such collaborations will create a genuine cycle of in silico design and in vitro testing, ensuring that our work contributes directly to the discovery of clinically relevant candidates. Beyond research, we also hope to extend SPADE into a widely accessible resource. By refining the user interface, expanding tutorials, and promoting the platform within both the iGEM community and the broader scientific world, we aim to make SPADE a trusted and widely used tool for peptide research. Ultimately, our long-term vision is to establish SPADE as a global hub that not only accelerates peptide discovery and evolution studies, but also fosters collaboration between computational and experimental scientists in the shared effort to combat antimicrobial resistance.
