Project Objective
- Resolve the scattered data and inconsistent formats across existing antimicrobial peptide databases, and build a single high-quality, unified dataset.
- Develop the SPADE screening platform to screen antimicrobial peptides precisely across multiple dimensions (toxicity, stability, target organisms, etc.).
- Optimize the data structure so that records can be fed directly into machine learning models and intelligent analysis.
Data Design
- Select mainstream databases: LAMP, APD3, DRAMP, DBAASP.
- Unified data format: Use the DRAMP data structure as the standard template.
- Key field optimization: "Hemolytic Activity" and "Target Organism" are converted from nested dictionaries to flat strings or structured lists.
- Unique identifier scheme: Generate a SPADE ID for each peptide to ensure data traceability.
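
The field flattening and ID assignment described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names (flatten_field, make_spade_id, normalize_record), the "Sequence" and "Source" keys, and the ID format (a "SPADE_" prefix plus a truncated SHA-256 hash of the sequence) are all assumptions made for illustration. The outline only states that nested dictionaries become flat strings or lists and that every peptide receives a SPADE ID.

```python
import hashlib


def flatten_field(value):
    """Flatten a nested dict/list field into a list of 'key: value' strings."""
    if value is None:
        return []
    if isinstance(value, dict):
        items = []
        for key, sub in value.items():
            # Prefix every flattened child with the key it was nested under.
            for flat in flatten_field(sub):
                items.append(f"{key}: {flat}")
        return items
    if isinstance(value, (list, tuple)):
        items = []
        for sub in value:
            items.extend(flatten_field(sub))
        return items
    return [str(value)]


def make_spade_id(sequence):
    """Derive a deterministic ID from the sequence (hypothetical ID format)."""
    normalized = sequence.strip().upper()
    digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()
    return "SPADE_" + digest[:12]


def normalize_record(raw, source):
    """Map one raw record onto a flat, unified layout (assumed field names)."""
    sequence = raw["Sequence"].strip().upper()
    return {
        "SPADE ID": make_spade_id(sequence),
        "Sequence": sequence,
        "Source": source,  # originating database, kept for traceability
        # Flat string, entries separated by "; "
        "Hemolytic Activity": "; ".join(flatten_field(raw.get("Hemolytic Activity"))),
        # Structured list, one entry per organism/measurement
        "Target Organism": flatten_field(raw.get("Target Organism")),
    }
```

Hashing the normalized sequence means the same peptide imported from two source databases maps to the same ID, which makes duplicates easy to detect. A sequential counter would give shorter IDs but would need a central registry to stay stable across re-imports.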
Platform Design
- Front-end: User-friendly interface that accepts multi-dimensional filter conditions.
- Back-end: Supports efficient data queries, API calls, and batch processing.
- Database: Supports structured storage and dynamic expansion. Integration of UniProt and PubMed information is planned for a later stage.
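
As a rough sketch of how the back-end might apply multi-dimensional filter conditions, the example below combines optional criteria for target organism, toxicity, and stability. The PeptideFilter class, the numeric field names hemolysis_percent and half_life_hours, and the thresholds are hypothetical. The outline does not specify how toxicity or stability are quantified, so these stand in for whatever measures the platform adopts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeptideFilter:
    """Optional screening criteria; a value of None means 'do not filter'."""
    target_organism: Optional[str] = None
    max_hemolysis_percent: Optional[float] = None  # assumed toxicity measure
    min_half_life_hours: Optional[float] = None    # assumed stability measure


def matches(record, flt):
    """Return True if the record satisfies every criterion that is set."""
    if flt.target_organism is not None:
        wanted = flt.target_organism.lower()
        targets = [t.lower() for t in record.get("Target Organism", [])]
        if not any(wanted in t for t in targets):
            return False
    if flt.max_hemolysis_percent is not None:
        value = record.get("hemolysis_percent")
        # Records with no measurement are excluded rather than assumed safe.
        if value is None or value > flt.max_hemolysis_percent:
            return False
    if flt.min_half_life_hours is not None:
        value = record.get("half_life_hours")
        if value is None or value < flt.min_half_life_hours:
            return False
    return True


def screen(records, flt):
    """Batch-filter a list of peptide records."""
    return [r for r in records if matches(r, flt)]
```

Treating unset criteria as "no constraint" lets the front-end submit only the dimensions a user has filled in. Excluding records that lack a measurement is a conservative choice; a real platform might instead expose it as a user option.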