Introduction (PAC-Analysis)
To support our project on biopesticidal cyclic lipopeptides, we created PAC-Analysis (Phylogenetic and COM Domain Analysis), a Python-based bioinformatics tool for lipopeptide genetic engineering. It is released as an open-source tool and database for iGEM teams and the wider research community.
How Does the Software Work?
Input Preparation
Users must provide a FASTA file containing the five Pps (plipastatin synthetase) gene sequences, ppsA through ppsE, from their target strain. Plipastatin is closely related to fengycin, and its biosynthetic gene cluster usually consists of these five genes. The file should meet the following specifications:
PAC-Analysis provides a concise command-line interface. Users can obtain detailed usage guidelines by running:
$ PAC-Analysis -h
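Before running the tool, it can help to confirm that the input really holds five Pps records. The sketch below is a hypothetical pre-flight check, not part of PAC-Analysis. It assumes that the five records have unique IDs (the first word of each header) and non-empty sequences. The function names `read_fasta` and `check_pps_input` are illustrative.

```python
# Hypothetical pre-flight check for the PAC-Analysis input FASTA.
# Assumption: the file holds exactly five records (ppsA-ppsE) with unique IDs.
EXPECTED_RECORDS = 5


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into an ordered {record_id: sequence} dict."""
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            current_id = fields[0]  # record ID = first word of the header
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def check_pps_input(text: str, expected: int = EXPECTED_RECORDS) -> dict[str, str]:
    """Raise ValueError unless the FASTA holds `expected` non-empty records."""
    records = read_fasta(text)
    if len(records) != expected:
        raise ValueError(f"expected {expected} Pps sequences, found {len(records)}")
    empty = [rid for rid, seq in records.items() if not seq]
    if empty:
        raise ValueError(f"empty sequence(s): {', '.join(empty)}")
    return records
```

To check a file on disk, pass its contents, for example `check_pps_input(Path("pps.fasta").read_text())` with `from pathlib import Path`. The file name is only an example.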
Modules
Our script uses Python libraries such as Biopython, NumPy, and Matplotlib, together with the following tools:
Multiple sequence alignment with MAFFT (--auto/--localpair modes) for evolutionary and conservation analysis
Maximum-likelihood (ML) phylogenetic trees with IQ-TREE (-m MF for ModelFinder model selection, -bb for ultrafast bootstrap) for inferring evolutionary relationships
Multiple sequence alignment (--full --force) for high-throughput COM domain conservation analysis
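For the conservation side of these analyses, a common and simple score is the Shannon entropy of each alignment column. The sketch below is illustrative only, and the page does not say whether PAC-Analysis uses this metric. It assumes equal-length aligned sequences, such as MAFFT's FASTA output, and it ignores gap characters. That gap handling is also an assumption.

```python
import math
from collections import Counter


def column_entropy(alignment: list[str], gap: str = "-") -> list[float]:
    """Shannon entropy (bits) of each alignment column, ignoring gaps.

    0.0 means the column is fully conserved; all-gap columns return NaN.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scores: list[float] = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(float("nan"))  # nothing but gaps in this column
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(entropy)
    return scores
```

Low-entropy columns in an alignment of COM regions would point to candidate conserved positions. High-entropy columns mark positions that vary between strains.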
Phylogenetic Tree Construction
PAC-Analysis first merges the user-provided Pps sequences with our collected database of Bacillus Pps sequences. It then aligns the combined set with MAFFT. IQ-TREE uses this alignment to build a maximum-likelihood phylogenetic tree and writes a bootstrap consensus tree as a standard .contree (Newick) file. A built-in visualization tool draws the tree and shows where the target sequences fall relative to the reference strains.
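The merge, align, and tree steps can be sketched as below. This is an illustration under stated assumptions, not PAC-Analysis's own code. The file names and the `iqtree` executable name are placeholders (IQ-TREE 2 installs `iqtree2`, where `-B` is the newer spelling of `-bb`). The sketch passes `-m MFP` rather than the `-m MF` listed above. In IQ-TREE, MF runs model selection only, while MFP also runs the tree search. We do not know which exact options the tool uses.

```python
import subprocess
from pathlib import Path


def merge_fasta(user_fasta: Path, database_fasta: Path, merged: Path) -> Path:
    """Write the user's Pps records followed by the reference database records."""
    with merged.open("w") as out:
        for src in (user_fasta, database_fasta):
            out.write(src.read_text().strip() + "\n")
    return merged


def mafft_command(fasta: Path, mode: str = "--auto") -> list[str]:
    # "--auto" lets MAFFT choose a strategy; "--localpair" uses local pairwise
    # scoring (L-INS-i when combined with "--maxiterate 1000").
    return ["mafft", mode, str(fasta)]


def iqtree_command(alignment: Path, exe: str = "iqtree", bootstrap: int = 1000) -> list[str]:
    # -m MFP: ModelFinder model selection followed by the ML tree search.
    # -bb N: ultrafast bootstrap (N >= 1000); IQ-TREE then also writes <alignment>.contree.
    return [exe, "-s", str(alignment), "-m", "MFP", "-bb", str(bootstrap)]


def build_tree(user_fasta: Path, database_fasta: Path, workdir: Path) -> Path:
    """Merge, align with MAFFT, infer the tree with IQ-TREE; return the .contree path."""
    workdir.mkdir(parents=True, exist_ok=True)
    merged = merge_fasta(user_fasta, database_fasta, workdir / "merged.fasta")
    aligned = workdir / "merged.aln.fasta"
    with aligned.open("w") as out:
        # MAFFT prints the alignment to stdout, so redirect it into a file.
        subprocess.run(mafft_command(merged), stdout=out, check=True)
    subprocess.run(iqtree_command(aligned), check=True)
    return aligned.with_name(aligned.name + ".contree")
```

The commands are built as plain lists so they can be inspected or logged before anything is executed. `build_tree` needs both `mafft` and IQ-TREE on the `PATH`.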
Annotation and Structure Prediction
PAC-Analysis automatically merges and standardizes the five input Pps sequences, then runs InterProScan on the complete sequences. It identifies and annotates functional domains, including the core NRPS domains condensation (C), adenylation (A), thiolation (T, also called peptidyl carrier protein), and thioesterase (TE). From these annotations it generates detailed domain distribution maps that form the basis for subsequent functional analysis.
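Suppose InterProScan is run with Pfam and tab-separated output, for example `interproscan.sh -i pps.fasta -f tsv -appl Pfam`. Each hit is then one row. Column 1 is the protein ID, column 5 is the signature accession, and columns 7 and 8 are the start and stop positions. The sketch below turns those rows into an ordered domain string for each protein.

The Pfam-to-label mapping is our assumption, not necessarily the one PAC-Analysis uses. Epimerization (E) domains share the condensation fold, so they would also be labelled C here.

```python
import csv
import io

# Assumed Pfam-to-label mapping for core NRPS domains (illustrative only).
PFAM_TO_DOMAIN = {
    "PF00668": "C",   # Condensation (also hits epimerization domains)
    "PF00501": "A",   # AMP-binding enzyme (adenylation)
    "PF00550": "T",   # Phosphopantetheine attachment site (thiolation / PCP)
    "PF00975": "TE",  # Thioesterase
}


def domain_architecture(tsv_text: str) -> dict[str, list[tuple[str, int, int]]]:
    """Map InterProScan TSV rows to {protein_id: [(label, start, stop), ...]}."""
    arch: dict[str, list[tuple[str, int, int]]] = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 8:
            continue  # skip blank or malformed lines
        protein, signature = row[0], row[4]
        label = PFAM_TO_DOMAIN.get(signature)
        if label is None:
            continue  # ignore signatures outside the core NRPS set
        arch.setdefault(protein, []).append((label, int(row[6]), int(row[7])))
    for hits in arch.values():
        hits.sort(key=lambda hit: hit[1])  # order domains from N- to C-terminus
    return arch


def architecture_string(hits: list[tuple[str, int, int]]) -> str:
    """Render a sorted hit list as a compact string such as 'C-A-T'."""
    return "-".join(label for label, _, _ in hits)
```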
COM Domain Analysis
Finally, PAC-Analysis targets the key functional interfaces of the nonribosomal peptide synthetases (NRPSs). It automatically extracts the communication-mediating (COM) domain sequences at the junctions between modules, covering 30 amino acid residues flanking each domain interface. It aligns these sequences and combines the alignment with the phylogenetic and domain annotation results. From this, it infers the functional origins, evolutionary divergence, and conserved features of the COM regions, which provide evidence for rational design.
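The page does not define exactly where each 30-residue window is anchored, so the sketch below assumes two hypothetical cases. The first is a junction inside one subunit, taken to lie right after a T domain that is followed by a C domain. The second is an inter-subunit interface, taken as the C-terminal tail of one Pps subunit and the N-terminal head of the next, which is where inter-subunit COM domains sit.

Domain coordinates are 1-based (start, stop) pairs like those from the annotation step. A T domain followed by an E domain would not be caught by this simple rule.

```python
# Hypothetical COM-window extraction; FLANK follows the 30-residue figure above.
FLANK = 30

Hit = tuple[str, int, int]  # (domain label, 1-based start, 1-based stop)


def junction_window(seq: str, pos: int, flank: int = FLANK) -> tuple[str, str]:
    """Return the `flank` residues before and after a junction located just
    after 1-based residue `pos`; windows are clipped at the sequence ends."""
    if flank <= 0 or not 0 <= pos <= len(seq):
        raise ValueError("invalid flank or junction position")
    upstream = seq[max(0, pos - flank):pos]  # residues pos-flank+1 .. pos
    downstream = seq[pos:pos + flank]        # residues pos+1 .. pos+flank
    return upstream, downstream


def module_junctions(hits: list[Hit]) -> list[int]:
    """Positions where a T domain is directly followed by a C domain,
    i.e. the boundary between two modules on the same subunit."""
    return [a[2] for a, b in zip(hits, hits[1:]) if a[0] == "T" and b[0] == "C"]


def subunit_com_pair(donor: str, acceptor: str, flank: int = FLANK) -> tuple[str, str]:
    """C-terminal tail of one subunit and N-terminal head of the next subunit."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    return donor[max(0, len(donor) - flank):], acceptor[:flank]
```

Windows collected this way from many strains could then be aligned and scored for conservation.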
Feedback
After each analysis and recommendation, the system enters a self-improving cycle that updates its recommendation algorithms and knowledge base. It collects the modification strategies users actually implemented, together with their experimental validation results (such as inhibition zone assays or quantitative yield data), and automatically archives successful cases in a high-priority recommendation database. Based on this bioactivity feedback, it adjusts the recommendation weights of parameters such as promoter strength and regulatory gene efficacy in later analyses. This data-driven iteration progressively improves the success rate and reliability of the suggested genetic engineering strategies.
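The page does not specify how the weights are adjusted. One simple, hypothetical scheme is an exponential moving average. Each validated experiment contributes an outcome score between 0 and 1, for example a normalized inhibition-zone or yield improvement scaled however the team chooses. That score pulls the corresponding parameter's weight toward it.

The class name, parameter names, prior, and learning rate below are placeholders, not PAC-Analysis internals.

```python
from dataclasses import dataclass, field


@dataclass
class RecommendationWeights:
    """Hypothetical feedback store: one weight in [0, 1] per design parameter."""
    learning_rate: float = 0.2
    prior: float = 0.5  # weight assumed for parameters with no feedback yet
    weights: dict[str, float] = field(default_factory=dict)

    def update(self, parameter: str, outcome: float) -> float:
        """Pull the parameter's weight toward an outcome score in [0, 1]
        using an exponential moving average; returns the new weight."""
        if not 0.0 <= outcome <= 1.0:
            raise ValueError("outcome must lie in [0, 1]")
        old = self.weights.get(parameter, self.prior)
        new = (1 - self.learning_rate) * old + self.learning_rate * outcome
        self.weights[parameter] = new
        return new

    def ranked(self) -> list[tuple[str, float]]:
        """Parameters sorted from most to least recommended."""
        return sorted(self.weights.items(), key=lambda kv: kv[1], reverse=True)
```

A moving average keeps older evidence but lets recent, validated results dominate. A larger `learning_rate` makes the ranking react faster to new feedback.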
Contribution to Future iGEM Teams
PAC-Analysis is designed to give iGEM teams a rational design workflow from genes to function, improving the efficiency and depth of their research. Automated sequence analysis quickly identifies key functional domains and regulatory genes, guiding CRISPR target selection and metabolic optimization. By integrating structure prediction and machine learning strategies, the tool supports the precise redesign of NRPS enzymes and helps teams build high-performance engineered strains efficiently. It also strengthens the computational biology foundation and innovation of their projects.
Future and Prospect
We believe that our software serves as an indispensable component of project development. In the future, we will continue to iteratively optimize its analytical and predictive capabilities, further integrating multi-omics data and artificial intelligence methods to achieve precise design of more complex metabolic pathways and dynamic simulation of regulatory mechanisms.
Key expansions include:
1) Establishing phenotype-gene networks using our antimicrobial and yield literature database for intelligent regulatory recommendations.
2) Systematically expanding the Bacillus Pps gene repository with regular updates to improve analytical accuracy.
3) Extending platform compatibility to surfactin and iturin lipopeptides through specialized detection modules.
The tool can also be extended to the engineering of other secondary metabolites, and it aims to lower the barrier to synthetic biology research. By providing efficient, reliable, and open computational support to more iGEM teams, we aim to help advance synthetic biology into a new era of data-driven, intelligent design.
Source code: https://gitlab.igem.org/2025/software-tools/hbut-china
References
[1] Front. Microbiol., 09 December 2016, Sec. Terrestrial Microbiology. https://doi.org/10.3389/fmicb.2016.0180
[2] Translocation of the thioesterase domain for the redesign of plipastatin synthetase. https://doi.org/10.1038/srep38467
[3] Module and individual domain deletions of NRPS to produce plipastatin derivatives in Bacillus subtilis. https://doi.org/10.1186/s12934-018-0929-4
[4] Translocation of subunit PpsE in plipastatin synthase and synthesis of novel lipopeptides. https://doi.org/10.1016/j.synbio.2022.09.001
[5] Katoh, Rozewicki, Yamada. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics 20:1160-1166 (2019).
[6] Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt von Haeseler, Robert Lanfear. IQ-TREE. https://doi.org/10.1093/molbev/msaa015