Model

Secretion Efficiency Prediction

In synthetic biology, efficient protein secretion is critical for boosting product yield and achieving modular functionality. For E. coli, a widely used prokaryotic host, the native secretion capacity is limited, and proteins often remain trapped in the cytoplasm, leading to loss of activity or requiring extensive purification. Thus, selecting and optimizing signal peptides that can drive efficient secretion into the periplasm or medium is of high practical significance.

E. coli possesses several secretion pathways, of which the Sec pathway is the dominant and most widely exploited. This pathway recognizes N-terminal signal peptides, directs unfolded polypeptides across the inner membrane, and removes the signal peptide by signal peptidase cleavage, allowing the protein to fold and mature in the periplasm. Within the Sec system, signal peptides are classified as SPI type or SPII type. SPI-type signal peptides release soluble proteins upon cleavage and represent the canonical secretion route, whereas SPII-type signal peptides contain a conserved lipobox motif that anchors the protein to the membrane via lipid modification, preventing soluble secretion. In parallel with Sec, the Tat pathway recognizes a twin-arginine (RR) motif and transports proteins that are already folded [1]. Taken together, Sec/SPI-type signal peptides are the most suitable route to efficient soluble secretion in E. coli, which makes their screening and optimization essential for enhancing secretion efficiency and enabling downstream applications.

Figure 1. The Sec- and Tat-dependent protein transport pathways [2]

A canonical Sec-type signal peptide is typically composed of three distinct regions: the n-region, h-region, and c-region.

The n-region is located at the N-terminus and is often enriched in positively charged residues such as lysine and arginine, which facilitate recognition by SecA and initiation of translocation. The h-region consists of a stretch of hydrophobic amino acids that form the hydrophobic core required for insertion into the membrane channel, serving as the most characteristic feature of the signal peptide. The c-region, positioned near the cleavage site, frequently contains the conserved AXA↓ motif, which ensures precise cleavage by signal peptidase I and release of the mature protein. The combined features of these three regions not only determine whether a signal peptide is correctly recognized and transported but also directly influence secretion efficiency. Therefore, quantitative assessment of n/h/c-region properties and conserved motifs provides a critical foundation for scoring and ranking signal peptides.
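The three region properties described above can be computed directly from a sequence. The following is a minimal sketch, not the team's actual code: it assumes the signal peptide string ends at the cleavage site, uses the standard Kyte-Doolittle scale for hydrophobicity, and takes the region boundaries as inputs (the values 4 and 16 used for the commonly cited OmpA signal peptide are illustrative assumptions).

```python
# Kyte-Doolittle hydropathy values, used here to compute GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(region: str) -> int:
    """Simple net charge: +1 per K/R, -1 per D/E (His and termini ignored)."""
    positive = sum(region.count(aa) for aa in "KR")
    negative = sum(region.count(aa) for aa in "DE")
    return positive - negative


def gravy(region: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the region."""
    return sum(KYTE_DOOLITTLE[aa] for aa in region) / len(region)


def has_axa_motif(signal_peptide: str) -> bool:
    """True when positions -3 and -1 relative to the cleavage site are Ala."""
    return (
        len(signal_peptide) >= 3
        and signal_peptide[-3] == "A"
        and signal_peptide[-1] == "A"
    )


def region_properties(signal_peptide: str, n_end: int, h_end: int) -> dict:
    """Split at the given (assumed) boundaries and summarize each region."""
    n_region = signal_peptide[:n_end]
    h_region = signal_peptide[n_end:h_end]
    return {
        "n_charge": net_charge(n_region),
        "h_gravy": gravy(h_region),
        "c_axa": has_axa_motif(signal_peptide),
    }


# Commonly cited OmpA signal peptide; boundaries 4 and 16 are illustrative.
OMPA_SP = "MKKTAIAIAVALAGFATVAQA"
props = region_properties(OMPA_SP, n_end=4, h_end=16)
```

In practice the region boundaries would come from a predictor rather than being set by hand.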

Figure 2. Region structures of different types of signal peptide [3]

In this project, we established a dry-lab pipeline to systematically evaluate the secretion potential of Sec/SPI-type signal peptides in E. coli. The workflow consists of four major steps: data assembly → prescreening → feature scoring → ranking and recommendation.

sfGFP was chosen as the standard scaffold protein due to its easily detectable fluorescence, which serves as a rapid readout for comparing the secretion potential of different signal peptides and testing the pipeline logic. The design logic and workflow of this pipeline are highly reusable. Other iGEM teams can readily adapt it by substituting their protein of interest, thereby obtaining corresponding ranked lists and recommendations of signal peptides.

Figure 3. Pipeline overview diagram

Data Assembly

A total of 2020 known or putative signal peptide sequences from E. coli were collected, primarily from the Signal Peptide Website and relevant literature. For each sequence, we recorded the accession number, source organism, peptide length, and full amino acid sequence. The dataset was stored in a standardized tabular format, serving as the foundation for subsequent screening and analysis.

We then constructed a fusion protein of each candidate signal peptide with sfGFP, inserting a GSGSGS linker between the two. This flexible linker reduces possible structural interference between the signal peptide and sfGFP, allowing a more objective comparison of the secretion potential of different signal peptides.
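The fusion layout (signal peptide, then GSGSGS linker, then cargo) can be sketched as a simple concatenation that also records where each part ends. This is an illustrative sketch: the `Fusion` class, the `build_fusion` function, and the short cargo string are hypothetical stand-ins, and the full sequence of the protein of interest would be substituted in real use.

```python
from dataclasses import dataclass

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
LINKER = "GSGSGS"  # flexible linker named in the text


@dataclass(frozen=True)
class Fusion:
    sequence: str
    sp_end: int      # number of residues in the signal peptide
    linker_end: int  # 0-based index at which the cargo begins


def build_fusion(signal_peptide: str, cargo: str, linker: str = LINKER) -> Fusion:
    """Concatenate signal peptide + linker + cargo after basic validation."""
    parts = [p.strip().upper() for p in (signal_peptide, linker, cargo)]
    for part in parts:
        unexpected = set(part) - AMINO_ACIDS
        if not part or unexpected:
            raise ValueError(f"invalid sequence part: {part!r}")
    sp, link, body = parts
    return Fusion(sp + link + body, len(sp), len(sp) + len(link))


# Short stand-in cargo; replace with the full cargo sequence.
fusion = build_fusion("MKKTAIAIAVALAGFATVAQA", "MSKGEEL")
```

Keeping the boundary indices alongside the sequence makes the later cleavage-site check a simple comparison.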

Pathway Classification

Each candidate signal peptide underwent a three-step screening process:

  1. SignalP prediction: SignalP [3] was used to classify each peptide (Sec/SPI, SPII, Tat, or non-SP), while also recording the predicted cleavage site and confidence score (D-score).
  2. TMHMM transmembrane auditing: TMHMM [4] was applied to check for additional transmembrane helices outside the h-region. Candidates with extra TM segments were flagged as “high-risk” and excluded.
  3. Cleavage site validation: The SignalP-predicted cleavage site was checked to ensure that it lies upstream of sfGFP, i.e., the cleavage position must be smaller than signal peptide length + linker length. If the predicted cleavage site fell within the sfGFP sequence, the candidate was discarded because cleavage there would disrupt sfGFP integrity.

After these three steps, Tat-type, SPII-type, and structurally problematic sequences were eliminated, leaving a refined subset more likely to function through the Sec/SPI pathway.
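The three screening rules can be expressed as a single filter. The sketch below assumes the predictor outputs have already been parsed into a record per candidate; the `Candidate` fields and the `passes_prescreen` function are hypothetical names, and the label "SP" for Sec/SPI follows the category names shown in Figure 6.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sp_length: int         # residues in the signal peptide
    signalp_type: str      # predictor label, e.g. "SP", "LIPO", "TAT"
    cleavage_pos: int      # predicted cleavage position in the fusion
    extra_tm_helices: int  # TM helices predicted outside the h-region


def passes_prescreen(c: Candidate, linker_len: int = 6, sec_label: str = "SP"):
    """Return (kept, reason) following the three screening steps."""
    if c.signalp_type != sec_label:
        return False, "not classified as Sec/SPI"
    if c.extra_tm_helices > 0:
        return False, "extra transmembrane segment (high-risk)"
    # Rule from the text: cleavage position < signal peptide + linker length.
    if not c.cleavage_pos < c.sp_length + linker_len:
        return False, "cleavage site falls inside the cargo"
    return True, "ok"


candidates = [
    Candidate("good", 21, "SP", 21, 0),
    Candidate("tat", 30, "TAT", 30, 0),
    Candidate("tm", 22, "SP", 22, 1),
    Candidate("late_cut", 21, "SP", 30, 0),
]
kept = [c.name for c in candidates if passes_prescreen(c)[0]]
```

Returning a reason string alongside the decision makes it easy to tabulate why candidates were dropped.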

Figure 4. Example output of SignalP prediction
Example SignalP 6.0 prediction for a signal peptide. The red, orange, and yellow curves indicate the probabilities of n-, h-, and c-regions, respectively, with the dashed line marking the predicted signal peptidase I cleavage site (CS). The result shows a typical Sec/SPI-type signal peptide at the N-terminus, with a high-confidence cleavage prediction at position 21.

Figure 5. Example output of TMHMM prediction
Example TMHMM prediction for a signal peptide. Purple bars represent the probability of transmembrane helices, blue for cytoplasmic (inside), and orange for periplasmic/extracellular (outside) localization. The result indicates a moderate-probability transmembrane segment at the N-terminus, while the remaining sequence is predominantly outside, highlighting the need to integrate SignalP results to avoid misclassification.

Figure 6. Distribution of SignalP Prediction
Distribution of SignalP prediction types among candidate peptides. The majority were classified as standard signal peptides (SP, 1,384), followed by lipoprotein-type (LIPO, 225), Tat-type (TAT, 81), with only a few Tat-lipoprotein (7) and pilin-type (4). This indicates that Sec/SPI candidates dominate the dataset, providing a strong foundation for subsequent analysis.
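As a quick arithmetic check on the counts in this caption, the shares of each category can be computed as below. The caption does not state how the rest of the 2,020 collected sequences were classified, so the shares here are relative to the listed categories only; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Counts as given in the Figure 6 caption.
counts = {"SP": 1384, "LIPO": 225, "TAT": 81, "TATLIPO": 7, "PILIN": 4}

total_listed = sum(counts.values())
shares = {label: n / total_listed for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:8s} {counts[label]:5d}  {share:6.1%}")
```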

Figure 7. Probability Distribution of Sec/SPI Signal Peptides (SignalP)
Probability distribution of Sec/SPI-type signal peptides predicted by SignalP. The majority (1,125 sequences) scored near 1.0, showing strong Sec/SPI signatures, while the remainder were scattered between 0.5 and 0.9, representing borderline candidates. The prevalence of high-confidence predictions indicates that the prescreened dataset is robust.

Sec-pathway Scoring

After removing Tat-type, SPII-type, and invalid candidates, we conducted a quantitative evaluation of the remaining Sec/SPI signal peptides. The goal was to select the best among the best. We extracted five key features for each peptide, with predefined weights and scoring methods.

Table 1. Weighted scoring framework
Feature | Weight | Scoring Method | Biological Rationale
n-region net charge | 0.1 | Optimal range +1 to +2 → 1.0; <0 → 0; >2 penalized linearly | Moderate positive charge promotes SecA recognition; excessive charge may hinder translocation efficiency
h-region hydrophobicity | 0.1 | GRAVY 1.0–1.8 → 1.0; weaker/stronger hydrophobicity penalized | A balanced hydrophobic core facilitates membrane insertion; too weak/too strong destabilizes recognition
c-region cleavage motif | 0.1 | Motif scan for AXA↓ (A–X–A consensus near cleavage site scored as 1.0, else 0) | Conserved motif ensures precise cleavage by signal peptidase I
SignalP D-score | 0.5 | Normalized confidence score (0–1) | Reflects overall confidence of Sec/SPI classification and prediction reliability
Cleavage site probability | 0.2 | Directly use SignalP-reported probability (Pr) as normalized score | Higher cleavage probability indicates more reliable recognition and processing
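The scoring rules in Table 1 can be written as small piecewise functions. Table 1 fixes the plateaus (+1 to +2 for charge, GRAVY 1.0–1.8 for hydrophobicity) but not the exact penalty shapes, so the ramp between 0 and +1, the `over_slope` value, and the `falloff` width below are assumptions made only for this sketch.

```python
def clamp01(x: float) -> float:
    """Restrict a score to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def score_n_charge(charge: float, over_slope: float = 0.5) -> float:
    """+1 to +2 scores 1.0; below 0 scores 0; above +2 is penalized linearly."""
    if charge < 0:
        return 0.0
    if charge < 1:
        return float(charge)  # assumed linear ramp between 0 and +1
    if charge <= 2:
        return 1.0
    return clamp01(1.0 - over_slope * (charge - 2))  # assumed slope


def score_h_gravy(gravy: float, lo: float = 1.0, hi: float = 1.8,
                  falloff: float = 1.0) -> float:
    """GRAVY inside [lo, hi] scores 1.0; outside, the score falls linearly."""
    if lo <= gravy <= hi:
        return 1.0
    distance = lo - gravy if gravy < lo else gravy - hi
    return clamp01(1.0 - distance / falloff)  # assumed falloff width


def score_c_motif(has_axa: bool) -> float:
    """Binary score for the AXA motif at the cleavage site."""
    return 1.0 if has_axa else 0.0
```

The SignalP-derived features need no transformation here because they are already reported on a 0–1 scale.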

All features were normalized and combined into a composite score using the assigned weights, which provided the quantitative basis for candidate ranking and secretion potential recommendations.
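The composite score is a weighted sum of the five normalized features. A minimal sketch using the weights from Table 1 follows; the feature key names and the example values are hypothetical and chosen only to illustrate the calculation.

```python
# Weights as listed in Table 1 (they sum to 1.0).
WEIGHTS = {
    "n_charge": 0.1,
    "h_hydrophobicity": 0.1,
    "c_motif": 0.1,
    "d_score": 0.5,
    "cs_prob": 0.2,
}


def composite_score(features: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of normalized feature scores, each expected in [0, 1]."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= features[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical candidate, for illustration only.
example = {
    "n_charge": 1.0,
    "h_hydrophobicity": 0.5,
    "c_motif": 1.0,
    "d_score": 0.9,
    "cs_prob": 0.8,
}
score = composite_score(example)
```

Because the two SignalP-derived features carry 0.7 of the total weight, a candidate with a low D-score cannot rank highly regardless of its sequence features.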

Validation

After completing Sec-pathway scoring and ranking, we selected three high-ranked signal peptides and three relatively low-ranked ones as representative controls for subsequent wet-lab validation.

Table 2. Representative high- and low-ranked signal peptides selected for experimental validation
ID n-charge h-hydrophobicity c-motif D-score CS-prob Final Score
OmpA 0.5 0 1 1 0.97 0.84
ydhT 0.87 0.55 0 1 0.98 0.84
Amy 0.5 0 0 1 0.98 0.74
P25401 0.25 1 0 0.51 0.57 0.49
P13980 0.65 1 0 0.06 0.88 0.37
P06963 0.87 0 0 0.09 0.86 0.3
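As a consistency check, the final scores in Table 2 can be recomputed from the tabulated feature values and the Table 1 weights. The sketch below does this and sorts the candidates; small gaps between recomputed and reported values are expected because the tabulated feature values are themselves rounded.

```python
# Weights in column order: n-charge, h-hydrophobicity, c-motif, D-score, CS-prob.
WEIGHTS = (0.1, 0.1, 0.1, 0.5, 0.2)

# (feature values, reported final score), copied from Table 2.
TABLE2 = {
    "OmpA":   ((0.5, 0, 1, 1, 0.97), 0.84),
    "ydhT":   ((0.87, 0.55, 0, 1, 0.98), 0.84),
    "Amy":    ((0.5, 0, 0, 1, 0.98), 0.74),
    "P25401": ((0.25, 1, 0, 0.51, 0.57), 0.49),
    "P13980": ((0.65, 1, 0, 0.06, 0.88), 0.37),
    "P06963": ((0.87, 0, 0, 0.09, 0.86), 0.30),
}


def weighted(values) -> float:
    """Dot product of feature values with the Table 1 weights."""
    return sum(w * v for w, v in zip(WEIGHTS, values))


recomputed = {name: weighted(vals) for name, (vals, _) in TABLE2.items()}
ranking = sorted(recomputed, key=recomputed.get, reverse=True)
max_gap = max(abs(recomputed[name] - reported)
              for name, (_, reported) in TABLE2.items())

for name in ranking:
    print(f"{name:7s} recomputed={recomputed[name]:.3f} "
          f"reported={TABLE2[name][1]:.2f}")
```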

All selected signal peptides were fused to sfGFP and expressed in E. coli, and secretion efficiency was evaluated from measurable fluorescence differences, which reflect the secretion potential of each peptide. Each strain was inoculated separately into supplemented M9 medium. When the cultures reached the exponential (logarithmic) growth phase, IPTG was added to induce sfGFP expression.

After induction, the culture supernatant was collected by centrifugation. The intensity of green fluorescence in the supernatant was used as the indicator of each signal peptide's ability to secrete heterologous proteins: the stronger the fluorescence, the more efficiently the corresponding signal peptide mediates secretion of heterologous proteins into the extracellular space.

The results were verified by two detection methods. The first is direct observation under a blue light lamp, which visually distinguishes differences in fluorescence brightness among the supernatants. The second is quantitative detection with a microplate reader (as shown in Figure 8), which measures the fluorescence intensity precisely.

Figure 8. The experimental scheme of the secretion efficacy assay

Both methods showed clear differences in the fluorescence intensity of sfGFP secreted with different signal peptides, providing a basis for the subsequent functional evaluation of the signal peptides.

  1. The supernatant fluorescence of sfGFP fused to the OmpA, Amy, and ydhT signal peptides was markedly higher than that of the other candidates (5 to 8 times that of TRAT3), indicating excellent secretion. The secretion efficiency of TRAT3, FAEE, and LYS2 was low, consistent with the model predictions.
  2. Ultimately, OmpA was selected as the signal peptide for subsequent chitinase secretion. Subsequent experiments confirmed that it increased chitinase activity in the supernatant more than 10-fold (Figure 10).
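Fold differences such as "5 to 8 times that of TRAT3" are typically obtained by subtracting a blank reading and dividing by the reference sample. The sketch below shows that calculation under the assumption that replicate readings and a medium-only blank are available; the function name and all numbers are invented for illustration and are not measured values from this project.

```python
from statistics import mean


def fold_over_reference(sample_reads, reference_reads, blank_reads) -> float:
    """Blank-subtracted mean fluorescence relative to a reference sample."""
    blank = mean(blank_reads)
    reference = mean(reference_reads) - blank
    if reference <= 0:
        raise ValueError("reference signal does not exceed the blank")
    return (mean(sample_reads) - blank) / reference


# Invented readings (arbitrary fluorescence units), for illustration only.
blank = [100, 100, 100]
low_secretor = [300, 310, 290]
high_secretor = [1700, 1750, 1650]

fold = fold_over_reference(high_secretor, low_secretor, blank)
```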

Figure 9. The extracellular effects of sfGFP from different signal peptides under blue light

Figure 10. Measurement of the transport effects of different signal peptides on sfGFP by microplate reader (excitation wavelength: 485 nm, emission wavelength: 510 nm)

Molecular Docking

In our project, we aim to establish a biocontrol strategy against soybean root rot by engineering two complementary bacterial strains. One strain secretes chitinase to degrade fungal cell walls, while the other produces the plant-derived triterpenoid β-amyrin, which targets chitin synthase (CHS) to block new chitin biosynthesis. Together, these strains implement a dual mechanism of “degrading existing cell walls and preventing new wall formation.”

However, direct experimental validation of the inhibitory effect of β-amyrin on CHS is restricted by biosafety concerns, since Fusarium spp. are classified as risk group 2 organisms. Therefore, we employed molecular docking as a safe and rational approach to simulate the interaction between β-amyrin and CHS, providing theoretical support for its inhibitory potential.

As crystallographic structures of Fusarium CHS are not yet available, we applied a “heterologous homolog substitution strategy” [5]. Among homologous proteins, the CHS structure from Phytophthora sojae has been resolved by cryo-EM and deposited in the PDB. Specifically, PDB:7WJO [6] contains the complex with the known inhibitor Nikkomycin Z. Therefore, using PDB:7WJO as a surrogate receptor for docking is both rational and literature-supported. The β-amyrin structure was retrieved from PubChem [7] and converted via OpenBabel [8] for subsequent docking simulations.

Molecular docking was carried out using AutoDock Vina [9,10]. The workflow proceeded as follows:

  1. Receptor preprocessing: The 7WJO structure was processed in PyMOL [11] by removing water molecules, metal ions, and the bound inhibitor Nikkomycin Z. Hydrogen atoms and charges were then added, and the receptor was converted into .pdbqt format.
  2. Ligand preprocessing: β-amyrin was retrieved from PubChem, converted into the required format using OpenBabel, followed by hydrogen addition and charge assignment in PyMOL. The prepared ligand was saved as .pdbqt.
  3. Docking setup: The docking grid box was centered on the binding pocket corresponding to the original inhibitor Nikkomycin Z in 7WJO, ensuring a biologically relevant binding site.
  4. Docking: AutoDock Vina was run to perform the docking and generate candidate binding poses.
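The docking setup in step 3 amounts to deriving a search box from the coordinates of the bound inhibitor and writing it into a Vina configuration. The sketch below is one way to do this under stated assumptions: the structure is available as PDB-format text, the inhibitor appears as HETATM records, the residue name passed in (`"LIG"` in the test data) is a placeholder for the real ligand code, and the 8 Å padding is an arbitrary choice. The configuration keys are the standard AutoDock Vina options.

```python
def ligand_box(pdb_text: str, resname: str, padding: float = 8.0):
    """Bounding-box midpoint and padded size for one HETATM residue name."""
    coords = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: residue name 18-20, x/y/z in columns 31-54.
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM records found for {resname!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(receptor: str, ligand: str, center, size,
                num_modes: int = 9, exhaustiveness: int = 8) -> str:
    """Render a Vina configuration file as text."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, c, s in zip("xyz", center, size):
        lines.append(f"center_{axis} = {c:.3f}")
        lines.append(f"size_{axis} = {s:.3f}")
    lines.append(f"num_modes = {num_modes}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"
```

The box must be computed from the original structure before the inhibitor is deleted during receptor preprocessing.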

Figure 11. Molecular Docking Workflow of β-amyrin with 7WJO

In the docking results, β-amyrin exhibited stable binding poses within the active pocket of chitin synthase. AutoDock Vina generated nine candidate conformations with binding affinities ranging from −7.9 to −8.9 kcal/mol. The small energy difference among these poses suggests that β-amyrin may adopt multiple feasible orientations within the binding site. The best-scoring pose (mode 1, −8.9 kcal/mol) was well aligned with the active pocket.

Table 3. Docking output from AutoDock Vina
Mode Affinity (kcal/mol) RMSD l.b. RMSD u.b.
1 -8.9 0.000 0.000
2 -8.5 5.969 8.672
3 -8.4 4.919 6.702
4 -8.4 5.376 7.639
5 -8.1 3.203 8.160
6 -8.0 2.162 4.185
7 -8.0 2.629 8.448
8 -8.0 3.485 7.845
9 -7.9 3.014 8.136
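To put the scores in Table 3 in context, the rows can be parsed and the best affinity converted to an approximate dissociation constant using ΔG = RT ln(Kd). This is a rough sketch: docking scores are only estimates of binding free energy, the temperature of 298.15 K is an assumption, and the resulting Kd should be read as an order-of-magnitude indication rather than a measured value.

```python
import math

# Rows copied from Table 3: mode, affinity (kcal/mol), RMSD l.b., RMSD u.b.
VINA_MODES = """\
1 -8.9 0.000 0.000
2 -8.5 5.969 8.672
3 -8.4 4.919 6.702
4 -8.4 5.376 7.639
5 -8.1 3.203 8.160
6 -8.0 2.162 4.185
7 -8.0 2.629 8.448
8 -8.0 3.485 7.845
9 -7.9 3.014 8.136
"""

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def parse_modes(text: str):
    """Parse whitespace-separated Vina result rows into typed tuples."""
    rows = []
    for line in text.strip().splitlines():
        mode, affinity, rmsd_lb, rmsd_ub = line.split()
        rows.append((int(mode), float(affinity),
                     float(rmsd_lb), float(rmsd_ub)))
    return rows


def estimated_kd(delta_g: float, temperature: float = 298.15) -> float:
    """Dissociation constant (mol/L) implied by dG = RT ln(Kd)."""
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


modes = parse_modes(VINA_MODES)
affinities = [affinity for _, affinity, _, _ in modes]
spread = max(affinities) - min(affinities)  # energy range across poses
best_kd = estimated_kd(min(affinities))     # estimate for the best pose
```

For the best pose this gives a Kd on the order of 0.3 µM, and the nine poses span about 1 kcal/mol.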

The structural visualization showed that the ligand (orange) was embedded in the active pocket of chitin synthase, forming interactions with multiple key residues (yellow):

  • Hydrogen bonding: Stable hydrogen bonds (green dashed lines) were observed between the ligand and residues GLU241 and LYS303.
  • Hydrophobic interactions: β-amyrin is a pentacyclic triterpenoid without an aromatic ring, so its contacts with aromatic residues such as TRP539 are best described as hydrophobic contacts rather than π–π stacking. Hydrophobic residues including LEU493, VAL452, and PRO454 surrounded the ligand scaffold, providing a hydrophobic environment that further stabilized the binding.
  • Polar/amino acid network: Polar residues such as THR237 and ASP382 were located at the pocket edge, potentially contributing to ligand stabilization through dipole interactions or water-mediated hydrogen bonds.

This network of hydrogen bonds and hydrophobic contacts resembles the interaction patterns typically seen for enzyme inhibitors. Together with the predicted binding affinity (−8.9 kcal/mol), it suggests that β-amyrin has potential as a chitin synthase inhibitor and may interfere with fungal cell wall biosynthesis.

Figure 12. Docking visualization of β-amyrin with CHS
The left panel shows the overall CHS structure with β-amyrin (orange), while the right panel highlights the binding pocket, where interactions with key residues (GLU241, THR237, ASP382, TRP539, LEU493) are illustrated.

References

[1] Natale P, Brüser T, Driessen AJM. Sec- and Tat-mediated protein secretion across the bacterial cytoplasmic membrane: distinct translocases and mechanisms. Biochim Biophys Acta Biomembr. 2008;1778(9):1735-1756.
[2] Frain KM, Robinson C, van Dijl JM. Transport of folded proteins by the Tat system. Protein J. 2019;38(4):377-388.
[3] Teufel F, Almagro Armenteros JJ, Johansen AR, et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol. 2022;40(7):1023-1025.
[4] Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305(3):567-580.
[5] R V, Granada D, Skariyachan S, et al. In vitro and in silico investigation deciphering novel antifungal activity of endophyte Bacillus velezensis CBMB205 against Fusarium oxysporum. Sci Rep. 2025;15:684.
[6] Chen W, Cao P, Liu Y, et al. Structural basis for directional chitin biosynthesis. Nature. 2022;610(7932):402-408.
[7] Kim S, Chen J, Cheng T, et al. PubChem 2025 update. Nucleic Acids Res. 2024;53(D1):D1516-D1525.
[8] O’Boyle NM, Banck M, James CA, et al. Open Babel: an open chemical toolbox. J Cheminform. 2011;3:33.
[9] Eberhardt J, Santos-Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: new docking methods, expanded force field, and Python bindings. J Chem Inf Model. 2021;61(8):3891-3898.
[10] Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455-461.
[11] Schrödinger L, DeLano W. PyMOL. 2020. Available from: http://www.pymol.org/pymol