Our dry lab constructed CytoFlow, an end-to-end intelligent development platform for antimicrobial peptides that covers the entire pipeline from molecular design and activity prediction to sequence optimization and production processes.
It consists of three major models: CytoEvolve (an antimicrobial peptide evolution model that takes AMP sequences as input and outputs improved variants), CytoGuard (an antimicrobial peptide activity evaluation model that takes AMP sequences as input and outputs predicted MIC values), and CytoGrow (a fermentation model suite described below). The system framework is shown below:

CytoGuard and CytoEvolve are the models designed to improve the quality of the LL-37 antimicrobial peptide.
CytoGrow, in turn, aims to increase LL-37 yield and itself comprises three sub-models: Grow-Medium (medium composition optimization), Grow-Yeast (Saccharomyces cerevisiae growth kinetics), and Grow-Glucose (glucose consumption kinetics).
Grow-Medium establishes a hybrid intelligent optimization framework (quadratic response surface + Gaussian process residuals + Bayesian optimization with two acquisition functions) to optimize the culture medium formulation for Saccharomyces cerevisiae. We ran the optimization with both acquisition strategies. The Mean strategy yielded an improved medium of glucose 54.49 g/L, peptone 9.82 g/L, and KH2PO4 3 g/L, with a predicted OD of 0.408 (a 20.9% improvement over the basic-medium result of OD = 0.3375): a modest but reliable gain. The UCB strategy predicted that glucose 41.39 g/L, peptone 23.58 g/L, and KH2PO4 3 g/L would reach OD 0.424 (a 25.6% improvement over the same baseline), a larger gain that still requires experimental validation.
Dataset Description
Optimization Objective
Find the optimal concentration combination of glucose, peptone, and potassium dihydrogen phosphate to maximize OD value.
Initially, we attempted a hybrid modeling approach of quadratic response surface + Gaussian process residuals.
Quadratic Response Surface (Trend Term)

$$f_{\text{trend}}(\mathbf{x}) = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{i<j}\beta_{ij} x_i x_j$$

where the feature matrix $X$ contains the intercept, linear, quadratic, and pairwise interaction terms of the three factors (glucose, peptone, KH2PO4).
Parameter estimation: $\hat{\boldsymbol{\beta}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$, where $\lambda$ is the regularization parameter.
Gaussian Process Residual Modeling (Rasmussen & Williams, 2006)
Residual definition: $r_i = y_i - f_{\text{trend}}(\mathbf{x}_i)$
Kernel function: anisotropic RBF kernel

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\sum_{d=1}^{3} \frac{(x_d - x'_d)^2}{2\ell_d^2}\right)$$

Hyperparameter settings: signal variance $\sigma_f^2$, per-dimension length scales $\ell_d$, and noise level $\sigma_n$.
Prediction Formulas
Mean prediction: $\mu(\mathbf{x}_*) = f_{\text{trend}}(\mathbf{x}_*) + \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{r}$
Variance prediction: $\sigma^2(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{k}_*$
For the initial optimization strategy, we used grid search, first defining the search space:
For the second method, we set the acquisition function to the Upper Confidence Bound (UCB):

$$\alpha_{\mathrm{UCB}}(\mathbf{x}) = \mu(\mathbf{x}) + \beta\,\sigma(\mathbf{x})$$

where $\beta = 2.576$ (99% confidence).
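As a sketch of the full loop (ridge-fitted quadratic trend, GP on the residuals, UCB over a grid), the NumPy snippet below reproduces the mechanics. The toy observations, grid ranges, length scales, and noise levels are illustrative placeholders, not our actual experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def quad_features(X):
    """Intercept, linear, quadratic, and pairwise interaction terms."""
    g, p, k = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([np.ones(len(X)), g, p, k,
                            g**2, p**2, k**2, g*p, g*k, p*k])

# toy observations: (glucose, peptone, KH2PO4) -> OD
X_obs = rng.uniform([20, 5, 1], [60, 25, 5], size=(15, 3))
y_obs = (0.2 + 0.004*X_obs[:, 0] - 3e-5*X_obs[:, 0]**2 + 0.003*X_obs[:, 1]
         + rng.normal(0, 0.01, 15))

# 1) ridge-regularized quadratic trend: beta = (F'F + lam I)^-1 F'y
F = quad_features(X_obs)
lam = 1e-3
beta = np.linalg.solve(F.T @ F + lam*np.eye(F.shape[1]), F.T @ y_obs)
resid = y_obs - F @ beta

# 2) GP on the residuals with an anisotropic RBF kernel
ell = np.array([10.0, 5.0, 1.0])   # per-dimension length scales (assumed)
sig_f, sig_n = 0.05, 0.01

def rbf(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) / ell)**2
    return sig_f**2 * np.exp(-0.5 * d2.sum(-1))

K = rbf(X_obs, X_obs) + sig_n**2 * np.eye(len(X_obs))
alpha = np.linalg.solve(K, resid)

# 3) UCB over a coarse grid with beta = 2.576 (99% confidence)
gs = np.meshgrid(np.linspace(20, 60, 21), np.linspace(5, 25, 21), [3.0])
grid = np.column_stack([a.ravel() for a in gs])
Ks = rbf(grid, X_obs)
mu = quad_features(grid) @ beta + Ks @ alpha
var = sig_f**2 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
ucb = mu + 2.576 * np.sqrt(np.clip(var, 0, None))

print("Mean argmax:", grid[np.argmax(mu)], "UCB argmax:", grid[np.argmax(ucb)])
```

Because the trend term captures the global curvature, the GP only has to model local deviations, which is what keeps the surrogate reliable with as few as ~15 runs.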
Results
Mean argmax @ (G = 54.33, T = 9.80, K = 3): OD_mean = 0.408
UCB argmax @ (G = 41.33, T = 23.60, K = 3): OD_mean = 0.424
Through grid search, both Mean and UCB found optimized configurations and predicted corresponding OD values.
The figures below visualize the two extrema predicted by the Mean strategy and compare biomass (OD value) across different medium compositions:


The following figures show model validation and analysis: sensitivity analysis of the three parameters and model fitting quality analysis:


Through hybrid modeling that combines parametric (quadratic response surface) and non-parametric (GP) components, and dual acquisition functions covering both optimistic exploration (UCB) and deterministic exploitation (Mean), we optimized existing medium compositions and predicted their OD values. The best formulation discovered was glucose 41.4 g/L + peptone 23.6 g/L + KH2PO4 3 g/L, with a predicted OD of 0.424, a 25.6% improvement over the basic-medium observation. By modeling and computational prediction, we provided clear medium-formulation recommendations for subsequent experiments, reducing trial-and-error costs. Moreover, the method is generalizable: the optimization framework established here can be extended to other microbial medium optimization problems.
To monitor Saccharomyces cerevisiae growth and obtain biomass at any time point, we established S. cerevisiae growth kinetics models using Logistic and Gompertz models. The Logistic model showed the best fitting performance for biomass data (R²=0.9937, RMSE=0.3462).
Data Source
The project uses standardized yeast fermentation experimental data, including:
Considering practical constraints, wet-lab measurements of S. cerevisiae growth cannot be taken at every moment. In practice, however, such as when calculating S. cerevisiae efficiency ratios, we may need biomass values at arbitrary time points, so obtaining the biomass at any moment becomes particularly important. Our dry lab therefore designed the Grow-Yeast model, which uses the limited measurements to transform discrete data points into a continuous curve from which biomass can be read at any time.
Biomass Growth Models
Logistic Growth Model (Verhulst, 1838)
The Logistic model describes biological growth limited by environmental resistance:

$$X(t) = \frac{X_{\max}}{1 + \left(\dfrac{X_{\max}}{X_0} - 1\right) e^{-\mu_{\max} t}}$$

where:
- $X(t)$: Biomass at time t
- $X_0$: Initial biomass
- $X_{\max}$: Maximum biomass
- $\mu_{\max}$: Maximum specific growth rate

Characteristics: S-shaped growth curve, suitable for describing complete growth processes
Gompertz Growth Model (Gompertz, 1825)
The Gompertz model is suitable for describing processes whose growth rate declines gradually; a common parameterization is:

$$X(t) = X_{\max}\, \exp\!\left(-b\, e^{-k t}\right)$$

with $b$ and $k$ fitted constants.
Characteristics: Asymmetric S-shaped curve with a more gradual decline in growth rate
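Both growth models can be fitted to sparse OD measurements with `scipy.optimize.curve_fit`; the time course and initial guesses below are synthetic placeholders, but the fitted curves can then be evaluated at any time point, which is exactly what Grow-Yeast needs.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, X0, Xmax, mu):
    """Logistic growth: X(t) = Xmax / (1 + (Xmax/X0 - 1) e^{-mu t})."""
    return Xmax / (1 + (Xmax / X0 - 1) * np.exp(-mu * t))

def gompertz(t, Xmax, b, k):
    """Gompertz growth: X(t) = Xmax exp(-b e^{-k t})."""
    return Xmax * np.exp(-b * np.exp(-k * t))

# toy OD600 time course (hours); values follow a smooth S-curve
t = np.array([0, 2, 4, 6, 8, 10, 12, 16, 20, 24], float)
od = np.array([0.05, 0.12, 0.28, 0.63, 1.27, 2.13, 2.95, 3.78, 3.96, 3.99])

pL, _ = curve_fit(logistic, t, od, p0=[0.05, 4.0, 0.5], maxfev=10000)
pG, _ = curve_fit(gompertz, t, od, p0=[4.0, 4.0, 0.3], maxfev=10000)

def r2(y, yhat):
    """Coefficient of determination."""
    return 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)

r2_L = r2(od, logistic(t, *pL))
r2_G = r2(od, gompertz(t, *pG))
print(f"Logistic R2={r2_L:.4f}, Gompertz R2={r2_G:.4f}")
# biomass at any time point: logistic(t_any, *pL)
```

Comparing the two R² values on held-out data is how one would pick the better model, mirroring the selection of the Logistic fit reported above.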
Data Preprocessing
Sample standard deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Data quality assessment:
Data completeness check and outlier identification
Phase division:
Visualization of Raw Experimental Data
This figure uses dual Y-axis design, simultaneously displaying biomass growth (OD₆₀₀, green) and glucose consumption (red) over time. Shaded regions identify different fermentation phases, clearly showing the transition between exponential growth and stationary phases.

Experimental Results
The figure below shows the fitting performance of Logistic and Gompertz models on biomass data, including:

The figure below shows fitting using the best-performing Logistic model:

Model validation:

To monitor glucose consumption during S. cerevisiae growth and obtain the remaining glucose at any time, we established glucose consumption kinetics models using modified exponential decay and Logistic decay models. The modified exponential decay model showed the best fitting performance for the glucose consumption data (R² = 0.955).
Data Source
The project uses standardized yeast fermentation experimental data, including:
Problem Description
Similar to Grow-Yeast, we want to obtain the remaining glucose at any time point. The remaining glucose, and the moment at which it is depleted, are crucial for our project, because S. cerevisiae only begins producing LL-37 after glucose depletion. Our dry lab therefore designed the Grow-Glucose model, which uses the limited measurements to transform discrete data points into a continuous curve of remaining glucose over time.
1. Modified Exponential Decay Model
Considering that the consumption rate differs between fermentation phases:

$$S(t) = \begin{cases} S_0\, e^{-k_1 t}, & t \le t_c \\ S_0\, e^{-k_1 t_c}\, e^{-k_2 (t - t_c)}, & t > t_c \end{cases}$$

where:
- $S_0$: Initial substrate concentration
- $k_1$: Early consumption rate constant
- $k_2$: Late consumption rate constant
- $t_c$: Transition time point
2. Logistic Decay Model
Describing the S-shaped character of substrate consumption; a common parameterization is:

$$S(t) = S_{\min} + \frac{S_0 - S_{\min}}{1 + (k t)^{n}}$$

where:
- $S_{\min}$: Minimum substrate concentration
- $k$: Consumption rate constant
- $n$: Shape parameter
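The two-phase decay model is also a straightforward `curve_fit` exercise; the residual-glucose values and bounds below are synthetic placeholders chosen so the transition point is visible.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_phase_decay(t, S0, k1, k2, tc):
    """Exponential decay with a different rate constant after transition time tc."""
    early = S0 * np.exp(-k1 * t)
    late = S0 * np.exp(-k1 * tc) * np.exp(-k2 * (t - tc))  # continuous at t = tc
    return np.where(t <= tc, early, late)

# toy residual-glucose measurements (g/L) over time (h)
t = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16], float)
glc = np.array([20.0, 18.1, 16.4, 14.8, 10.1, 6.9, 4.7, 3.2, 2.2])

p, _ = curve_fit(two_phase_decay, t, glc, p0=[20, 0.05, 0.2, 6],
                 bounds=([10, 0, 0, 1], [30, 1, 1, 15]), maxfev=10000)
S0, k1, k2, tc = p
print(f"S0={S0:.1f} g/L, k1={k1:.3f}/h, k2={k2:.3f}/h, tc={tc:.1f} h")
# residual glucose at any time: two_phase_decay(t_any, *p)
```

The fitted $t_c$ is particularly useful here: it marks the acceleration of glucose consumption, and extrapolating the late branch gives an estimate of the depletion time after which LL-37 production begins.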
The figure below shows fitting performance of modified exponential decay and Logistic decay models on glucose consumption:

The figure below shows fitting using the best-performing modified exponential decay model:

CytoGuard is an innovative deep learning framework specifically designed to predict the biological activity of Antimicrobial Peptides (AMPs). This model integrates feature representations from multiple pre-trained protein language models (ESM-2, Ankh, ProtT5) and captures high-order structural dependencies in sequences through Hypergraph Neural Networks (HGNNs). The model employs dynamic k-mer selection mechanisms and attention fusion strategies, achieving excellent performance on the test set: Spearman correlation coefficient of 0.8543, Pearson correlation coefficient of 0.9105, RMSE of 0.1806, and R² of 0.8153.
Antimicrobial peptides, as essential components of the innate immune system, hold tremendous potential for combating bacterial resistance (Hancock & Sahl, 2006; Mahlapuu et al., 2016), and LL-37, as the only human cathelicidin antimicrobial peptide, holds exceptional research potential. What properties does LL-37 possess, and how can we judge whether it is a "good" antimicrobial peptide? Traditional routes involve constructing an antimicrobial peptide expression system (strain selection, cultivation, separation, and purification) or direct chemical synthesis; the obtained peptides must then be assayed for antimicrobial activity via inhibition-zone or dilution-plating experiments, and evaluating other physicochemical properties demands still more experiments. Such experiments are time-consuming, labor-intensive, and costly (Fjell et al., 2012). In today's era of rapid computational development, can we instead design a computational pipeline that learns from existing antimicrobial peptide data and predicts the activity and physicochemical properties of unseen peptides? The answer is yes, but existing machine learning methods face the following challenges:
To address these challenges and efficiently and accurately predict the properties of unknown antimicrobial peptides, we designed the CytoGuard antimicrobial peptide activity prediction model.
Process Flow

Given an antimicrobial peptide sequence $S = (s_1, s_2, \ldots, s_L)$, where $L$ is the sequence length, we extract features using three renowned pre-trained protein language models (Rives et al., 2021; Lin et al., 2023; Elnaggar et al., 2022):
ESM-2 Embedding: $E_{\text{ESM}} = \text{ESM-2}(S) \in \mathbb{R}^{L \times d_{\text{ESM}}}$
Ankh Embedding: $E_{\text{Ankh}} = \text{Ankh}(S) \in \mathbb{R}^{L \times d_{\text{Ankh}}}$
ProtT5 Embedding: $E_{\text{ProtT5}} = \text{ProtT5}(S) \in \mathbb{R}^{L \times d_{\text{ProtT5}}}$
where $d_{\text{ESM}}$, $d_{\text{Ankh}}$, and $d_{\text{ProtT5}}$ are the hidden dimensions of the respective models.
Through feature extraction with protein large language models, we obtain high-dimensional representations of antimicrobial peptides, with the three models' outputs aligned to a common dimension for subsequent processing.
Before feature extraction, we fine-tune on a 10K deduplicated antimicrobial peptide dataset to improve model performance on antimicrobial peptides. We also tested non-fine-tuned models; see the Experiment section for comparison.
We employ an attention mechanism (Bahdanau et al., 2015; Vaswani et al., 2017) to fuse multiple embedding representations:
Projection Layer: $H_m = E_m W_m + b_m$, mapping each embedding $E_m$ ($m \in \{\text{ESM}, \text{Ankh}, \text{ProtT5}\}$) into a shared space.
Attention Weight Calculation: $\alpha_m = \mathrm{softmax}_m\!\left(\mathbf{w}^\top \tanh(W_a \bar{H}_m)\right)$, where $\bar{H}_m$ is the pooled representation of model $m$.
Fused Features: $E_{\text{fused}} = \sum_{m} \alpha_m H_m$
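A minimal NumPy sketch of the projection-plus-attention fusion follows. The embedding dimensions, random projection weights, and mean-pooled scoring are placeholder assumptions, since the exact layer shapes are not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 12  # peptide length

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# per-model embeddings with different hidden sizes (dims are placeholders)
embeds = {"esm2": rng.normal(size=(L, 1280)),
          "ankh": rng.normal(size=(L, 768)),
          "prott5": rng.normal(size=(L, 1024))}

d_shared = 256
proj = {m: rng.normal(scale=0.02, size=(E.shape[1], d_shared))
        for m, E in embeds.items()}

# 1) project every model into a shared space
H = {m: embeds[m] @ proj[m] for m in embeds}            # each (L, d_shared)

# 2) one attention logit per model from its mean-pooled representation
w = rng.normal(scale=0.1, size=d_shared)
logits = np.array([H[m].mean(axis=0) @ w for m in embeds])
alpha = softmax(logits)                                  # model weights, sum to 1

# 3) weighted sum gives the fused per-residue features
fused = sum(a * H[m] for a, m in zip(alpha, embeds))
print(fused.shape, alpha.round(3))
```

The point of the gate is that the fusion weights are input-dependent: a sequence that one language model represents poorly can lean on the other two.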
For a given $k$ value, we construct hypergraph $\mathcal{G}_k = (\mathcal{V}, \mathcal{E}_k)$ (Feng et al., 2021):
Node Set: $\mathcal{V} = \{v_1, \ldots, v_L\}$, corresponding to each position in the sequence.
Hyperedge Set: $\mathcal{E}_k = \{e_j\}$, where each hyperedge $e_j$ connects positions $\{j, j+1, \ldots, j+k-1\}$.
Edge Weights (based on TF-IDF) (Salton & Buckley, 1988):

$$w(e) = \mathrm{tf}(e) \cdot \log\frac{N}{\mathrm{df}(e)}$$

where $\mathrm{tf}(e)$ is the frequency of the k-mer in the sequence, $\mathrm{df}(e)$ is the number of sequences containing it, and $N$ is the corpus size.
Hypergraph Laplacian Matrix:

$$\Delta = I - D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$$

where $H$ is the incidence matrix, $W$ the diagonal hyperedge-weight matrix, and $D_v$, $D_e$ the node and hyperedge degree matrices.
Algorithm pseudocode:

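The construction above can be checked with a small NumPy sketch that builds the sliding k-mer incidence matrix and the normalized hypergraph Laplacian; uniform hyperedge weights stand in for the TF-IDF weights.

```python
import numpy as np

def kmer_hypergraph_laplacian(L, k, weights=None):
    """Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for sliding k-mer hyperedges."""
    n_edges = L - k + 1
    H = np.zeros((L, n_edges))
    for j in range(n_edges):
        H[j:j + k, j] = 1.0          # hyperedge j covers positions j..j+k-1
    w = np.ones(n_edges) if weights is None else np.asarray(weights, float)
    W = np.diag(w)
    dv = H @ w                        # weighted node degrees
    De = np.diag(H.sum(axis=0))       # hyperedge degrees (all equal to k)
    Dv_is = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_is @ H @ W @ np.linalg.inv(De) @ H.T @ Dv_is
    return np.eye(L) - theta

delta = kmer_hypergraph_laplacian(L=10, k=3)
print(delta.shape)
```

The resulting matrix is symmetric and positive semi-definite, which is what makes it usable both for hypergraph convolution and as the attention bias described next.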
CytoGuard employs a multi-head hypergraph attention mechanism:
Query, Key, Value Transformation: $Q = X W_Q,\quad K = X W_K,\quad V = X W_V$
Attention Score Calculation:

$$A = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}} + \lambda \Delta\right)$$

Output: $\mathrm{Attn}(X) = A\,V$
The hypergraph Laplacian matrix $\Delta$ serves as a structural bias term, guiding the attention mechanism to focus on important connections in the hypergraph structure.
Algorithm pseudocode:

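A toy NumPy version of attention with the Laplacian as an additive structural bias follows; the bias sign, its weight λ, and the tridiagonal stand-in for Δ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

L, d, dk = 10, 32, 16
X = rng.normal(size=(L, d))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, dk)) for _ in range(3))

# structural bias: a toy symmetric matrix standing in for the hypergraph Laplacian
delta = np.eye(L) - 0.5 * (np.eye(L, k=1) + np.eye(L, k=-1))
lam = -1.0   # assumed sign/scale: down-weight self, up-weight hypergraph neighbors

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(dk) + lam * delta   # Laplacian as additive bias
A = softmax(scores, axis=-1)                    # rows sum to 1
out = A @ V
print(out.shape)
```

Adding the bias inside the softmax, rather than masking, lets the content term still override the structure when the sequence evidence is strong.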
To adaptively select optimal k-mer combinations, we designed a dynamic k-mer selection mechanism:
Global Feature Extraction: $\mathbf{g} = \mathrm{MeanPool}(E_{\text{fused}})$
k-mer Weight Calculation:

$$w_k = \frac{\exp\big(\mathrm{MLP}(\mathbf{g})_k / \tau\big)}{\sum_{k'} \exp\big(\mathrm{MLP}(\mathbf{g})_{k'} / \tau\big)}$$

where $\tau$ is the temperature parameter and $k$ ranges over the candidate k-mer sizes (k = 2, 3, 4).
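The effect of the temperature parameter is easy to see in isolation: lower τ pushes the softmax toward a near-one-hot choice of k, while higher τ spreads weight across all k-mer sizes. The scores below are made-up MLP outputs.

```python
import numpy as np

def kmer_weights(scores, tau):
    """Softmax with temperature: lower tau -> harder, more selective weighting."""
    z = np.asarray(scores, float) / tau
    e = np.exp(z - z.max())
    return e / e.sum()

scores = [0.4, 1.1, 0.7]                 # toy MLP outputs for k = 2, 3, 4
w_soft = kmer_weights(scores, tau=2.0)   # broad mixture of k-mer sizes
w_sharp = kmer_weights(scores, tau=0.1)  # almost one-hot on the best k (here k=3)
print("tau=2.0:", w_soft.round(3))
print("tau=0.1:", w_sharp.round(3))
```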
CytoGuard employs a combined loss function with three components:
Mean Squared Error Loss: $\mathcal{L}_{\text{MSE}} = \dfrac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$
Mean Absolute Error Loss: $\mathcal{L}_{\text{MAE}} = \dfrac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$
Ranking Loss: $\mathcal{L}_{\text{rank}} = \sum_{(i,j):\, y_i > y_j} \max\big(0,\; m - (\hat{y}_i - \hat{y}_j)\big)$
Total Loss: $\mathcal{L} = \lambda_1 \mathcal{L}_{\text{MSE}} + \lambda_2 \mathcal{L}_{\text{MAE}} + \lambda_3 \mathcal{L}_{\text{rank}}$
where $\lambda_1$, $\lambda_2$, $\lambda_3$ are the loss weighting coefficients and $m$ is the ranking margin.
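A plain-NumPy sketch of the three-component loss; the weights λ and the ranking margin are placeholders, since their actual values are not given here.

```python
import numpy as np

def combined_loss(y, yhat, lam=(1.0, 0.5, 0.1), margin=0.05):
    """MSE + MAE + pairwise hinge ranking loss (weights/margin are placeholders)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    mse = np.mean((y - yhat) ** 2)
    mae = np.mean(np.abs(y - yhat))
    # penalize pairs whose predicted ordering contradicts the true ordering
    rank = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                rank += max(0.0, margin - (yhat[i] - yhat[j]))
    return lam[0] * mse + lam[1] * mae + lam[2] * rank

y_true = [0.1, 0.4, 0.8]
good = combined_loss(y_true, [0.12, 0.42, 0.78])
bad = combined_loss(y_true, [0.8, 0.4, 0.1])   # accurate values, reversed ranking
print(good, bad)
```

The ranking term is what ties the regression objective to the Spearman metric reported below: it punishes mis-ordered pairs even when their absolute errors are small.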
The dataset is divided into training, validation, and test sets, all containing AMP and non-AMP sequences. AMP sequences are primarily sourced from PepVAE together with their minimal inhibitory concentration (MIC) labels against E. coli, totaling 3,265 AMPs with annotated MIC values. Non-AMPs are 3,265 amino acid sequences without antimicrobial activity selected from UniProt. Additionally, we collected nearly 10K deduplicated antimicrobial peptide sequences from APD3 (Wang et al., 2016), DRAMP, DBAASP (Pirtskhalava et al., 2021), and other databases for fine-tuning the pre-trained protein language models.
Training Process Comparison
Fine-tuned models converge faster than non-fine-tuned models (blue line: fine-tuned; green line: non-fine-tuned):

Final performance on test set:
| Metric | Value | Interpretation |
|---|---|---|
| Spearman Correlation | 0.8543 | Predicted rankings highly consistent with true values |
| Pearson Correlation | 0.9105 | Strong linear correlation |
| RMSE | 0.1806 | Small root mean square error |
| MAE | 0.0786 | Very small mean absolute error |
| R² | 0.9053 | Explains 90.5% of variance |
Model performance visualization:

| Model Combination | Spearman | RMSE |
|---|---|---|
| ESM-2 only | 0.7892 | 0.2134 |
| Ankh only | 0.7456 | 0.2301 |
| ProtT5 only | 0.7123 | 0.2456 |
| ESM-2 + Ankh | 0.8234 | 0.1934 |
| All three | 0.8543 | 0.1806 |
| Strategy | Spearman | RMSE |
|---|---|---|
| Fixed k=3 | 0.8201 | 0.2012 |
| Fixed k=4 | 0.8156 | 0.2034 |
| Uniform weights | 0.8334 | 0.1887 |
| Dynamic selection | 0.8543 | 0.1806 |
Through attention weight visualization, we discovered:



Model output includes prediction uncertainty, providing confidence assessment for practical applications:
CytoGuard significantly improves antimicrobial peptide activity prediction accuracy through innovative hypergraph attention mechanisms and multi-model fusion strategies. The excellent performance achieved on the test set demonstrates the method's effectiveness, outperforming previous deep learning approaches for AMP prediction (Veltri et al., 2018; Chung et al., 2020). This work provides new technical solutions for computational biology and drug discovery fields, with important theoretical value and practical significance.
Future work will focus on further improving the model's generalization ability and computational efficiency, and exploring applications in broader protein function prediction tasks.
The CytoEvolve model is a deep reinforcement learning-based antimicrobial peptide sequence optimization framework for improving and optimizing antimicrobial peptide sequences to enhance their antimicrobial activity. The framework primarily includes: (1) an attention mechanism-based policy network (Mutator) integrated with Diffusion architecture for selecting amino acid mutation sites; (2) a fine-tuned Ankh protein language model for generating amino acid substitutions; (3) CytoGuard for evaluating antimicrobial activity. By optimizing policy network parameters through the REINFORCE algorithm, the system can iteratively improve peptide sequences to maximize predicted antimicrobial activity scores. The framework employs experience replay mechanisms and early stopping strategies, effectively balancing exploration and exploitation, achieving effective evolution from existing AMPs, signal peptides, or random sequences to highly active antimicrobial peptides.
Model Workflow

Wet lab experiments revealed that the original LL-37's antimicrobial duration is only about 8 hours, with slightly insufficient antimicrobial activity. Facing this dilemma, we hope to obtain LL-37 variants that can improve the deficiencies of the original sequence. However, traditional experimental methods often involve manual mutation induction with low success rates and high time costs (Das et al., 2021). While computational design has low costs, it also faces the following challenges:
To address existing dilemmas and challenges, our dry lab designed the CytoEvolve model to generate more stable LL-37 variants with higher antimicrobial activity.
Modeling Analysis
Let the antimicrobial peptide sequence be $S = (s_1, s_2, \ldots, s_L)$, where $s_i \in \mathcal{A}$ represents the amino acid at position $i$ and $\mathcal{A}$ is the set of 20 natural amino acids. The optimization objective can be expressed as:

$$S^* = \arg\max_{S} f(S)$$

where $f(\cdot)$ is the antimicrobial activity evaluation function. Because direct optimization is difficult, we transform the problem into a Markov Decision Process (MDP) whose states are sequences, whose actions are site mutations, and whose rewards come from CytoGuard.
The policy network is designed based on attention mechanisms to learn optimal mutation-site selection strategies:

$$\pi_\theta(a \mid S) = \mathrm{softmax}\big(g_\theta(S)\big)$$

where $g_\theta$ is the attention-based scoring network and $\pi_\theta(a \mid S)$ represents the probability distribution over sites $a \in \{1, \ldots, L\}$ being selected for mutation.
Network structure includes:
We employ a Discrete Diffusion Model (Austin et al., 2021; Sohl-Dickstein et al., 2015) for iterative sequence generation. The model gradually transforms the original sequence into pure noise (e.g., fully masked sequence) through a predefined forward process over T time steps.
The core is a trained denoising network $p_\theta$ that learns to reverse this process: given a noisy sequence $x_t$ at any time step $t$, it predicts the most likely original sequence $x_0$. The model's optimization objective is to minimize the prediction loss:

$$\mathcal{L} = \mathbb{E}_{x_0,\, t}\big[-\log p_\theta(x_0 \mid x_t, t)\big]$$

where $t$ is uniformly sampled from $\{1, \ldots, T\}$ and $x_t$ is the sequence after $t$ steps of noise addition from $x_0$.
The generation process starts from a fully random or masked sequence $x_T$ and gradually recovers a structurally clear target sequence through $T$ iterative applications of the denoising network $p_\theta$.
Diffusion Steps: T = 200
Noise Schedule: Cosine Schedule
Max Length:
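The forward process can be illustrated on the LL-37 sequence itself: an absorbing-state corruption that masks each residue with a probability following a cosine schedule, reaching full masking at t = T. The mask symbol and the exact schedule form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
MASK = "#"
T = 200

def cosine_mask_prob(t, T):
    """Cosine schedule: fraction of positions masked at step t (0 -> none, T -> all)."""
    return 1.0 - np.cos(0.5 * np.pi * t / T) ** 2

def forward_noise(seq, t):
    """Absorbing-state forward process: each residue is masked with prob p(t)."""
    p = cosine_mask_prob(t, T)
    return "".join(MASK if rng.random() < p else a for a in seq)

ll37 = "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES"
for t in (0, 50, 100, 200):
    print(f"t={t:3d} p={cosine_mask_prob(t, T):.2f} {forward_noise(ll37, t)}")
```

The denoiser is trained to invert exactly these corruptions, so at sampling time it can start from the all-mask string and fill in residues step by step.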
The CytoGuard predictor is based on hypergraph neural networks, constructing k-gram features as hypergraph structures $\mathcal{G} = (\mathcal{V}, \mathcal{E})$:
where:
- Node set $\mathcal{V}$: Ankh embedding representations of amino acid residues
- Hyperedge set $\mathcal{E}$: k-gram subsequences (k = 2, 3, 4)

Hypergraph convolution operation:

$$X^{(l+1)} = \sigma\!\left(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X^{(l)} \Theta^{(l)}\right)$$

where:
- $H$: Incidence matrix
- $D_v$, $D_e$: Node and hyperedge degree matrices
- $W$, $\Theta^{(l)}$: Learnable weight matrices

TF-IDF weights enhance the contribution of important k-grams:

$$w(e) = \mathrm{tf}(e) \cdot \log\frac{N}{\mathrm{df}(e)}$$
REINFORCE Algorithm (Williams, 1992)
We optimize policy network parameters using policy gradient methods:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, R\big]$$

In implementation, the gradient is estimated as:

$$\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta \log \pi_\theta(a_i \mid s_i)\, R_i$$

where:
- $\log \pi_\theta(a_i \mid s_i)$: Log-likelihood of the i-th sample
- $R_i$: Corresponding reward value
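The estimator above can be demonstrated on a toy "which site to mutate" bandit with a softmax policy; the reward table, learning rate, and mean-reward baseline are illustrative additions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# toy mutation-site bandit: 5 positions, position 3 gives the highest reward
true_reward = np.array([0.1, 0.2, 0.3, 1.0, 0.4])
theta = np.zeros(5)                 # policy logits
lr, n_steps, batch = 0.5, 300, 16

for _ in range(n_steps):
    p = softmax(theta)
    actions = rng.choice(5, size=batch, p=p)
    rewards = true_reward[actions] + rng.normal(0, 0.05, batch)
    baseline = rewards.mean()       # variance-reduction baseline
    grad = np.zeros(5)
    for a, r in zip(actions, rewards):
        g = -p.copy(); g[a] += 1.0  # d log pi(a) / d theta for a softmax policy
        grad += (r - baseline) * g
    theta += lr * grad / batch      # ascend the REINFORCE gradient estimate

print("learned policy:", softmax(theta).round(2))
```

In CytoEvolve the scalar `true_reward` lookup is replaced by a CytoGuard evaluation of the mutated sequence, but the update rule is the same.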
Experience Replay Mechanism
To improve sample efficiency, we introduce an experience replay buffer $\mathcal{B}$:
During each training step, the loss function combines the current batch with sampled historical experience.
Buffer management strategy:
1. Deduplication: Remove duplicate sequences
2. Sorting: Sort by score in descending order
3. Truncation: Keep the top high-scoring samples
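The three-step buffer management (deduplicate, sort by score, truncate) fits in a few lines of Python; the capacity and the toy sequences/scores are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class ReplayBuffer:
    """Keeps the best-scoring unique sequences seen so far (capacity is a placeholder)."""
    capacity: int = 4
    items: dict = field(default_factory=dict)   # sequence -> score

    def add(self, seq: str, score: float):
        # 1) deduplication: keep the best score per sequence
        if seq not in self.items or score > self.items[seq]:
            self.items[seq] = score
        # 2) sorting + 3) truncation: keep only the top-`capacity` entries
        ranked = sorted(self.items.items(), key=lambda kv: kv[1], reverse=True)
        self.items = dict(ranked[: self.capacity])

    def top(self):
        return list(self.items.items())

buf = ReplayBuffer(capacity=3)
for seq, s in [("LLGDFF", 0.4), ("KRIVQR", 0.9), ("LLGDFF", 0.6),
               ("IKDFLR", 0.7), ("NLVPRT", 0.2)]:
    buf.add(seq, s)
print(buf.top())   # three unique sequences, highest scores first
```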
Main Reward Function
The reward function is designed based on CytoGuard-predicted logMIC values:

$$R(S) = \hat{a}(S)$$

where $\hat{a}(S)$ is the predicted normalized antimicrobial activity of sequence $S$.
Sequence Diversity Penalty
To avoid repeatedly generating identical sequences, we introduce a history penalty mechanism: sequences that have already appeared during training receive a reduced reward.
Hyperparameter Configuration
Loss Function and Optimizer
Custom negative log-likelihood loss:

$$\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \log \pi_\theta(a_i \mid s_i)\, R_i$$
The optimizer uses the Adam algorithm (Kingma & Ba, 2015):

$$\theta_{t+1} = \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected first- and second-moment estimates of the gradient.
Here is the pseudocode for Reinforcement Learning and Diffusion Model algorithms.


Experimental Results
Facing high experimental validation costs and time-consuming biological experiments, the dry and wet labs collaborated: the dry lab further validated and screened the generated sequences, narrowing the candidates down to the four best-performing variants, while the wet lab used D2P methods for synthesis and validation, further reducing experimental costs.
Ultimately, the LL-37 variants labeled Variant-1 and Variant-2 showed stronger antimicrobial activity than the original LL-37 sequence. The experimental data also showed, however, that Variant-2's duration was shorter than that of Variant-1 and the original LL-37: it exhibited strong initial antimicrobial activity but reduced activity at 3 hours.


CytoEvolve constitutes an end-to-end antimicrobial peptide (AMP) optimization framework whose core is the organic combination of a discrete diffusion model for sequence generation with reinforcement learning. The framework utilizes the diffusion model's powerful generative capabilities to explore vast sequence spaces and create diverse candidate peptides; simultaneously, a hypergraph neural network-based activity predictor serves as the reward function, efficiently guiding the direction of sequence generation through reinforcement learning (Schulman et al., 2017). This method not only significantly improves computational efficiency but also uncovers sequence-function relationships that are difficult to find through traditional methods (Müller et al., 2018), providing a new computational paradigm for the rational design of functional macromolecules. Although the system currently has limitations in sequence length and multi-objective optimization, the framework has shown great potential in optimizing LL-37 and other antimicrobial peptides, offering new methods for drug discovery, protein engineering, and synthetic biology.
CytoFlow demonstrates that the future of peptide engineering lies not in isolated computational tools, but in integrated systems that unify sequence design, activity prediction, and production optimization. Our framework has not only achieved significant improvements in LL-37 engineering but also established new standards for computational methods in synthetic biology.
Through the synergistic combination of reinforcement learning, hypergraph neural networks, and fermentation modeling, we created a system that learns, adapts, and improves with each experimental cycle. The enhanced variant activity we achieved is just the beginning—CytoFlow lays the foundation for a new era of rational peptide design.
As we open-source CytoFlow to the iGEM community and beyond, we envision a future where any team can rapidly engineer peptides for therapeutic, industrial, or research applications. The cell factory (Cytopia) is no longer a distant dream but an achievable reality, powered by the computational framework we have developed.
The CytoFlow framework represents the culmination of intensive computational and experimental work by the Jiangnan-China iGEM team. We thank our advisors, collaborators, and the broader iGEM community for their support in realizing this vision.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410. https://doi.org/10.1016/S0022-2836(05)80360-2
Austin, J., Johnson, D. D., Ho, J., Tarlow, D., & van den Berg, R. (2021). Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34, 17981-17993.
Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).
Chung, C. R., Kuo, T. R., Wu, L. C., Lee, T. Y., & Horng, J. T. (2020). Characterization and identification of antimicrobial peptides with different functional activities. Briefings in Bioinformatics, 21(3), 1098-1114. https://doi.org/10.1093/bib/bbz043
Das, P., Sercu, T., Wadhawan, K., Padhi, I., Gehrmann, S., Cipcigan, F., Chenthamarakshan, V., Strobelt, H., dos Santos, C., Chen, P. Y., Yang, Y. Y., Tan, J. P. K., Hedrick, J., Crain, J., & Mojsilovic, A. (2021). Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nature Biomedical Engineering, 5(6), 613-623. https://doi.org/10.1038/s41551-021-00689-x
Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., Bhowmik, D., & Rost, B. (2022). ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 7112-7127. https://doi.org/10.1109/TPAMI.2021.3095381
Feng, Y., Wang, Y., & Liu, H. (2021). HGNN: Hypergraph neural networks. ACM Transactions on Knowledge Discovery from Data, 15(6), 1-28. https://doi.org/10.1145/3447548
Fjell, C. D., Hiss, J. A., Hancock, R. E., & Schneider, G. (2012). Designing antimicrobial peptides: form follows function. Nature Reviews Drug Discovery, 11(1), 37-51. https://doi.org/10.1038/nrd3591
Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 115, 513-583. https://doi.org/10.1098/rstl.1825.0026
Hancock, R. E., & Sahl, H. G. (2006). Antimicrobial and host-defense peptides as new anti-infective therapeutic strategies. Nature Biotechnology, 24(12), 1551-1557. https://doi.org/10.1038/nbt1267
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).
Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130. https://doi.org/10.1126/science.ade2574
Mahlapuu, M., Håkansson, J., Ringstad, L., & Björn, C. (2016). Antimicrobial peptides: An emerging category of therapeutic agents. Frontiers in Cellular and Infection Microbiology, 6, 194. https://doi.org/10.3389/fcimb.2016.00194
Mehta, D., Anand, P., Kumar, V., Joshi, A., Mathur, D., Singh, S., Tuknait, A., Chaudhary, K., Gautam, S. K., Gautam, A., Varshney, G. C., & Raghava, G. P. S. (2014). ParaPep: A web resource for experimentally validated antiparasitic peptide sequences and their structures. Database, 2014, bau051. https://doi.org/10.1093/database/bau051
Monge, F. A., Jagla, J. H., Hartman, F. M., Hubert, J., Ropelewski, A. J., & Clemons, P. A. (2006). Response surface methodology as an approach to optimize medium composition for enhanced antimicrobial peptide production. Journal of Applied Microbiology, 101(5), 1062-1070.
Müller, A. T., Hiss, J. A., & Schneider, G. (2018). Recurrent neural network model for constructive peptide design. Journal of Chemical Information and Modeling, 58(2), 472-479. https://doi.org/10.1021/acs.jcim.7b00414
Pirtskhalava, M., Amstrong, A. A., Grigolava, M., Chubinidze, M., Alimbarashvili, E., Vishnepolsky, B., Gabrielian, A., Rosenthal, A., Hurt, D. E., & Tartakovsky, M. (2021). DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Research, 49(D1), D288-D297. https://doi.org/10.1093/nar/gkaa991
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523. https://doi.org/10.1016/0306-4573(88)90021-0
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning, 2256-2265.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
Verhulst, P. F. (1838). Notice sur la loi que la population suit dans son accroissement. Correspondance Mathématique et Physique, 10, 113-129.
Veltri, D., Kamath, U., & Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 34(16), 2740-2747. https://doi.org/10.1093/bioinformatics/bty179
Wang, G., Li, X., & Wang, Z. (2016). APD3: The antimicrobial peptide database as a tool for research and education. Nucleic Acids Research, 44(D1), D1087-D1093. https://doi.org/10.1093/nar/gkv1278
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256. https://doi.org/10.1007/BF00992696
Xiao, X., Wang, P., Lin, W. Z., Jia, J. H., & Chou, K. C. (2013). iAMP-2L: A two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical Biochemistry, 436(2), 168-177. https://doi.org/10.1016/j.ab.2013.01.019
Zaslaver, A., Bren, A., Ronen, M., Itzkovitz, S., Kikoin, I., Shavit, S., Liebermeister, W., Surette, M. G., & Alon, U. (2006). A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nature Methods, 3(8), 623-628. https://doi.org/10.1038/nmeth895