Our objective
Because our drug is a completely original formulation incorporating two novel fusion peptides, BC and FT, it is essential to rigorously assess the structure, stability, and functionality of these components. This evaluation helps ensure that no molecular obstructions compromise therapeutic efficacy and maximizes the performance of our eyedrop treatment. Accordingly, the modelling team conducts both computational and mathematical analyses to predict and optimize the properties of our fusion peptides throughout the development process. Step by step, we aim to critically evaluate the drug so that it achieves the best possible performance and meets safety standards on par with drugs currently on the market.
Dry Lab Workflow
The dry lab workflow begins with structural analysis, which serves as the foundation by providing detailed three-dimensional models of the fusion peptides. This step is crucial because all subsequent predictions depend on accurate structural information to understand the spatial arrangement of amino acids, potential folding patterns, and accessible surfaces. High-quality structural models enable precise identification of critical features such as binding sites, secondary structure elements, and regions important for function or interaction.
Following structural analysis, stability predictions are performed to assess how likely the peptide is to maintain its folded conformation under physiological conditions. Stability prediction tools evaluate the energetic consequences of the peptide’s amino acid composition and structural integrity, including the impact of mutations or environmental changes. Understanding stability is essential before further functional assessment, as unstable peptides may rapidly denature or aggregate, rendering them ineffective as drugs.
Once stability is established, functionality predictions evaluate the biological activity of the peptides. This includes assessing binding affinities, interaction potential with targets, or propensity to trigger immune responses. Functionality relies inherently on proper folding and stable structure, so this step logically follows stability assessment. Accurate functionality prediction guides optimization of the peptide’s therapeutic potential.
Finally, pharmacokinetics predictions in our dry lab focus mainly on dosage and concentration calculations, alongside evaluating the drug’s P-value. These calculations are critical for determining the appropriate peptide amount to achieve therapeutic efficacy while minimizing toxicity. Accurate dosage predictions guide drug delivery strategies and ensure the peptide reaches effective concentrations in target tissues. The P-value assessment provides insight into the peptide’s potency and interaction likelihood. This pharmacokinetic step is informed by prior stability and functionality analyses, ensuring that peptides with favorable structure and biological activity are modeled for optimal dosing regimens.
Together, these steps form a coherent, logically progressive workflow where each step builds critically on the results of the preceding one, ensuring that only peptides with promising structure, stability, and function are advanced towards pharmacokinetic evaluation and potential therapeutic use.
Step 1: Structural analysis
The first step of our modelling is structural analysis. Inspecting the structure of our fusion peptides is essential, as the fusion of different peptide sequences can result in unique conformations and unanticipated interactions for which limited precedent exists. Structural analysis of FT (composed of frattide and tp1) and BC (composed of BDNF and CDR1) helps ensure that these newly developed molecules adopt stable, functional shapes compatible with their intended biological activities. Understanding their three-dimensional structure helps to identify potential issues such as aggregation, misfolding, or dysfunctional membrane interactions, all of which could compromise the effectiveness or safety of the eyedrop treatment. Given the innovative nature of these fusion peptides and the absence of prior data, structural validation is critical for supporting their therapeutic potential and guiding subsequent optimization steps in drug development.
Hence, we conducted several computational analyses. We used AlphaFold, a state-of-the-art tool for predicting the three-dimensional structures of proteins from their amino acid sequences, and employed AlphaKnot 2.0 to analyze the topology and identify knotted regions within our protein models. Finally, to visualize the structures and validate specific features such as disulfide bond formation, we utilized PyMOL. Together, these tools provide a comprehensive evaluation of our fusion peptides’ structural integrity and functional potential.
Developing the structure: AlphaFold, ChimeraX and PyMOL
AlphaFold
AlphaFold is an advanced artificial intelligence system developed by DeepMind that predicts the three-dimensional structure of proteins directly from their amino acid sequences with remarkable accuracy. By leveraging deep learning techniques and extensive protein structure databases, AlphaFold is able to model complex protein folds accurately, even in cases where no similar structures are known. This capability addresses a long-standing challenge in computational biology, enabling researchers to gain critical structural insights quickly and cost-effectively compared to traditional experimental methods.
For our project, due to the absence of experimentally determined structures for the target proteins BDNF, CDR1, and FRAT, we utilized AlphaFold to generate predicted 3D models for these proteins, as well as for the fusion peptides BC and FT, based on their modified amino acid sequences. Following the design process, we performed ProtParam analyses to assess the stability and crucial physicochemical properties of the predicted models.
Failed Results
Linker DEVD

ref

ref

ref

ref

ref

ref
The predicted aligned error (PAE) for the BC protein combined with the DEVD linker indicates a lack of spatial consistency, suggesting that the relative position of BC and the DEVD linker is uncertain. Additionally, the confidence score for the BC+DEVD structure is very low, reflecting poor reliability in the predicted conformation. Similarly, the instability index (II) for the FT and DEVD combination is 42.60, which exceeds the ProtParam stability threshold of 40 and indicates that this protein combination is predicted to be unstable. Because of the poor structural confidence for BC+DEVD and the instability of FT+DEVD, both predicted structures are deemed unreliable and have been excluded from further analysis.
Linker EAAAK

ref

ref

ref

ref

ref

ref
The predicted local distance difference test (pLDDT) scores indicate low overall confidence in the predicted structure, with particularly low values observed in the linker region, suggesting potential instability. In addition, the extended length of the linker raises concerns about structural compactness and stability under physiological conditions, especially within the ocular environment. Complementing these structural concerns, the instability index (II) for the FT and EAAAK combination was found to be 40.20, marginally exceeding the ProtParam stability threshold of 40. Furthermore, the estimated half-life in yeast and Escherichia coli is only 3 minutes, indicating poor stability and survival in actual eye tissues. Taken together, these factors render the predicted structure unreliable and unsuitable for further study, so we decided to rebuild the target protein structure.
Linker DS

ref

ref

ref

ref

ref

ref
The predicted aligned error (PAE) plot for the BC protein linked with DS displays a scattered pattern with interspersed white spaces, indicating poor positional confidence and a lack of organized spatial arrangement between BC and DS. This dispersed pattern reflects high uncertainty in the relative positioning of these components. Supporting this, the pLDDT confidence scores for the FT+DS structure are very low, confirming the unreliability of the predicted model. In addition, the instability index (II) for the FT and DS combination, as calculated by ProtParam, is 56.04—well above the acceptable range. The estimated half-life of this protein combination in yeast and Escherichia coli is only 3 minutes, suggesting it would have minimal stability and survival in actual ocular environments. Based on these findings, both the BC+DS and FT+DS predicted structures are considered unsuitable, leading us to pursue alternative protein combinations.
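AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output, so the confidence values discussed above can be summarized programmatically. The following is a minimal sketch under stated assumptions rather than the procedure used in this project: the function names are illustrative, and the cutoff of 70 is an assumed value based on AlphaFold's usual boundary between confident and low-confidence regions.

```python
def plddt_per_residue(pdb_lines):
    """Return {residue_number: pLDDT} read from the B-factor column of CA atoms.

    Relies on the fixed-column PDB ATOM record layout:
    atom name in columns 13-16, residue number in 23-26, B-factor in 61-66.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores, cutoff=70.0):
    """Residue numbers whose pLDDT is below the cutoff (assumed value)."""
    return sorted(res for res, value in scores.items() if value < cutoff)


def mean_plddt(scores):
    """Average pLDDT over all residues that were read."""
    return sum(scores.values()) / len(scores)
```

To apply it to a model, pass an open file handle (for example a hypothetical `model.pdb`) as `pdb_lines`. A run of low-confidence residues concentrated in the linker would match the pattern described for the rejected designs.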
Successful Results

ref

ref
The structure predictions for BC (left) and FT (right) are successful, as demonstrated by the predicted aligned error plots below.

ref

ref
Both BC and FT peptides display strong biochemical indicators of stability and suitability for application. BC, with an instability index of 38.80 (below the ProtParam stability threshold of 40), a moderate aliphatic index (50.00), and long estimated half-lives across biological systems, is classified as stable. FT, with a lower instability index (34.75), a higher aliphatic index (55.85), and similarly long estimated half-lives, is also classified as stable. ProtParam reports extinction coefficients under the assumption that cysteine pairs form cystines, so these values alone do not establish disulfide bonding; the disulfide bonds are examined separately in PyMOL. Unlike the rejected linker designs, which showed high instability indices, short half-lives, and unreliable predicted structures, BC and FT have favorable physicochemical profiles, including balanced charge and resistance to degradation, providing a solid foundation for functional studies and therapeutic development.
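The ProtParam comparison can be condensed into a short script. This is an illustrative sketch, not part of the original analysis. The threshold of 40 is the usual ProtParam convention for the instability index, and the aliphatic index follows the standard formula based on the mole percentages of Ala, Val, Ile and Leu. The II values are those quoted in this section; the sequence in the usage note is a hypothetical placeholder, since the real BC and FT sequences are not reproduced here.

```python
STABILITY_THRESHOLD = 40.0  # ProtParam convention: II below 40 is predicted stable

# Instability indices quoted in this section.
reported_ii = {
    "FT+DEVD": 42.60,
    "FT+EAAAK": 40.20,
    "FT+DS": 56.04,
    "BC": 38.80,
    "FT": 34.75,
}


def classify(ii, threshold=STABILITY_THRESHOLD):
    """Label a construct from its instability index."""
    return "stable" if ii < threshold else "unstable"


def aliphatic_index(sequence):
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue in the sequence."""
    n = len(sequence)

    def pct(aa):
        return 100.0 * sequence.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


for name, ii in reported_ii.items():
    print(f"{name:9s} II = {ii:5.2f} -> {classify(ii)}")
```

Calling `aliphatic_index("AVIL")` on this toy sequence shows how the index weights branched aliphatic side chains more heavily than alanine.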
ChimeraX
The models were then visualized and analyzed in ChimeraX, with confidence scores guiding the assessment of folding accuracy and functional viability.
(placeholder)
PyMOL
Disulfide bonds play a crucial role in stabilizing protein structures by forming covalent links between cysteine residues. These bonds help maintain the overall protein architecture, especially for secreted and membrane proteins exposed to harsh oxidative environments. By constraining the folding and reducing conformational flexibility, disulfide bonds enhance protein stability, facilitate correct folding, and protect against denaturation or degradation under physiological stress.
Finally, disulfide bond analysis was performed using PyMOL to understand the stability of our fusion peptides. While BC contains no disulfide bonds, FT has one, consistent with its smaller size and shorter sequence. The presence of this single disulfide bond in FT may contribute to its structural resilience, whereas its absence in BC suggests a reliance on other stabilizing interactions. Recognizing these differences informs our approach to optimizing peptide stability and functionality, ensuring that each fusion peptide performs effectively in its respective biological context.
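A disulfide bond can also be flagged geometrically, by checking whether the sulfur (SG) atoms of two cysteines lie within bonding distance in the predicted model. The sketch below is a generic distance check, not the PyMOL procedure used above. The 2.5 Å cutoff is an assumed tolerance around the typical S-S bond length of about 2.05 Å, and the coordinates in the usage note are made up for illustration.

```python
import math
from itertools import combinations


def find_disulfides(sg_atoms, max_distance=2.5):
    """Return residue-number pairs whose cysteine SG atoms are within max_distance.

    sg_atoms: list of (residue_number, (x, y, z)) tuples, coordinates in angstroms.
    max_distance: assumed cutoff in angstroms (typical S-S bond is about 2.05).
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_atoms, 2):
        if math.dist(xyz_a, xyz_b) <= max_distance:
            bonds.append((res_a, res_b))
    return bonds
```

For example, `find_disulfides([(5, (0.0, 0.0, 0.0)), (20, (2.05, 0.0, 0.0)), (33, (10.0, 0.0, 0.0))])` pairs residues 5 and 20 and leaves residue 33 unpaired.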
PyMOL Results

ref

ref
Other bonds in the structures of the peptides
Peptide folding and structural maintenance depend on a variety of non-covalent interactions that collectively shape and stabilize the three-dimensional structures essential for their biological function. Among these, electrostatic interactions, hydrophobic effects, hydrogen bonding, and van der Waals forces each contribute in unique ways.
Electrostatic interactions originate from the attraction between oppositely charged side chains, such as lysine and arginine (positively charged) and aspartate and glutamate (negatively charged). Salt bridges, which are ionic bonds formed between these residues, stabilize tertiary and quaternary structures by neutralizing repulsive forces between like charges and drawing distant peptide segments together. These ionic interactions not only provide structural stability but also influence folding pathways by imposing favorable conformations. Furthermore, the spatial arrangement of charged amino acids contributes to the overall peptide charge and solubility, preventing aggregation through charge repulsion.
Hydrophobic effects drive folding based on the principle that nonpolar side chains tend to avoid water. Hydrophobic amino acids cluster within the peptide, creating a compact core shielded from the aqueous environment. This collapse lowers the system's free energy by reducing the ordered water molecules surrounding hydrophobic groups, and promotes van der Waals interactions, which stabilize densely packed side chains. This hydrophobic core formation is often the initial step in peptide folding, crucial for proper assembly of secondary and tertiary structures.
Hydrogen bonding plays a critical role in maintaining secondary structures like α-helices and β-sheets. These bonds form between backbone amide hydrogens and carbonyl oxygens, locking the peptide backbone into repeating structural motifs that stabilize the fold and reduce entropy loss during folding. Side-chain hydrogen bonds also add specificity and further reinforcement to the native conformation.
Finally, van der Waals forces, though individually weak, cumulatively contribute to closely packed atomic arrangements within the peptide interior. These dispersive interactions facilitate tight packing of atoms, increasing molecular compactness and overall stability.
Applying these principles to BC, the fusion peptide features a balanced distribution of charged residues that allows efficient salt bridge formation, reducing electrostatic repulsion and stabilizing the fold. Hydrophobic amino acids such as leucine, isoleucine, valine, and phenylalanine cluster internally, driving core compaction and supporting secondary structural elements like α-helices and β-turns. These helices and turns are further locked in place by backbone hydrogen bonding, contributing to a well-organized and stable peptide conformation.
For FT, the higher density of charged residues facilitates an extensive network of salt bridges that enhances both folding specificity and solubility by balancing charges and reducing aggregation tendencies. Its hydrophobic regions composed of leucine, valine, phenylalanine, and glycine residues create dense cores essential for a compact three-dimensional architecture. Backbone hydrogen bonds reinforce these secondary structures within the peptide, maintaining its functional tertiary fold. Together, BC and FT peptides exemplify how the combined action of electrostatic interactions, hydrophobic effects, hydrogen bonding, and van der Waals forces works synergistically to ensure their structural integrity and biological functionality, complementing the stability provided by disulfide bonds.
AlphaKnot 2.0 (Protein Knotting)
AlphaKnot 2.0 is a computational tool that analyzes protein structures to identify and characterize knots formed by the folding of the polypeptide chain. These knots influence protein stability, folding pathways, and function. The tool applies advanced algorithms to detect such topological features with high precision, providing insights into protein folding mechanisms that complement conventional structural analyses.
In our project, AlphaKnot 2.0 was used to assess the predicted fusion peptide structures for any knotting. This evaluation helped reveal important topological characteristics that could impact peptide stability and performance. By understanding and addressing potential knot-related constraints, we optimized the peptide designs to ensure favorable conformations and improved functional reliability. Integrating AlphaKnot 2.0 enabled us to enhance the accuracy and robustness of our fusion peptides’ structural models.
The principle of computational analysis
AlphaKnot 2.0 works by analyzing the protein backbone using the coordinates of alpha carbon (Cα) atoms obtained from standard protein structure files (PDB or CIF). It first evaluates the entire structure for knots using a probabilistic approach based on the HOMFLY-PT polynomial, where thousands of random chain closures are tested to determine if a knot is present with high confidence. When a knot is detected, the algorithm further refines the analysis by identifying the minimal segment of the protein chain that forms the knot, called the knot core, and produces detailed knot maps to localize and characterize these topological features precisely.
This topological analysis is coupled with advanced visualization tools, such as a customized PDBe Mol* viewer, which highlights knotted subchains and simplifies structures to reveal knot locations clearly. Accompanying interactive knot maps offer residue-level information including knot cores, tails, and lengths, helping users understand complex protein entanglements. The server infrastructure uses Python and the Flask framework, with asynchronous task management on Linux clusters to efficiently handle large-scale computations and user submissions. This robust computational setup enables comprehensive knot detection and topological validation of protein models, providing critical insights that complement traditional structural analysis.
AlphaKnot 2.0 Results

ref

ref
Topology type: UNKNOTTED
An unknotted protein structure is advantageous for fusion peptides like FT and BC, which engage different cellular receptors. Such unknotted conformations tend to fold more efficiently and consistently, minimizing the risk of misfolding or aggregation that could impair function. This structural simplicity allows the peptides to maintain the flexibility and independence necessary for effectively interacting with their respective receptors. Consequently, ensuring that these fusion peptides remain unknotted supports their stability, receptor specificity, and overall therapeutic efficacy, crucial for the success of the eyedrop treatment.
Step 2: Stability Predictions
Following structural analysis, further predicting the stability of our fusion peptides is essential to ensure they retain their intended structure and functionality in physiological environments. While structural analysis confirms the peptides adopt the correct conformations, stability prediction evaluates how well these conformations withstand factors such as temperature, pH, and enzymatic degradation. By linking these steps, we obtain a comprehensive understanding, from static shape to dynamic resilience, guiding us to the optimization of peptide candidates with desirable therapeutic properties.
To achieve a comprehensive stability evaluation, we employed multiple computational tools targeting different aspects of peptide behavior. CamSol was used to predict the intrinsic solubility of the peptides, identifying regions prone to low solubility and aggregation. MEDUSA assessed the flexibility of the protein structure, as excessive flexibility can compromise stability. DeepSTABp provided thermostability predictions and insights into how mutations might influence peptide stability. Lastly, ProtParam was utilized to calculate physicochemical properties such as hydropathicity, molecular weight, and isoelectric point, which contribute to the overall stability and behavior of the peptides. Combining these analyses offered a detailed understanding of the fusion peptides' stability, proving essential for their design refinement and therapeutic development.
CamSol (Intrinsic Solubility)
CamSol is a computational tool that predicts both the intrinsic solubility and aggregation propensity of proteins and peptides based on their amino acid sequences. By evaluating physicochemical properties such as hydrophobicity, charge, and secondary structure tendencies, CamSol assigns solubility scores to residues within the sequence. This enables identification of regions that may cause unwanted aggregation or be too soluble, as either extreme could negatively affect a peptide’s stability and therapeutic efficacy. Managing this balance is essential to optimize the drug-like properties of fusion peptides.
In our study, CamSol was employed to analyze the fusion peptides FT and BC, helping us pinpoint segments prone to aggregation or excessive solubility. This information guided modifications to reduce aggregation risks without compromising necessary solubility, ensuring the peptides maintain the desired balance for stability and function. By integrating CamSol’s predictions into our design process, we enhanced the developability and therapeutic potential of our fusion peptides.
The Mathematical Concept
CamSol uses three methods to calculate amino acid pKa values:
- Using tabulated pKa values (taken from http://compbio.clemson.edu/pkad (SI)),
- Using PROPKA,
- Using IPC.
CamSol predictions are based on the Zyggregator method [Tartaglia GG, Pawar AP, Campioni S, et al. Prediction of aggregation-prone regions in structured proteins. J Mol Biol 2008;380:425–36]
In CamSol, four properties:
- Charge
- Hydrophobicity
- α-helical propensity
- β-sheet propensity
are combined to assign a score to each amino acid. It is then smoothed to account for the effect of neighboring residues, and corrected for hydrophobic-hydrophilic patterns and gatekeeper effects. An overall solubility score is calculated from this profile.
The charges of the amine group at the N-terminus and the carboxylic acid group at the C-terminus are calculated using the Henderson–Hasselbalch equation:

(ref)
Therefore, CamSol relies on accurate pKa values (either from the updated table, or calculated with PROPKA or IPC), and employs partial charges when the pH is close to the pKa of a charged amino acid.
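To make the idea of partial charges concrete, the Henderson–Hasselbalch relation pH = pKa + log10([A-]/[HA]) can be rearranged to give the average charge of a single ionizable group. The sketch below illustrates that relation only; it is not CamSol's source code, and the function name and the `kind` argument are assumptions introduced for this example.

```python
def partial_charge(pH, pKa, kind):
    """Average charge of one ionizable group at a given pH.

    kind = "basic":  protonated form carries +1 (e.g. N-terminal amine)
    kind = "acidic": deprotonated form carries -1 (e.g. C-terminal carboxyl)
    """
    if kind == "basic":
        # Fraction protonated = 1 / (1 + 10^(pH - pKa))
        return 1.0 / (1.0 + 10 ** (pH - pKa))
    if kind == "acidic":
        # Fraction deprotonated = 1 / (1 + 10^(pKa - pH))
        return -1.0 / (1.0 + 10 ** (pKa - pH))
    raise ValueError("kind must be 'basic' or 'acidic'")
```

When the pH equals the pKa the group is half ionized, so the function returns +0.5 or -0.5. Far from the pKa the charge approaches 0 or ±1, which is why partial charges only matter when the pH is close to the pKa.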
Using the ratio of charged to neutral species calculated with the above equation, the pH-dependent hydrophobicity value logDpH combines the partition coefficients (logP) of the neutral and ionized species.

(ref)
δ is the difference between pKa and pH (pKa − pH for basic residues and pH − pKa for acidic residues). We used the pH-dependent logDpH calculations by Zamora and colleagues [Zamora WJ, Campanera JM, Luque FJ. Development of a structure-based, pH-dependent Lipophilicity scale of amino acids from continuum solvation calculations. J Phys Chem Lett 2019;10:883–9.] for neutral and ionized LogP values for all standard amino acids.
CamSol uses PROPKA, an open-source pKa predictor [Søndergaard CR, Olsson MHM, Rostkowski M, et al. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput 2011;7:2284–95] [Olsson MHM, Søndergaard CR, Rostkowski M, et al. PROPKA3: consistent treatment of internal and surface residues in empirical pka predictions. J Chem Theory Comput 2011;7:525–37], to calculate accurate pKa values.
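A common way to write a pH-dependent distribution coefficient from the quantities defined above is logD = log10((P_N + P_I · 10^δ) / (1 + 10^δ)), where P_N and P_I are the partition coefficients of the neutral and ionized species. The sketch below assumes this standard form; the equation image itself is not reproduced in this text, so treat the formula, the function name and the arguments as illustrative assumptions rather than CamSol's exact implementation.

```python
import math


def log_d(log_p_neutral, log_p_ionized, pH, pKa, kind):
    """pH-dependent distribution coefficient (logD) of one ionizable residue.

    delta follows the definition in the text:
    pKa - pH for basic residues, pH - pKa for acidic residues.
    """
    if kind not in ("basic", "acidic"):
        raise ValueError("kind must be 'basic' or 'acidic'")
    delta = (pKa - pH) if kind == "basic" else (pH - pKa)
    ratio = 10 ** delta                  # [ionized] / [neutral]
    p_neutral = 10 ** log_p_neutral
    p_ionized = 10 ** log_p_ionized
    return math.log10((p_neutral + p_ionized * ratio) / (1.0 + ratio))
```

With this form, logD approaches the neutral logP when the residue is almost fully neutral and the ionized logP when it is almost fully charged, interpolating between the two near the pKa.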
Samples of CamSol prediction model (excerpt from figure 2 of website [Marc Oeller, Ryan Kang, Rosie Bell, Hannes Ausserwöger, Pietro Sormanni, Michele Vendruscolo, Sequence-based prediction of pH-dependent protein solubility using CamSol, Briefings in Bioinformatics, Volume 24, Issue 2, March 2023, bbad004])

(ref)
“CamSol predicts solubility values that are highly correlated with experimental solubility values. Plots on the left-hand side in each column visualize how experimental and predicted values change over a range of pH values. The left axis and blue line report the predicted CamSol solubility score, the right axis and green markers the measured midpoints of PEG-precipitation, all as a function of pH (x-axis). The vertical yellow line is the theoretical isoelectric point. Plots on the right-hand side show the correlation between the predicted and measured relative solubility values. CamSol calculations were carried out using pKa values calculated by PROPKA for (A) DesAbO (nanobody), (B) bovine serum albumin (BSA), (C) hen egg white lysozyme and (D) human serum albumin (HSA), whereas for (E) α-synuclein and (F) insulin pKa values were calculated with IPC (framed in blue box). R is the Pearson’s coefficient of correlation.”
CamSol results
pH Solubility

(ref)

Solubility of BC and FT in different pH
The differing solubility responses of BC and FT to pH changes have important practical implications for their formulation and use. FT’s stable, high solubility across a wide pH range suggests it will remain soluble and functional under diverse physiological and storage conditions, making it easier to handle and formulate. This stability reduces risks of precipitation or loss of activity, supporting consistent therapeutic performance.
In contrast, BC’s solubility varies more significantly with pH, exhibiting a low-solubility region near neutral to slightly alkaline conditions, which could increase the risk of aggregation or precipitation in this range. This sensitivity means that BC formulations may require careful pH optimization to maintain stability and efficacy. However, the recovery of solubility at higher alkaline pH offers an opportunity to adjust formulation conditions accordingly. Understanding these characteristics allows for tailored strategies to maximize the stability and effectiveness of each peptide in their respective therapeutic environments.
Intrinsic Solubility
The CamSol method yields a solubility profile (one score per residue in the protein sequence) where regions with scores below -1 are aggregation promoting, above 1 solubility promoting.
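The thresholds above translate directly into a small routine that groups consecutive residues by label. This is a sketch for reading an exported profile, not part of CamSol itself; the function name is an assumption, and the example profile in the usage note is invented for illustration rather than taken from the BC or FT results.

```python
def label_regions(profile, low=-1.0, high=1.0):
    """Group consecutive residues into (label, start, end) runs.

    profile: per-residue solubility scores in sequence order.
    Positions are 1-based and inclusive.
    """
    def label(score):
        if score < low:
            return "aggregation-promoting"
        if score > high:
            return "solubility-promoting"
        return "neutral"

    regions = []
    for position, score in enumerate(profile, start=1):
        current = label(score)
        if regions and regions[-1][0] == current:
            # Extend the current run to include this residue.
            regions[-1] = (current, regions[-1][1], position)
        else:
            regions.append((current, position, position))
    return regions
```

For a made-up profile such as `[0.2, 1.4, 1.6, -0.3, -1.2, -1.5, 0.0]`, residues 2 to 3 form a solubility-promoting run and residues 5 to 6 an aggregation-promoting run.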

Intrinsic Solubility of BC (overall score of 0.562263)
Based on the CamSol solubility profile, the BC ligand protein exhibits a high predicted intrinsic solubility. The profile indicates a lack of significant aggregation-prone regions, with predominantly neutral to positive solubility scores throughout the sequence. This suggests that BC is unlikely to aggregate under physiological conditions, making it a stable candidate for therapeutic use.
Practically, this high solubility means that BC is well-suited for recombinant expression in systems such as E. coli. Its favorable solubility simplifies the purification process and allows it to remain in solution at higher concentrations, enhancing its developability and potential for effective drug formulation.

Intrinsic Solubility of FT (overall score of 0.923477)
From the results above, the FT protein demonstrates a very high predicted intrinsic solubility, with an overall solubility score of 0.923, clearly classifying it as a highly soluble protein. Its residue-level solubility profile supports this assessment, showing no regions prone to aggregation and several that enhance solubility. This favorable profile suggests that FT is unlikely to form insoluble inclusion bodies during recombinant expression in bacterial systems like E. coli.
From a practical standpoint, FT's high solubility greatly facilitates the purification process and increases its likelihood of remaining soluble at higher concentrations. This stability and solubility profile make FT an excellent candidate for efficient production and downstream drug formulation, enhancing its therapeutic potential.
Summary of the above results
In summary, the comprehensive evaluation of the fusion peptides BC and FT reveals important insights into their solubility and stability profiles, which are critical for their successful development as therapeutic agents. FT demonstrates consistently high solubility across a broad pH range, indicating robustness under various physiological and storage conditions, thereby facilitating easier handling and reliable performance. BC, while also intrinsically soluble, exhibits more pH-sensitive solubility, with reduced solubility around neutral to slightly alkaline pH, necessitating targeted formulation strategies to maintain stability in this range.
Both peptides show favorable intrinsic solubility profiles with minimal aggregation-prone regions, making them excellent candidates for recombinant expression in bacterial systems like E. coli and simplifying the purification process. These characteristics promote their stability at higher concentrations, which is essential for therapeutic efficacy and manufacturability. Understanding these differences allows for informed formulation adjustments—such as pH optimization and buffer selection—to maximize stability and solubility, ultimately enhancing the developability and clinical potential of each fusion peptide.
MEDUSA (Protein Flexibility)
Understanding protein flexibility is essential in drug development because it enables predictions of how a protein drug will interact with its target and perform biological functions within the body. Protein flexibility allows for conformational changes, enhancing the ability to recognize and bind various receptor sites, and contributing to higher binding affinity and specificity. Accounting for these dynamic movements is crucial, as they often underpin critical processes like signaling, activation, or catalysis. In the context of drug design, evaluating flexibility helps optimize interactions, anticipate structural changes during binding, and reduce unfavorable steric clashes, ultimately improving therapeutic efficacy and stability.
To assess the flexibility of the fusion peptides in our project, we utilized the computational tool MEDUSA. This platform quantitatively predicts the flexibility of protein structures by analyzing their dynamic properties and conformational behavior. By applying MEDUSA, we identified flexible regions and evaluated how these may influence peptide stability and function. These insights guided our rational optimization of the fusion sequences, ensuring that our protein drugs maintain both sufficient adaptability for target engagement and adequate structural stability for therapeutic use.
Principle of AI Analysis
The MEDUSA (Multiclass flexibility prediction from sequences of amino acids) server uses information from the multiple sequence alignment of homologous sequences and the physico-chemical properties of individual amino acids to assign a flexibility class to each residue using a deep convolutional neural network [7]. The flexibility of each residue is graded on a scale from 0 to 4, with 0 being the most rigid and 4 being the most flexible. Each flexibility prediction is also assigned a confidence score, which indicates the probability that the MEDUSA server has predicted the flexibility correctly. The score falls into one of 3 categories: score < 0.4, 0.4 ≤ score ≤ 0.5, and score > 0.5. As such, we can evaluate the reliability of the MEDUSA predictions based on the provided confidence scores.
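A per-residue MEDUSA output can be summarized by counting flexibility classes and confidence categories. The sketch below assumes the predictions have already been parsed into (class, confidence) pairs; that data layout and the function names are illustrative assumptions, not the server's actual output format.

```python
from collections import Counter


def confidence_bin(score):
    """Map a confidence score to the three categories described above."""
    if score < 0.4:
        return "< 0.4"
    if score <= 0.5:
        return "0.4-0.5"
    return "> 0.5"


def summarize(predictions):
    """predictions: list of (flexibility_class, confidence) pairs, one per residue.

    Returns two dicts: counts per flexibility class (0 = most rigid,
    4 = most flexible) and counts per confidence category.
    """
    for flex_class, _ in predictions:
        if flex_class not in range(5):
            raise ValueError("flexibility class must be an integer from 0 to 4")
    classes = Counter(flex_class for flex_class, _ in predictions)
    bins = Counter(confidence_bin(conf) for _, conf in predictions)
    return dict(classes), dict(bins)
```

Residues that fall in the lowest confidence category are the ones whose flexibility class should be interpreted with the most caution.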
The following is a summary of the process in MEDUSA:
- Extract evolutionary information: MEDUSA finds homologs of the query sequence by HHblits search.
- MEDUSA filters the resulting Multiple sequence alignment (MSA) file using HHfilter
- The final MSA is translated into a probability profile using position specific score matrix: each position of the sequence is thus encoded by 21 numerical values corresponding to 20 amino acid types and gaps.
- MEDUSA translates each amino acid to 58 numerical values, which encode its physico-chemical properties (using AA INDEX scheme).
- MEDUSA creates one hot encoding of each amino acid and adds a flag for the sequence terminus.
- Using a sliding window of 15 amino acids, MEDUSA creates input vectors for each sequence position for all the considered features.
- Different features are merged to create an input vector for the prediction of dimensions 15x100.
- The neural network performs binary and multi-class predictions and provides the general summary as well as flexibility prediction and confidence value for each amino acid.
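The feature-assembly steps above can be sketched in plain Python to show how a 15x100 input arises for each residue. Several details here are assumptions rather than MEDUSA's actual code: the split of the 100 features into 21 profile values, 58 physico-chemical values, a 20-value one-hot encoding and 1 terminus flag is inferred from the stated dimensions; positions beyond the sequence ends are zero-padded; and the flag is set only on the first and last residue.

```python
WINDOW = 15
N_PROFILE = 21      # 20 amino acid types + gap
N_PHYSCHEM = 58     # physico-chemical properties per residue
N_ONEHOT = 20       # assumed one-hot size
N_FLAG = 1          # sequence-terminus flag
N_FEATURES = N_PROFILE + N_PHYSCHEM + N_ONEHOT + N_FLAG  # 100

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-value one-hot encoding of a standard amino acid."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def build_windows(sequence, profile, physchem):
    """Return one WINDOW x N_FEATURES matrix (list of lists) per residue.

    profile:  per-position lists of 21 values derived from the MSA.
    physchem: dict mapping each residue letter to its 58 property values.
    """
    n = len(sequence)
    rows = []
    for i, residue in enumerate(sequence):
        flag = [1.0 if i in (0, n - 1) else 0.0]  # assumed terminus flag
        rows.append(list(profile[i]) + list(physchem[residue]) + one_hot(residue) + flag)

    half = WINDOW // 2
    padding = [0.0] * N_FEATURES  # assumed zero padding beyond the termini
    windows = []
    for i in range(n):
        windows.append([rows[j] if 0 <= j < n else padding
                        for j in range(i - half, i + half + 1)])
    return windows
```

Each entry of the result is centered on one residue, so the network sees seven neighbors on either side when predicting that residue's flexibility class.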
The accuracy of MEDUSA predictions depends on protein size. Although the mean accuracy is almost the same across the range of sequence lengths considered, the spread of accuracy values increases for shorter proteins.

MEDUSA Results

ref

ref
The observation that only 26% of BC and 23% of FT structures exhibit flexibility strongly suggests that these peptides predominantly maintain stable conformations with limited molecular motion. This reduced flexibility translates into higher structural integrity and makes these peptides less susceptible to unfolding or denaturation, which is a significant advantage for drug development. Stable proteins are better at retaining their active forms over time, ensuring reliable biological activity. They also demonstrate improved stability during formulation and storage, reducing the likelihood of aggregation and enhancing shelf-life—key considerations for therapeutic efficacy.
Furthermore, limited flexibility in crucial regions allows for more consistent and specific interactions with target receptors, supporting sustained and effective binding. This characteristic minimizes undesirable conformational changes that could compromise function or lead to off-target effects. Overall, the low flexibility of BC and FT supports their potential as robust drug candidates, allowing them to perform reliably under physiological conditions and contributing to efficient, long-lasting therapeutic outcomes.
DeepSTABp (Thermostability)
Determining the melting temperature (Tm) of an ocular protein drug, the temperature at which half of the protein is unfolded, is important to ensure its stability and safety under physiological conditions. Ideally, the drug should have a Tm significantly higher than human body temperature, around 37 °C, to prevent it from unfolding or degrading upon administration or during storage. If the Tm is too low, the drug may denature and lose potency when exposed to the warm and moist environment of the eye, leading to inconsistent dosing and reduced therapeutic efficacy. Additionally, a low Tm could complicate manufacturing and handling, and unfolded protein is prone to aggregation, which increases the risk of ocular irritation. Therefore, aiming for a Tm well above physiological temperature helps maintain the drug’s integrity, ensures consistent delivery, and promotes patient safety during ocular application.
Principle of AI Analysis
Similar to CSM-Toxin, DeepSTABp uses a transformer-based protein language model for sequence embedding and state-of-the-art feature extraction in combination with other deep learning techniques for end-to-end protein melting temperature prediction.

DeepSTABp is based on four different artificial network blocks. The first three blocks create an embedding of the protein query based on the input features:
- Type of experimental condition used in the thermal proteome profiling experiment,
- The protein amino acid sequence
- The organism's growth temperature.
Block 1 and block 3 use small multilayer perceptrons (MLP). Block 2 consists of the pretrained transformer-based model ProtTrans-XL, followed by a mean pooling layer. The output vectors of the first three blocks are joined and passed into a final MLP block, which outputs the predicted protein melting temperature (Tm).
The datasets used for model training and evaluation in the DeepSTABp study were derived from high-throughput mass spectrometry-based thermal proteome profiling (TPP) assays. To achieve an extensive and homogeneous collection of experimentally determined protein melting temperatures (Tms), individual protein melting points are determined by fitting the following non-linear model:

(ref)
with a being the asymptote, m being the slope, and Tmid denoting the mid-temperature of the fitted curve. Tm was obtained by finding the temperature where the fitted function reaches a value of 0.5. In order to retrieve only reliable Tms, only model estimates with an R² score > 0.9 and a temperature variance of less than 2 °C were retained in the final data set.
To validate the performances of models during training and testing and to allow for a fair comparison to alternative approaches, different commonly used evaluation metrics were computed. Each metric measures the discrepancy between vectors of N experimentally determined Tms (y) and predicted Tms (ŷ).
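The mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
(ref)
The mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
(ref)
The root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
(ref)
Sample Pearson correlation coefficient (PCC):

$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$
(ref)
And the coefficient of determination (R2):

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
(ref)
Here $\bar{y}$ and $\bar{\hat{y}}$ denote the means of the experimentally determined and predicted Tms, respectively.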
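As a small illustration of how these five metrics behave, the sketch below computes them with the standard textbook definitions for two short vectors of Tms. The numbers are hypothetical and are not DeepSTABp results; the function name is likewise only illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, PCC and R2 for paired experimental/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    pcc = cov / math.sqrt(var_t * var_p)
    r2 = 1.0 - sum(e * e for e in errors) / var_t
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "PCC": pcc, "R2": r2}


# Hypothetical experimental and predicted melting temperatures in degrees Celsius.
y_true = [45.0, 50.0, 55.0, 60.0]
y_pred = [46.0, 49.0, 57.0, 58.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MAE and RMSE are in the same unit as Tm (°C), which makes them the easiest to interpret, while PCC and R2 are unitless measures of how well the predictions track the experimental values.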
DeepSTABp Results
After providing the DeepSTABp server with the amino acid sequences of the peptides and proteins involved, we obtained the following predicted melting temperatures:
Protein | Melting temperature Tm [°C] |
---|---|
BC | 49.31 |
FT | 47.91 |
The predicted melting temperatures of BC (49.31 °C) and FT (47.91 °C) lie well above both body temperature (about 37 °C) and the temperature of the ocular surface (approximately 34 °C). This margin of roughly 11 to 15 °C suggests that both peptides should remain folded, and therefore functional, after instillation into the eye. In the context of eye drops, this thermal stability supports formulation stability and ocular safety: a peptide that stays folded during storage at room temperature and after opening keeps a consistent concentration of active ingredient, avoiding the loss of efficacy that accompanies denaturation. Folded peptides are also less prone to aggregation, which could otherwise irritate sensitive ocular tissues such as the conjunctiva or cornea and impair even distribution in the tear film. Thermal stability may additionally simplify storage and handling, although the actual storage requirements would need to be confirmed experimentally. These properties are especially important for long-term treatments, such as glaucoma management or dry eye relief, and support BC and FT as candidates for ocular drug formulations.
ProtParam (Protein Half-life, Stability)
A thorough understanding of a protein’s physicochemical properties is fundamental in both basic research and drug development. Key characteristics such as molecular weight, isoelectric point (pI), amino acid composition, extinction coefficient, hydropathicity, and predicted half-life all provide crucial insights into protein stability, solubility, and biochemical behavior in different environments. These parameters influence protein expression, purification, storage, and formulation efficacy, impacting not only stability and manufacturability but also the therapeutic potential and safety profile of candidate drugs. Accurate knowledge of these properties allows researchers to tailor conditions for optimal protein function and helps predict how modifications or environmental factors may influence activity and stability.
ProtParam is an indispensable tool for efficiently obtaining this comprehensive set of physicochemical data directly from protein or peptide sequences. By calculating molecular weight, pI, atomic composition, instability and aliphatic indices, extinction coefficient, GRAVY, and estimated half-life, ProtParam enables rapid assessment of key protein properties early in the design and optimization process. In our workflow, application of ProtParam to the fusion peptides BC and FT supported rational sequence design, guided formulation strategies, and informed candidate selection by predicting parameters relevant to expression stability and solubility. This approach maximizes the likelihood of developing robust, effective, and manufacturable protein therapeutics.
The Mathematical Concept
ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. [Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607]
Extinction coefficients
The extinction coefficient indicates how much light a protein absorbs at a certain wavelength. It is useful to have an estimation of this coefficient for following a protein with a spectrophotometer when purifying it.
It is possible to estimate the molar extinction coefficient of a protein from knowledge of its amino acid composition [Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 182:319-326.]. From the molar extinction coefficients of tyrosine, tryptophan and cystine (cysteine does not absorb appreciably at wavelengths >260 nm, while cystine does) at a given wavelength, the extinction coefficient of the native protein in water can be computed using the following equation:

$$E_{\text{total}} = N_{\text{Tyr}}\,E_{\text{Tyr}} + N_{\text{Trp}}\,E_{\text{Trp}} + N_{\text{Cystine}}\,E_{\text{Cystine}}$$
(ref)
Ex = molar extinction coefficient of residue type x (in M^-1 cm^-1), Nx = number of residues of type x
ETyr = 1490, ETrp = 5500, ECystine = 125
The absorbance (optical density) of a 1 g/L solution can be calculated using the following formula:

$$A = \frac{E_{\text{total}}}{\text{Molecular weight}}$$
(ref)
ProtParam produces two sets of values for Etotal and A based on the above equations, both for proteins measured in water at 280 nm. The first assumes that all cysteine residues appear as half cystines (i.e. all pairs of Cys residues form cystines), and the second assumes that no cysteine appears as half cystine (i.e. all Cys residues are reduced). Experience shows that the computation is quite reliable for proteins containing Trp residues; however, there may be more than 10% error for proteins without Trp residues.
Note: Cystine is the amino acid formed when a pair of cysteine molecules are joined by a disulfide bond.
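The calculation can be illustrated with a short Python sketch. The residue counts used here (4 Tyr, 2 Trp, 2 Cys) are an assumption: they were chosen because they reproduce the two extinction coefficients that ProtParam reports for BC, and were not read from the BC sequence itself. The function names are illustrative only.

```python
# Molar extinction coefficients at 280 nm in water (M^-1 cm^-1), as listed above.
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125


def extinction_coefficient(n_tyr, n_trp, n_cys, cystines_formed=True):
    """Sum the Tyr, Trp and cystine contributions to the molar extinction coefficient."""
    # Every pair of Cys residues counts as one cystine when disulfides are assumed.
    n_cystine = n_cys // 2 if cystines_formed else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_1g_per_l(ext_coefficient, molecular_weight):
    """Absorbance of a 1 g/L solution: extinction coefficient divided by molecular weight."""
    return ext_coefficient / molecular_weight


# Assumed composition (not taken from the sequence): 4 Tyr, 2 Trp, 2 Cys.
ext_oxidised = extinction_coefficient(4, 2, 2, cystines_formed=True)
ext_reduced = extinction_coefficient(4, 2, 2, cystines_formed=False)
print(ext_oxidised, ext_reduced)
print(round(absorbance_1g_per_l(ext_oxidised, 4482.15), 3))
```

The two values differ only by the cystine term, which is why the gap between the oxidised and reduced coefficients is always a multiple of 125.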
In vivo / vitro half-life
The half-life is a prediction of the time it takes for half of the amount of protein in a cell to disappear after its synthesis in the cell. ProtParam relies on the "N-end rule" [Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.], which relates the half-life of a protein to its N-terminal residue; the prediction is given for 3 model organisms (human, yeast and E. coli).
The "N-end rule" was established from experiments [Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. J. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712.] that explored the metabolic fate of artificial beta-galactosidase proteins with different N-terminal amino acids engineered by site-directed mutagenesis. The beta-gal proteins thus designed have strikingly different half-lives in vivo, from more than 100 hours to less than 2 minutes, depending on the nature of the amino acid at the amino terminus and on the experimental model.
Amino acid | Mammalian | Yeast | E. coli |
---|---|---|---|
Ala | 4.4 hour | >20 hour | >10 hour |
Arg | 1 hour | 2 min | 2 min |
Asn | 1.4 hour | 3 min | >10 hour |
Asp | 1.1 hour | 3 min | >10 hour |
Cys | 1.2 hour | >20 hour | >10 hour |
Gln | 0.8 hour | 10 min | >10 hour |
Glu | 1 hour | 30 min | >10 hour |
Gly | 30 hour | >20 hour | >10 hour |
His | 3.5 hour | 10 min | >10 hour |
Ile | 20 hour | 30 min | >10 hour |
Leu | 5.5 hour | 3 min | 2 min |
Lys | 1.3 hour | 3 min | 2 min |
Met | 30 hour | >20 hour | >10 hour |
Phe | 1.1 hour | 3 min | 2 min |
Pro | >20 hour | >20 hour | ? |
Ser | 1.9 hour | >20 hour | >10 hour |
Thr | 7.2 hour | >20 hour | >10 hour |
Trp | 2.8 hour | 3 min | 2 min |
Tyr | 2.8 hour | 10 min | 2 min |
Val | 100 hour | >20 hour | >10 hour |
Table of the amino acids and the corresponding half-life [Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607]
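Because the N-end rule depends only on the first residue, the prediction amounts to a table lookup. The sketch below shows this with a five-row excerpt of the table above; the function name and the example sequence are hypothetical, and this is not ProtParam's own code.

```python
# Excerpt of the N-end rule table: residue -> (mammalian, yeast, E. coli) half-life.
N_END_RULE = {
    "S": ("1.9 hour", ">20 hour", ">10 hour"),
    "M": ("30 hour", ">20 hour", ">10 hour"),
    "R": ("1 hour", "2 min", "2 min"),
    "V": ("100 hour", ">20 hour", ">10 hour"),
    "G": ("30 hour", ">20 hour", ">10 hour"),
}


def estimated_half_life(sequence):
    """Look up the half-life estimates for the N-terminal residue of a sequence."""
    n_terminal = sequence[0].upper()
    if n_terminal not in N_END_RULE:
        raise KeyError(f"residue {n_terminal!r} is not in this excerpt of the table")
    return N_END_RULE[n_terminal]


# Hypothetical sequence starting with Ser, the N-terminal residue reported for BC and FT.
mammalian, yeast, e_coli = estimated_half_life("SGRKKR")
print(mammalian, yeast, e_coli)
```

This also makes clear why BC and FT receive identical half-life estimates: both start with Ser, and the rest of the sequence does not enter the prediction.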
Instability index
The instability index provides an estimate of the stability of a protein. Statistical analysis of 12 unstable and 32 stable proteins has revealed [16] that there are certain dipeptides, the occurrence of which is significantly different in the unstable proteins compared with those in the stable ones. The authors of this method have assigned a weight value of instability to each of the 400 different dipeptides (DIWV). Using these weight values it is possible to compute an instability index (II) which is defined as:

$$II = \frac{10}{L}\sum_{i=1}^{L-1}\mathrm{DIWV}\left(x_i x_{i+1}\right)$$
(ref)
where L is the length of the sequence and DIWV(x[i]x[i+1]) is the instability weight value for the dipeptide starting in position i.
A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.
Aliphatic index
The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins. The aliphatic index of a protein is calculated according to the following formula [Ikai, A.J. (1980) Thermostability and aliphatic index of globular proteins. J. Biochem. 88, 1895-1898.]:

$$\text{Aliphatic index} = X_{\text{Ala}} + a\,X_{\text{Val}} + b\left(X_{\text{Ile}} + X_{\text{Leu}}\right)$$
(ref)
where XAla, XVal, XIle, and XLeu are the mole percent (100 x mole fraction) of alanine, valine, isoleucine, and leucine. The coefficients a and b are the relative volumes of the valine side chain (a = 2.9) and of the Leu/Ile side chains (b = 3.9) to the side chain of alanine.
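A minimal Python sketch of this calculation is given below, using the coefficients 2.9 for valine and 3.9 for isoleucine and leucine from the ProtParam documentation. The ten-residue sequence is a hypothetical toy example, not BC or FT.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    length = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / length

    return mole_percent("A") + a * mole_percent("V") + b * (
        mole_percent("I") + mole_percent("L")
    )


# Toy sequence: one each of Ala, Val, Ile, Leu and six Gly (10% each of A, V, I, L).
toy_index = aliphatic_index("AVILGGGGGG")
print(round(toy_index, 2))  # 10 + 2.9*10 + 3.9*20 = 117.0
```

Because valine, isoleucine and leucine are weighted more heavily than alanine, sequences rich in these branched residues score markedly higher than alanine-rich sequences of the same length.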
ProtParam Results
- BC
  - Molecular weight: 4482.15 Da
  - Total number of negatively charged residues (Asp + Glu): 1
  - Total number of positively charged residues (Arg + Lys): 5
  - Atomic composition:
    - Formula: C201H299N53O56S4
    - Total number of atoms: 613
  - Extinction coefficients (in M^-1 cm^-1, at 280 nm measured in water):
    - Ext. coefficient: 17085, assuming all pairs of Cys residues form cystines
    - Ext. coefficient: 16960, assuming all Cys residues are reduced
  - Estimated half-life:
    - The N-terminal residue of the sequence considered is S (Ser).
    - The estimated half-life is:
      - 1.9 hours (mammalian reticulocytes, in vitro).
      - >20 hours (yeast, in vivo).
      - >10 hours (Escherichia coli, in vivo).
  - Instability index:
    - The instability index (II) is computed to be 38.80
    - This classifies the protein as stable.
  - Aliphatic index: 50.00
Conclusion of results of BC
BC is a small peptide with a molecular weight of 4482.15 Da and a net positive charge, possessing one negatively charged residue and five positively charged residues. This charge profile potentially enhances its ability to penetrate ocular tissues. Its atomic composition (C201H299N53O56S4) and relatively small total number of atoms (613) contribute to its manageable size for drug formulation. Additionally, BC is predicted to be stable, as indicated by an instability index of 38.80 (below the threshold of 40), and has an estimated half-life ranging from 1.9 hours in mammalian reticulocytes to over 20 hours in yeast and over 10 hours in E. coli, suggesting reasonable durability under physiological conditions.
Given these properties, BC’s stability and solubility profiles make it a strong candidate for ocular drug formulation. The extinction coefficient values indicate reliable detection and quantification potential, while the moderate aliphatic index of 50.00 suggests reasonable thermal stability. Together, these parameters highlight BC's suitability for recombinant expression and therapeutic use, offering promising penetration capabilities, structural stability, and ease of formulation for effective ocular drug delivery.
- FT
  - Molecular weight: 9001.35 Da
  - Total number of negatively charged residues (Asp + Glu): 6
  - Total number of positively charged residues (Arg + Lys): 12
  - Atomic composition:
    - Formula: C376H597N123O111S12
    - Total number of atoms: 1219
  - Extinction coefficients (in M^-1 cm^-1, at 280 nm measured in water):
    - Ext. coefficient: 10720, assuming all pairs of Cys residues form cystines
    - Ext. coefficient: 9970, assuming all Cys residues are reduced
  - Estimated half-life:
    - The N-terminal residue of the sequence considered is S (Ser).
    - The estimated half-life is:
      - 1.9 hours (mammalian reticulocytes, in vitro).
      - >20 hours (yeast, in vivo).
      - >10 hours (Escherichia coli, in vivo).
  - Instability index:
    - The instability index (II) is computed to be 34.75
    - This classifies the protein as stable.
  - Aliphatic index: 55.85
Conclusion of results of FT
The FT peptide has a moderate molecular weight of 9001.35 Da and carries a higher net positive charge than BC, with six negatively charged and twelve positively charged residues. This charge distribution is advantageous because the increased positive charge can enhance membrane penetration by interacting with the negatively charged components of cellular membranes. With an atomic formula of C376H597N123O111S12 and a total of 1219 atoms, FT maintains a manageable size for drug delivery applications. The peptide is predicted to be stable, as indicated by an instability index of 34.75, and has an estimated half-life of 1.9 hours in mammalian reticulocytes, with longer half-lives in yeast and E. coli systems.
These properties collectively support FT’s candidacy as a viable ocular drug molecule. Its favorable charge profile suggests efficient cellular uptake, while predicted stability and solubility parameters promote robust performance during formulation and use. The extinction coefficients further assist in quantifying FT during manufacturing and quality control. Together, these characteristics underscore FT’s potential for effective membrane penetration, stability, and suitability for therapeutic development targeting ocular diseases.
Step 3: Functionality Predictions
Following stability predictions, evaluating the functionality of our fusion peptides is imperative to ensure they perform effectively within physiological environments. While stability assessments confirm that the peptides maintain their conformations under various conditions, functionality prediction examines their ability to interact efficiently and specifically with target molecules, such as receptors or other proteins. This combined approach transitions from ensuring structural robustness to confirming biological activity, guiding the rational optimization of peptide candidates for enhanced therapeutic potential. Given the complexity of fusing distinct peptide sequences in FT and BC, functionality prediction also helps reveal potential issues like impaired binding or off-target interactions that might not be evident from stability data alone, allowing for early design refinements.
To conduct a comprehensive functionality assessment, we employed a suite of complementary computational tools. DeepKa was used to predict the pKa values of ionizable residues, and thus the charge state of the peptides at physiological pH. CSM-Toxin evaluated the toxicity profile to ensure safety and minimize adverse effects. PMIpred predicted protein–membrane interactions and the residues that drive them, which is essential for assessing cellular uptake. Lastly, HDOCK facilitated molecular docking simulations, modeling how the fusion peptides physically interact with their targets. The integration of these predictions furnished a detailed map of functional attributes, which was critical for refining peptide design and increasing the likelihood of successful therapeutic application.
DeepKa
Predicting protein pKa values is essential because these values influence protein structure, function, and interactions within physiological environments. The pKa of ionizable groups affects the protonation state of amino acids, which in turn determines protein charge distribution, stability, binding affinity, and enzymatic activity. Accurate pKa prediction helps elucidate how proteins respond to changes in pH, informing drug design by identifying critical residues involved in binding and catalysis. This understanding is crucial for optimizing therapeutic peptides and proteins, ensuring they maintain desired bioactivity and stability under physiological conditions.
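The link between a pKa value and the protonation state follows from the Henderson–Hasselbalch relationship, which the short Python sketch below evaluates. The pKa values used are illustrative only (6.5 is a typical textbook value for a histidine side chain) and are not DeepKa predictions for BC or FT.

```python
def protonated_fraction(pka, ph):
    """Fraction of an ionizable group that is protonated at a given pH
    (Henderson-Hasselbalch relationship)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


OCULAR_PH = 7.4

# Illustrative pKa values: an unperturbed His side chain (about 6.5) and
# a hypothetical His whose pKa is shifted upward by its environment.
for label, pka in [("typical His", 6.5), ("shifted His", 7.4)]:
    fraction = protonated_fraction(pka, OCULAR_PH)
    print(f"{label}: pKa {pka}, {fraction:.2f} protonated at pH {OCULAR_PH}")
```

A shift of less than one pKa unit is enough to change a residue from mostly neutral to half charged at pH 7.4, which is why per-residue pKa predictions matter for estimating the net charge of a peptide.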
DeepKa is a deep learning-based tool developed to predict protein pKa values accurately by leveraging extensive data from continuous constant-pH molecular dynamics simulations. Unlike traditional empirical methods, DeepKa uses a sophisticated grid charge representation of protein electrostatics and a deep neural network to achieve prediction accuracy comparable to computationally intensive molecular dynamics simulations but with greater speed and efficiency. The tool’s ability to predict pKa values supports related applications, including protein–ligand binding affinity prediction, making it a valuable asset in computational protein engineering and drug discovery. In our project, DeepKa enabled us to identify ionizable residues critical for peptide function and interaction, guiding design improvements that enhance biological performance and therapeutic potential.
Principle of AI Analysis
DeepKa utilizes an advanced deep learning algorithm for predicting protein pKa values, leveraging data obtained from continuous constant-pH molecular dynamics (CpHMD) simulations. Unlike traditional empirical or physics-based approaches, DeepKa employs a grid-based charge representation to model the protein’s electrostatic environment surrounding ionizable residues (Asp, Glu, His, Lys). This grid-based strategy smooths charge distributions and electrostatic energies, resolving discontinuities introduced by cutoff methods prevalent in earlier techniques. By modeling the protein within a defined cubic box or sphere, DeepKa reduces computational demand while maintaining accuracy through focused analysis of electrostatics within this spatial region.
Mathematically, DeepKa correlates the local electrostatic potential ϕ at each residue site with shifts in pKa values ΔpKa, relative to intrinsic reference pKa values. These shifts predominantly arise from desolvation effects and electrostatic interactions within the protein microenvironment. The model employs a trained deep neural network f(ϕ,X;θ) to map these electrostatic features and structural information X to pKa perturbations:

$$\Delta \mathrm{p}K_a = f(\phi, X; \theta), \qquad \mathrm{p}K_a^{\text{pred}} = \mathrm{p}K_a^{\text{ref}} + \Delta \mathrm{p}K_a$$
(ref)
Here, θ denotes the network parameters optimized by minimizing the root mean square error (RMSE) loss between predicted pKa values and reference CpHMD-derived pKa values, expressed by:

$$\mathcal{L}(\theta) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{p}K_{a,i}^{\text{pred}} - \mathrm{p}K_{a,i}^{\text{CpHMD}}\right)^2}$$
(ref)
This loss function directs model training across large datasets, enabling DeepKa to achieve accuracy comparable to CpHMD simulations while significantly enhancing computational efficiency. The grid-based electrostatic approach further improves robustness by smoothing charge distributions and removing cutoff artifacts. Together, these methodological advancements empower DeepKa to accurately and rapidly predict protein pKa values for high-throughput applications, including protein–ligand binding affinity assessment and protein engineering.
DeepKa Results

ref

ref
At the physiological pH of approximately 7.4 in the ocular environment, the charge properties of fusion peptides BC and FT are critical determinants of their therapeutic efficacy and safety. Both peptides carry a net positive charge at this pH, as their isoelectric points (pI) are well above 7.4—9.8 for BC and 10.48 for FT. This positive charge enhances electrostatic attraction to the predominantly negatively charged ocular surface and cellular membranes, which contain components such as glycosaminoglycans and phospholipids. These interactions facilitate strong, specific binding to cellular receptors, which is essential for efficient drug targeting and uptake.
The BC peptide’s moderate positive charge at pH 7.4 offers a balanced interaction profile: it is sufficient to promote effective receptor binding while minimizing undesired nonspecific interactions, such as chelation with metal ions present in tear fluid. Such controlled binding reduces peptide sequestration and preserves bioavailability. Moreover, the moderate charge supports peptide solubility and stability, reducing risks of aggregation or precipitation that could impair delivery and therapeutic function. In contrast, FT’s higher positive charge may increase binding efficacy but also raises the potential for nonspecific interactions. Therefore, its formulation must carefully balance these properties to maintain safety and stability. These charge-related insights guide the rational design and optimization of BC and FT as ocular drugs, ensuring they achieve targeted delivery, sustained activity, and minimal side effects within the complex biochemical milieu of the eye.
PMIpred
Predicting protein–membrane interactions is critical for designing fusion peptides and therapeutic proteins intended for cellular delivery. Membrane binding is often the first step before processes such as endocytosis, which determines how a peptide or protein enters the cell and exerts its biological effect. Accurately estimating binding strength and interaction sites enables rational improvements in drug design, enhancing uptake, bioavailability, and specificity. For engineered fusion constructs, experimental data on membrane association are often limited, making computational prediction an essential tool for evaluating and optimizing cellular delivery.
PMIpred is a physics-informed prediction method that quantifies protein–membrane interactions by estimating membrane-binding free energies from sequence or structure. Using a transformer neural network trained on over 50,000 peptides, it predicts both global binding affinity and residue-level contributions, distinguishing between nonspecific membrane association and curvature sensing. Results are mapped onto 3D structural models, allowing visualization of interaction regions and guiding mutational design. By combining reliable quantitative predictions with broad applicability across diverse protein types and membrane environments, PMIpred provides a powerful resource for optimizing cellular entry strategies and advancing research on membrane-associated biological processes.
Principle of AI Analysis
PMIpred evaluates protein–membrane interactions by combining machine learning predictions of thermodynamic favorability with structural accessibility calculations. The central principle is that successful endocytosis is triggered when regions of a protein bind strongly and specifically to lipid bilayers. The tool therefore needs to predict where these regions are and how energetically favorable their interactions will be.
The process begins with a sliding window approach. The protein is partitioned into overlapping short segments, allowing local sequence and structural features to be assessed without losing fine detail. For each segment, a neural network predicts the curvature-sensing free energy change:

(ref)
where negative values indicate that segment w binds favorably to the membrane. Since residues appear in multiple windows, an average ΔΔF is assigned back to each residue:

$$\Delta\Delta F_i = \frac{1}{N_i}\sum_{w \in W(i)} \Delta\Delta F_w$$
(ref)
where W(i) is the set of overlapping windows containing residue i, and Ni is their count. This yields a residue-level energy map.
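The averaging step can be sketched as follows. The sketch assumes that windows advance one residue at a time, so that window w covers residues w to w + window - 1; the window scores themselves are hypothetical stand-ins for the neural network output.

```python
def residue_level_scores(window_scores, sequence_length, window):
    """Average the scores of all overlapping windows that contain each residue."""
    totals = [0.0] * sequence_length
    counts = [0] * sequence_length
    for start, score in enumerate(window_scores):
        # Assumed layout: window number `start` covers residues start..start+window-1.
        for i in range(start, start + window):
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical example: 5 residues, windows of 3 residues, hence 3 windows.
window_scores = [-3.0, -6.0, -9.0]
per_residue = residue_level_scores(window_scores, sequence_length=5, window=3)
print(per_residue)
```

Residues near the termini are covered by fewer windows than central residues, so their averages rest on less information.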
In parallel, PMIpred calculates the solvent-accessible surface area (SASA) of each residue using a probe-based geometric algorithm. A residue is considered sufficiently exposed, and thus available to interact with lipids, if its surface area satisfies:

$$\mathrm{SASA}_i \geq \theta$$
(ref)
where θ is an accessibility threshold. Only residues passing this SASA filter can contribute meaningfully to membrane binding. The method then integrates accessibility with predicted free energy. Accessible residues (A) with favorable ΔΔF values are classified according to energy cutoffs:

(ref)
This classification ensures that strongly favorable regions are marked as binders (B), intermediate ones as curvature sensors (S), and weak or unfavorable sites as non-binders (-).
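The combined accessibility and energy classification can be sketched as a small decision function. The numerical cutoffs below are placeholders chosen purely for illustration and are not PMIpred's actual thresholds; tagging buried residues as non-binders is likewise an assumption.

```python
def classify_residue(ddf, sasa, theta, sensor_cutoff, binder_cutoff):
    """Tag a residue as binder (B), curvature sensor (S) or non-binder (-)."""
    if sasa < theta:
        return "-"  # buried residue: assumed unable to contact lipids
    if ddf <= binder_cutoff:
        return "B"  # strongly favorable free energy
    if ddf <= sensor_cutoff:
        return "S"  # intermediate free energy
    return "-"      # weak or unfavorable


# Placeholder thresholds, NOT the values used by PMIpred.
THETA, SENSOR_CUTOFF, BINDER_CUTOFF = 20.0, -5.0, -10.0

# Hypothetical (ddF, SASA) pairs for four residues.
residues = [(-12.0, 50.0), (-7.0, 50.0), (-2.0, 50.0), (-12.0, 5.0)]
tags = [classify_residue(ddf, sasa, THETA, SENSOR_CUTOFF, BINDER_CUTOFF)
        for ddf, sasa in residues]
print("".join(tags))
```

The last residue shows the effect of the SASA filter: a favorable free energy does not produce a binder tag when the residue is buried.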
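Membrane composition further refines the prediction. If the bilayer is negatively charged, ΔΔF values are corrected by including electrostatic effects:

$$\Delta\Delta F_{\mathrm{adj},i} = \begin{cases} \Delta\Delta F_{i,\mathrm{L24}} + \Delta G_{\mathrm{elec},i} & \text{negatively charged membrane} \\ \Delta\Delta F_{i,\mathrm{L24}} & \text{neutral membrane} \end{cases}$$
(ref)
Here, ΔGelec,i adjusts for attraction or repulsion between charged residues and lipid headgroups, while for neutral membranes the unadjusted ΔΔF values (ΔΔFi,L24) are used.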
Finally, PMIpred produces three types of outputs:
- Global membrane-binding free energy:

(ref)
where Nacc is the number of accessible residues. This value reflects the protein’s overall membrane-binding tendency.
- Residue-level classifications: Each residue is tagged as binder (B), sensor (S), or non-binder (-) along with its ΔΔF contribution.
- 3D structural mapping: The classification and energy scores are projected back onto the protein’s structural model, highlighting binding hot spots on the protein surface.

ref

ref

ref
Through this workflow, PMIpred transforms raw sequence and structural properties into a quantitative and spatially resolved fingerprint of protein–membrane interaction. These outputs provide insight into how readily a drug candidate can initiate endocytosis and guide design modifications to enhance uptake and specificity.
PMIpred Results
1. FT

(ref)

(ref)

(ref)
Summary of FT
The analysis shows that FT possesses strong potential for endocytic uptake, driven by distinct regions of stability across its sequence. The central blocks (residues ~12–24 and 53–66) form stable clusters dominated by S and B classifications, creating a well-folded structural core. The C-terminal stretch (residues 92–110) further reinforces this stability, with consecutive strong B residues and ΔΔF_adj values below –10, marking it as a major binding hotspot. These stable zones are characterized by low SASA values, consistent with a compact, protected core that confers resilience under endosomal stress. In contrast, the N-terminus and the long internal span between residues 25–50 exhibit weaker classifications, reflecting greater flexibility. This balance of rigid, stable cores and strategically placed flexible segments suggests that FT can maintain structural integrity while retaining the adaptability needed for productive membrane engagement and intracellular function.
2. BC

(ref)

(ref)

(ref)
Summary of BC
The PMIpred analysis of BC reveals a strong overall tendency for endocytic absorption and target engagement, with the N-terminal region showing a dense stretch of S and B classifications that indicate a robust, well-folded state. A central segment of the sequence, enriched in Trp and Phe residues, displays consistently favorable ΔΔF_adj values (< –10) and is dominated by B classifications, marking it as a key membrane-binding hotspot. These regions are flanked by S residues, creating extended patches of curvature-sensitive and binding-prone sites. Toward the C-terminus, the classifications gradually weaken, with fewer strong binders and an eventual decline into non-binders, reflecting a stabilization–interaction gradient across the molecule. Together, this profile suggests that BC combines high structural integrity with strategically positioned binding hotspots, supporting efficient endocytosis, strong receptor engagement, and enhanced therapeutic potential.
AllerCatPro 2.0
In developing therapeutic proteins and peptides, it is crucial to assess their potential to trigger allergic reactions or adverse immune responses. Such immunogenicity can compromise drug safety, reduce efficacy, and lead to undesired side effects that hinder clinical success. Early identification of allergenic regions within a drug candidate is therefore essential to avoid costly setbacks during development and to ensure patient safety. This functional aspect—predicting how a protein may be recognized by the immune system—is a key determinant of a drug’s viability and successful translation from design to therapy. Addressing allergenicity helps maintain the therapeutic’s functional integrity by preventing immune-mediated neutralization or hypersensitivity reactions.
AllerCatPro 2.0 is a cutting-edge computational tool specifically designed for predicting protein allergenicity as a critical component of functional assessment. Unlike tools focused on structural stability or folding efficiency, AllerCatPro 2.0 evaluates whether protein sequences and their three-dimensional conformations resemble known allergens, thereby estimating the likelihood of eliciting an immune response. It achieves this by integrating sequence motif analysis, structural similarity comparisons, and machine learning models that recognize subtle patterns associated with allergenic potential. By providing a detailed allergenicity profile and confidence scores, AllerCatPro 2.0 enables the identification and rational redesign of potentially problematic regions, ensuring that therapeutic candidates retain their intended biological function without compromising safety. This makes it an indispensable functionality prediction tool in drug development pipelines.
Principle of AI Analysis
AllerCatPro 2.0 predicts protein allergenicity by comparing the query sequence against a comprehensive dataset of known allergens using multiple sequential steps. First, it checks for gluten-like glutamine repeats, which serve as an independent allergen indicator but only lead to a strong allergenicity prediction if other similarities are present. Next, it performs a BLASTP search against a curated 3D structure database of 714 known allergens. If there is significant sequence similarity (E-value < 0.001), it evaluates the 3D surface epitope similarity to assign strong evidence if identity exceeds 92–93%, or weak evidence otherwise. If no structural matches are found, the tool uses a linear-window rule requiring 35% identity over 80 amino acids and, failing that, a hexamer hit approach requiring three hexamer matches to known allergens. If none of these tests succeed, the protein is predicted as having no evidence for allergenicity.
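The sequential checks described above can be sketched as a small decision function. This is only an illustrative sketch, not the AllerCatPro 2.0 implementation: the class `QueryEvidence`, the function `classify_allergenicity`, and all field names are hypothetical, and the similarity values are assumed to have been computed beforehand. Three points are assumptions on our part: the 93% epitope cut-off (the upper end of the 92–93% range quoted above), the evidence level assigned to a linear-window hit (strong) and to a hexamer hit (weak), and the way the glutamine-repeat check is combined with the other tests.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the description above; the 93% value is an assumption
# (the text quotes a 92-93% range).
EVALUE_CUTOFF = 0.001
EPITOPE_STRONG_PCT = 93.0
WINDOW_IDENTITY_PCT = 35.0   # identity over an 80-amino-acid window
MIN_HEXAMER_HITS = 3


@dataclass
class QueryEvidence:
    """Pre-computed similarity results for one query protein (hypothetical container)."""
    gluten_like_q_repeats: bool
    blastp_evalue: Optional[float]          # best E-value vs. the 3D allergen database, None if no hit
    epitope_identity_pct: Optional[float]   # 3D surface epitope identity of that hit, in percent
    window_identity_pct: float              # best identity over any 80-residue window, in percent
    hexamer_hits: int                       # number of hexamers matching known allergens


def classify_allergenicity(ev: QueryEvidence) -> str:
    """Return 'strong evidence', 'weak evidence', or 'no evidence'."""
    sig_3d = ev.blastp_evalue is not None and ev.blastp_evalue < EVALUE_CUTOFF
    window_hit = ev.window_identity_pct >= WINDOW_IDENTITY_PCT
    hexamer_hit = ev.hexamer_hits >= MIN_HEXAMER_HITS

    # Assumption: glutamine repeats only count when another similarity test also fires.
    if ev.gluten_like_q_repeats and (sig_3d or window_hit or hexamer_hit):
        return "strong evidence"
    if sig_3d:
        strong = (ev.epitope_identity_pct is not None
                  and ev.epitope_identity_pct >= EPITOPE_STRONG_PCT)
        return "strong evidence" if strong else "weak evidence"
    if window_hit:       # assumption: a linear-window hit is treated as strong evidence
        return "strong evidence"
    if hexamer_hit:      # assumption: a hexamer-only hit is treated as weak evidence
        return "weak evidence"
    return "no evidence"


# Example: a query with no significant hit in any test.
query = QueryEvidence(False, None, None, 12.0, 0)
print(classify_allergenicity(query))  # prints "no evidence"
```

Ordering the checks from the most specific (3D surface epitope) to the least specific (hexamer matches) mirrors the hierarchical workflow: a weaker test is consulted only when the stronger ones find nothing.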

(ref)
Decision workflow of AllerCatPro 2.0, from the query protein to a result of strong, weak, or no evidence for allergenic potential. AllerCatPro 2.0 checks the similarity of the query protein to 714 representatives in its 3D model/structure database of known allergens, as well as to a comprehensive dataset of reliable proteins associated with allergenicity (4979 protein allergens). Whereas AllerCatPro 1.7 only compared the query protein with the dataset of known allergens, AllerCatPro 2.0 also checks the query sequence separately against datasets of 165 autoimmune allergens and 162 low allergenic proteins. If a significant sequence similarity is found, AllerCatPro 2.0 reports the similar proteins associated with autoimmune diseases and/or of low allergenic potential, together with the sequence identity to the closest hit.
In addition to known allergens, AllerCatPro 2.0 separately checks for similarity to autoimmune allergens and low allergenic proteins, offering a nuanced assessment of functional immune risk. It assigns predictions of strong, weak, or no evidence for allergenicity accompanied by detailed similarity scores and comments clarifying the basis of the prediction. Compared to previous methods, AllerCatPro 2.0’s integration of 3D structural similarity significantly improves prediction accuracy and reduces false positives. This hierarchical workflow prioritizes the most biologically relevant information to deliver reliable allergenicity assessments vital for ensuring the safety and functional viability of therapeutic proteins.
AllerCatPro 2.0 Results

Result of FT

Result of BC
Both FT and BC show no evidence of allergenicity according to AllerCatPro 2.0 predictions. This result is encouraging for their therapeutic use, as it suggests a low likelihood of triggering allergic reactions in patients. The absence of detected allergenic motifs or structural resemblance to known allergens supports a safer clinical profile and reduces the immunogenicity-related risks that can compromise drug efficacy and patient safety.
A prediction of no evidence for allergenic potential may also ease regulatory assessment and broaden applicability across diverse patient populations. It indicates that the designed peptides are likely to perform their intended biological functions without unwanted immune activation, although this remains a computational prediction that requires experimental confirmation.
HDOCK (Docking and Confidence Score)
HDOCK is an advanced web server that facilitates molecular docking for protein-protein and protein-DNA/RNA interactions using a hybrid algorithm combining template-based modeling and free docking. It accepts both protein sequences and structures as input, making it accessible even when experimental structural data are unavailable. HDOCK efficiently integrates sequence similarity search, template selection, and docking simulations, providing rapid and accurate predictions of binding modes within about 10 to 20 minutes. The server leverages binding information from homologous complexes to improve the accuracy of docking results and supports flexible constraints such as user-provided binding site residues. Its versatility and computational efficiency have been validated on multiple benchmark datasets, demonstrating superior performance in predicting biologically relevant interactions compared to traditional methods.
The mathematical model
Binding confidence is commonly checked with computational prediction tools that search for the best protein-protein docking pose and calculate a confidence score from its docking score using the following formula: Confidence score = 1 / (1 + e^(0.02 × (Edocking + 150))).

Result of BC
Edocking is the docking score. For reference, protein-protein complexes in the PDB usually have docking scores of around -200 or better (more negative). Roughly, when the confidence score is above 0.7, the two molecules are very likely to bind; when the score is between 0.5 and 0.7, the molecules may bind; when the confidence score is below 0.5, the molecules are unlikely to bind. Nevertheless, the confidence score should be used cautiously due to its empirical nature.
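As a minimal sketch of how a docking score maps onto a confidence score and onto the rough categories above, the snippet below uses the empirical relation given on the HDOCK server help page as we understand it, confidence = 1 / (1 + e^(0.02 × (Edocking + 150))). The function names `confidence_score` and `interpret_confidence` are our own and are not part of HDOCK.

```python
import math


def confidence_score(docking_score: float) -> float:
    """Empirical HDOCK-style confidence score derived from a docking score."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))


def interpret_confidence(score: float) -> str:
    """Map a confidence score onto the rough binding categories described above."""
    if score > 0.7:
        return "very likely to bind"
    if score >= 0.5:
        return "may bind"
    return "unlikely to bind"


# A docking score of -200 gives a confidence score just above the 0.7 threshold.
score = confidence_score(-200.0)
print(round(score, 3), interpret_confidence(score))  # 0.731 very likely to bind
```

A docking score of -150 corresponds to a confidence score of exactly 0.5, so scores more negative than -150 fall on the "may bind" or "very likely to bind" side of the scale.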
References for HDOCK (as per the server website hdock.phys.hust.edu.cn):
- Yan Y, Tao H, He J, Huang S-Y. The HDOCK server for integrated protein-protein docking. Nature Protocols, 2020.
- Yan Y, Zhang D, Zhou P, Li B, Huang S-Y. HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy. Nucleic Acids Res. 2017;45(W1):W365-W373.
- Yan Y, Wen Z, Wang X, Huang S-Y. Addressing recent docking challenges: A hybrid strategy to integrate template-based and free protein-protein docking. Proteins 2017;85:497-512.
- Huang S-Y, Zou X. A knowledge-based scoring function for protein-RNA interactions derived from a statistical mechanics-based iterative method. Nucleic Acids Res. 2014;42:e55.
- Huang S-Y, Zou X. An iterative knowledge-based scoring function for protein-protein recognition. Proteins 2008;72:557-579.
Docking results

(ref) Docking result figures
The docking scores were lower than -200, which is in the range typical of protein-protein complexes in the PDB and indicates a strong predicted interaction between the molecules. The confidence scores were approximately 0.7, the level above which binding is considered likely. Together, the low docking scores and high confidence scores suggest that the predicted protein-protein complex is favorable and biologically plausible. Because both scores are empirical, they indicate likely binding rather than a measured binding affinity. Even so, this predicted interaction is important for drug development, as it is consistent with effective target engagement and potential therapeutic efficacy.
References
- Bechtel, T. J., & Weerapana, E. (2017). From structure to redox: The diverse functional roles of disulfides and implications in disease. Proteomics, 17(6), 1600391. https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/pmic.201600391
- Tartaglia GG, Pawar AP, Campioni S, et al. Prediction of aggregation-prone regions in structured proteins. J Mol Biol 2008;380:425–36
- Zamora WJ, Campanera JM, Luque FJ. Development of a structure-based, pH-dependent Lipophilicity scale of amino acids from continuum solvation calculations. J Phys Chem Lett 2019;10:883–9.
- Søndergaard CR, Olsson MHM, Rostkowski M, et al. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput 2011;7:2284–95.
- Olsson MHM, Søndergaard CR, Rostkowski M, et al. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput 2011;7:525–37.
- Marc Oeller, Ryan Kang, Rosie Bell, Hannes Ausserwöger, Pietro Sormanni, Michele Vendruscolo, Sequence-based prediction of pH-dependent protein solubility using CamSol, Briefings in Bioinformatics, Volume 24, Issue 2, March 2023, bbad004
- Vander Meersche, Y., Cretin, G., de Brevern, A. G., Gelly, J. C., & Galochkina, T. (2021). MEDUSA: Prediction of protein flexibility from sequence. Journal of Molecular Biology, 166882. https://doi.org/10.1016/j.jmb.2021.166882
- Morozov, V.; Rodrigues, C.H.M.; Ascher, D.B. CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics 2023, 15, 431. https://doi.org/10.3390/pharmaceutics15020431
- Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110.
- Osorio, D.; Rondón-Villarreal, P.; Torres, R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015, 7, 4–14.
- Jung, F., Frey, K., Zimmer, D., & Mühlhaus, T. (2023). DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. International Journal of Molecular Sciences, 24(8), 7444. https://doi.org/10.3390/ijms24087444
- Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607
- Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.
- Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. J. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712.
- Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4,155-161.
- Van Hilten, N.; Verwei, N.; Methorst, J; Nase, C.; Bernatavicius, A.; Risselada, H.J., Bioinformatics, 2024, 40(2).