HK-HCY-PCMS - iGEM 2025

Our objective

Because our drug consists of a completely original formula incorporating two novel fusion peptides, BC and FT, it is essential to rigorously assess the structure, stability, and functionality of these components. This evaluation will help ensure that no molecular obstructions compromise therapeutic efficacy and will maximize the performance of our eyedrop treatment. Accordingly, the modelling team will conduct both computational and mathematical analyses to predict and optimize the properties of our fusion peptides throughout the development process. Step by step, we aim to evaluate the drug critically so that it performs as well as possible and meets safety standards comparable to those of drugs currently on the market.

Step 1: Designing the Workflow

The dry lab workflow begins with structural analysis, which serves as the foundation by providing detailed three-dimensional models of the fusion peptides. This step is crucial because all subsequent predictions depend on accurate structural information to understand the spatial arrangement of amino acids, potential folding patterns, and accessible surfaces. High-quality structural models enable precise identification of critical features such as binding sites, secondary structure elements, and regions important for function or interaction.

Following structural analysis, stability predictions are performed to assess how likely the peptide is to maintain its folded conformation under physiological conditions. Stability prediction tools evaluate the energetic consequences of the peptide’s amino acid composition and structural integrity, including the impact of mutations or environmental changes. Understanding stability is essential before further functional assessment, as unstable peptides may rapidly denature or aggregate, rendering them ineffective as drugs.

Once stability is established, functionality predictions evaluate the biological activity of the peptides. This includes assessing binding affinities, interaction potential with targets, or propensity to trigger immune responses. Functionality relies inherently on proper folding and stable structure, so this step logically follows stability assessment. Accurate functionality prediction guides optimization of the peptide’s therapeutic potential.

Finally, pharmacokinetics predictions in our dry lab focus mainly on dosage and concentration calculations, alongside evaluating the drug’s P-value. These calculations are critical for determining the appropriate peptide amount to achieve therapeutic efficacy while minimizing toxicity. Accurate dosage predictions guide drug delivery strategies and ensure the peptide reaches effective concentrations in target tissues. The P-value assessment provides insight into the peptide’s potency and interaction likelihood. This pharmacokinetic step is informed by prior stability and functionality analyses, ensuring that peptides with favorable structure and biological activity are modeled for optimal dosing regimens.

Figure of flowchart of dry lab’s workflow

Together, these steps form a coherent, logically progressive workflow where each step builds critically on the results of the preceding one, ensuring that only peptides with promising structure, stability, and function are advanced towards pharmacokinetic evaluation and potential therapeutic use.

Step 2: Structural analysis

The first step of our modelling is structural analysis. Inspecting the structure of our fusion peptides is essential, as the fusion of different peptide sequences can result in unique conformations and unanticipated interactions for which limited precedent exists. Structural analysis of FT (composed of frattide and tp1) and BC (composed of BDNF and CDR1) lets us check that these newly developed molecules adopt stable, functional shapes compatible with their intended biological activities. Understanding their three-dimensional structure helps to identify potential issues such as aggregation, misfolding, or dysfunctional membrane interactions, all of which could compromise the effectiveness or safety of the eyedrop treatment. Given the innovative nature of these fusion peptides and the absence of prior data, structural validation is critical for supporting their therapeutic potential and guiding subsequent optimization steps in drug development.

Hence, we conducted several computational analyses. We used AlphaFold, a state-of-the-art tool for predicting the three-dimensional structures of proteins from their amino acid sequences, and AlphaKnot 2.0 to analyze the topology and identify knotted regions within our protein models. Finally, we used PyMOL to visualize the structures and validate specific features such as disulfide bond formation. Together, these tools provide a comprehensive evaluation of our fusion peptides’ structural integrity and functional potential.

Developing the structure: AlphaFold and PyMOL

AlphaFold

AlphaFold is an advanced artificial intelligence system developed by DeepMind that predicts the three-dimensional structure of proteins directly from their amino acid sequences with remarkable accuracy. By leveraging deep learning techniques and extensive protein structure databases, AlphaFold is able to model complex protein folds accurately, even in cases where no similar structures are known. This capability addresses a long-standing challenge in computational biology, enabling researchers to gain critical structural insights quickly and cost-effectively compared to traditional experimental methods.

For our project, due to the absence of experimentally determined structures for the target proteins BDNF, CDR1, and FRAT, we utilized AlphaFold to generate predicted 3D models for these proteins, as well as for the fusion peptides BC and FT, based on their modified amino acid sequences. Following the design process, we performed ProtParam analyses to assess the stability and crucial physicochemical properties of the predicted models.

Failed Results

Failed prototype 1

(Alphafold and Protparam test of combination of BC and linker DEVD)

(Alphafold and Protparam test of combination of FT and linker DEVD)

The predicted position error for the BC protein combined with the DEVD linker indicates a lack of spatial consistency, suggesting that the relative position of BC and the DEVD linker is uncertain. Additionally, the confidence score for the BC+DEVD structure is very low, reflecting poor reliability in the predicted conformation. Similarly, the instability index (II) for the FT and DEVD combination is 42.60, above the threshold of 40 below which ProtParam predicts a protein to be stable, so this combination is classified as unstable. Due to these findings (poor structural confidence for BC+DEVD and instability of FT+DEVD), both predicted structures are deemed unreliable and have been excluded from further analysis.

Failed prototype 2

(Alphafold and Protparam test of combination of BC and linker EAAAK)

(Alphafold and Protparam test of combination of FT and linker EAAAK)

The predicted local distance difference test (pLDDT) scores indicate low overall confidence in the predicted structure, with particularly low values observed in the linker region, suggesting potential instability. In addition, the extended length of the linker raises concerns about structural compactness and stability under physiological conditions, especially within the ocular environment. Complementing these structural concerns, the instability index (II) for the FT and EAAAK combination was 40.20, just above the threshold of 40 below which ProtParam predicts a protein to be stable. Furthermore, the estimated half-life in yeast and Escherichia coli is only 3 minutes, which points to poor stability and survival in actual eye tissues. Taken together, these factors make the predicted structure unreliable and unsuitable for further study, so we decided to rebuild the target protein structure.
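AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output, so the confidence of a region such as the linker can be checked numerically as well as visually. The following is a minimal sketch under that assumption; the function names and the residue range in the usage note are hypothetical and are not taken from our actual models.

```python
def plddt_per_residue(pdb_text):
    """Return {residue_number: pLDDT} read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26,
        # B-factor 61-66 (AlphaFold stores pLDDT there).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def mean_plddt(scores, first, last):
    """Mean pLDDT over residues first..last (inclusive) that are present."""
    values = [v for r, v in scores.items() if first <= r <= last]
    if not values:
        raise ValueError("no residues in the requested range")
    return sum(values) / len(values)
```

For example, `mean_plddt(plddt_per_residue(open("model.pdb").read()), 120, 124)` would average the confidence over a five-residue linker, where the file name and residue numbers are placeholders.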

Failed prototype 3

(Alphafold and Protparam test of combination of BC and linker DS)

(Protparam test of combination of FT and linker DS)

The predicted aligned error (PAE) plot for the BC protein linked with DS displays a scattered pattern with interspersed white spaces, indicating poor positional confidence and a lack of organized spatial arrangement between BC and DS. This dispersed pattern reflects high uncertainty in the relative positioning of these components. Supporting this, the pLDDT confidence scores for the FT+DS structure are very low, confirming the unreliability of the predicted model. In addition, the instability index (II) for the FT and DS combination, as calculated by ProtParam, is 56.04, well above the stability threshold of 40. The estimated half-life of this protein combination in yeast and Escherichia coli is only 3 minutes, suggesting it would have minimal stability and survival in actual ocular environments. Based on these findings, both the BC+DS and FT+DS predicted structures are considered unsuitable, leading us to pursue alternative protein combinations.

Successful Results

BC 3D model

FT 3D model

The structure predictions for BC (left) and FT (right) were successful, as demonstrated by the position error graphs below.

position error graph of BC

position error graph of FT

Both BC and FT peptides display strong predicted indicators of stability and suitability for application. BC has an instability index of 38.80 (below the threshold of 40), a moderate aliphatic index (50.00), and long estimated half-lives across biological systems, so ProtParam classifies it as stable. FT has a lower instability index (34.75), a higher aliphatic index (55.85), and similarly long estimated half-lives, and is also classified as stable. ProtParam reports extinction coefficients both with all cysteine pairs assumed to form cystines and with all cysteines reduced, so these values alone do not show whether disulfide bonds are present; disulfide bonding is examined separately with PyMOL. Unlike the failed prototypes, which showed high instability indices, short half-lives, and unreliable predicted structures, BC and FT show favorable physicochemical profiles, with balanced charge and predicted resistance to degradation, providing a solid foundation for functional studies and therapeutic development.
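The ProtParam verdicts in this section all follow the same rule: an instability index below 40 is predicted to be stable, and a value of 40 or above is predicted to be unstable. As a small illustration, the sketch below applies that rule to the instability indices reported above; the dictionary labels and function name are our own shorthand for this example.

```python
STABILITY_THRESHOLD = 40.0  # ProtParam convention: II below 40 is predicted stable

# Instability indices reported in this section.
instability_index = {
    "FT + DEVD linker": 42.60,
    "FT + EAAAK linker": 40.20,
    "FT + DS linker": 56.04,
    "BC (final)": 38.80,
    "FT (final)": 34.75,
}


def classify(ii, threshold=STABILITY_THRESHOLD):
    """Return the ProtParam-style stability verdict for an instability index."""
    return "stable" if ii < threshold else "unstable"


verdicts = {name: classify(ii) for name, ii in instability_index.items()}
for name, verdict in verdicts.items():
    print(f"{name}: II = {instability_index[name]:.2f} -> {verdict}")
```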

PyMOL

Disulfide bonds play a crucial role in stabilizing protein structures by forming covalent links between cysteine residues. These bonds help maintain the overall protein architecture, especially for secreted and membrane proteins exposed to harsh oxidative environments. By constraining the folding and reducing conformational flexibility, disulfide bonds enhance protein stability, facilitate correct folding, and protect against denaturation or degradation under physiological stress.

Finally, disulfide bond analysis was performed using PyMOL to understand the stability of our fusion peptides. While BC contains no disulfide bonds, FT has one, consistent with its smaller size and shorter sequence. The presence of this single disulfide bond in FT may contribute to its structural resilience, whereas the absence in BC suggests a reliance on other stabilizing interactions. Recognizing these differences informs our approach to optimizing peptide stability and functionality, ensuring that each fusion peptide performs effectively in its respective biological context.
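A disulfide bond can also be checked outside PyMOL by measuring the distance between cysteine sulfur (SG) atoms in the model file: a bonded pair sits about 2.05 Å apart. The sketch below is one way to do this, not the procedure we ran in PyMOL; the 2.5 Å cutoff and the function names are assumptions chosen for illustration.

```python
import math


def cys_sg_atoms(pdb_text):
    """Return [(chain, residue_number, (x, y, z))] for cysteine SG atoms."""
    atoms = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, x/y/z 31-54.
        if (line.startswith("ATOM") and line[17:20] == "CYS"
                and line[12:16].strip() == "SG"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def disulfide_pairs(pdb_text, cutoff=2.5):
    """Cysteine pairs whose SG-SG distance is within cutoff (angstroms)."""
    atoms = cys_sg_atoms(pdb_text)
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i][2], atoms[j][2])
            if d <= cutoff:
                pairs.append((atoms[i][:2], atoms[j][:2], round(d, 2)))
    return pairs
```

An empty list from `disulfide_pairs` corresponds to a model with no disulfide bonds, and a single entry corresponds to a model with one.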

PyMOL Results

BC 3D model

FT 3D model

Other bonds in the structures of the peptides

Peptide folding and structural maintenance depend on a variety of non-covalent interactions that collectively shape and stabilize the three-dimensional structures essential for their biological function. Among these, electrostatic interactions, hydrophobic effects, hydrogen bonding, and van der Waals forces each contribute in unique ways.

Electrostatic interactions originate from the attraction between oppositely charged side chains, such as lysine and arginine (positively charged) and aspartate and glutamate (negatively charged). Salt bridges, which are ionic bonds formed between these residues, stabilize tertiary and quaternary structures by neutralizing repulsive forces between like charges and drawing distant peptide segments together. These ionic interactions not only provide structural stability but also influence folding pathways by imposing favorable conformations. Furthermore, the spatial arrangement of charged amino acids contributes to the overall peptide charge and solubility, preventing aggregation through charge repulsion.

Hydrophobic effects drive folding based on the principle that nonpolar side chains tend to avoid water. Hydrophobic amino acids cluster within the peptide, creating a compact core shielded from the aqueous environment. This collapse lowers the system's free energy by reducing the ordered water molecules surrounding hydrophobic groups, and promotes van der Waals interactions, which stabilize densely packed side chains. This hydrophobic core formation is often the initial step in peptide folding, crucial for proper assembly of secondary and tertiary structures.

Hydrogen bonding plays a critical role in maintaining secondary structures like α-helices and β-sheets. These bonds form between backbone amide hydrogens and carbonyl oxygens, locking the peptide backbone into repeating structural motifs that stabilize the fold and reduce entropy loss during folding. Side-chain hydrogen bonds also add specificity and further reinforcement to the native conformation.

Hydrogen-bonds of BC + TrkB binding (indicated by blue dotted lines)

Hydrogen-bonds of BC + anp32a binding (indicated by blue dotted lines)

Hydrogen-bonds of FT + GSK3 binding (indicated by blue dotted lines)

Hydrogen-bonds of FT + PI3K binding (indicated by blue dotted lines)

Hydrogen-bonds of FT + AKT1 binding (indicated by blue dotted lines)

Hydrogen-bonds of FT + NOS binding (indicated by blue dotted lines)

Finally, van der Waals forces, though individually weak, cumulatively contribute to closely packed atomic arrangements within the peptide interior. These dispersive interactions facilitate tight packing of atoms, increasing molecular compactness and overall stability.

Applying these principles to BC, the fusion peptide features a balanced distribution of charged residues that allows efficient salt bridge formation, reducing electrostatic repulsion and stabilizing the fold. Hydrophobic amino acids such as leucine, isoleucine, valine, and phenylalanine cluster internally, driving core compaction and supporting secondary structural elements like α-helices and β-turns. These helices and turns are further locked in place by backbone hydrogen bonding, contributing to a well-organized and stable peptide conformation.

For FT, the higher density of charged residues facilitates an extensive network of salt bridges that enhances both folding specificity and solubility by balancing charges and reducing aggregation tendencies. Its hydrophobic regions, composed of leucine, valine, phenylalanine, and glycine residues, create dense cores essential for a compact three-dimensional architecture. Backbone hydrogen bonds reinforce the secondary structures within the peptide, maintaining its functional tertiary fold. Together, BC and FT exemplify how electrostatic interactions, hydrophobic effects, hydrogen bonding, and van der Waals forces act together to ensure structural integrity and biological functionality, complementing the stability provided by disulfide bonds.

ZDOCK

Understanding protein-protein interactions is crucial in drug development, especially when our drug is composed of two different fusion peptides. Excessive or unintended reactions between these peptides can significantly hinder the drug's efficacy and overall performance. By accurately predicting how these proteins interact, we can formulate strategies to minimize adverse interactions that may compromise the desired therapeutic effects. This insight ensures that the fusion peptides function as intended, leading to improved drug stability and effectiveness.

ZDOCK is a protein docking web server that facilitates the prediction of protein-protein interactions by exhaustively exploring possible rigid-body orientations and positions of two protein structures. Users can input protein structures either through file upload or PDB ID specification, and the server systematically samples numerous rotational and translational configurations to generate potential docking complexes. Each predicted binding pose is assigned a score reflecting its structural and energetic favorability, allowing researchers to quickly identify the most plausible interaction models through a streamlined, accessible interface.

The principle of ZDOCK

ZDOCK operates by representing both proteins on a 3D grid and using Fast Fourier Transform (FFT) algorithms to efficiently explore rotational and translational degrees of freedom. This grid-based approach allows ZDOCK to rapidly evaluate thousands of ligand orientations relative to the receptor.

What is FFT-based search in docking?

The superior computational efficiency of the Fast Fourier Transform (FFT)-based search in protein docking arises from its fundamental reformulation of the spatial correlation problem. Traditional exhaustive search methods operate in real space, requiring the explicit evaluation of the scoring function for each discrete translation for every rotational orientation of the ligand. This process scales multiplicatively with the number of rotations and translations, resulting in a prohibitive computational cost for high-resolution grids. In contrast, the FFT-based approach exploits the mathematical equivalence between spatial correlation and simple multiplication in Fourier space. By first transforming the receptor and rotated ligand grids into their frequency-domain representations via the FFT, the cross-correlation for all possible translations is computed simultaneously through a single element-wise complex multiplication. The result is then transformed back to real space via an inverse FFT, yielding the complete scoring landscape for that rotation. For a grid of N × N × N points, this reduces the cost of the translational search for each rotation from O(N⁶) to O(N³ log N), thereby conferring a dramatic speed advantage and enabling the systematic exploration of the vast conformational space inherent to protein-protein docking.

In FFT-based protein docking, the protein's 3D spatial information is converted into frequency space using the Fast Fourier Transform (FFT), which mathematically transforms a spatial function f(x) defined on a 3D grid into a function F(k) in frequency space:

$$F(\mathbf{k}) = \sum_{\mathbf{x}} f(\mathbf{x})\, e^{-2\pi i\, \mathbf{k}\cdot\mathbf{x}/N}$$

Here, x represents positions in the 3D space, k is the frequency vector, and N is the number of grid points along each axis. The exponential term decomposes the spatial data into sinusoidal waves of different frequencies.

This transformation allows the complex shape and property distributions of the protein, originally described in space, to be represented as a sum of simpler frequency components. The FFT efficiently computes this transform across all spatial points. By converting the protein structure into this frequency format, docking algorithms can rapidly analyze and sample potential protein configurations using efficient mathematical operations in frequency space before converting results back to spatial coordinates. This reduces computational cost dramatically compared to direct spatial computations alone.

Then, the multiplication system in FFT-based protein docking assigns values to a 3D grid around each protein to represent different regions. For example in shape docking:

+1 for the protein surface, which are good contact points,
-K (a large negative value, e.g., -10) for the protein interior, which represent forbidden space,
0 for solvent or empty space, which is neutral.

When docking two proteins, these grid values are multiplied together at overlapping positions, and the result reflects the quality of the interaction:

Surface-to-surface (+1 * +1 = +1): Ideal contact indicating strong shape complementarity and a desirable docking pose.
Surface-to-empty or interior-to-empty (any value * 0 = 0): Neutral, no meaningful interaction.
Surface-to-interior (+1 * -10 = -10): Steric clash, a bad overlap that penalizes the docking score.
Interior-to-interior (-10 * -10 = +100): Core overlap, physically impossible and strongly penalized or discarded.

The docking algorithm evaluates all possible positions and rotations using these multiplications and sums the results to produce a score for each pose. High positive scores (many surface contacts and few clashes) indicate good fits, low or negative scores show poor docking due to clashes, and extremely high scores highlight invalid, overlapping conformations. This scoring “language” allows the program to efficiently identify biologically plausible docked complexes based on their spatial compatibility.
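To make the translational search concrete, the sketch below scores every translation of a toy ligand grid against a toy receptor grid in one pass with NumPy's FFT and compares the result with the direct sum of products for a single shift. It is an illustration only and not ZDOCK's implementation: the shapes are simple cubes, translations wrap around the grid edges, only one rotation is considered, and the grid values are the simplified +1 / -10 / 0 weights described above. All function names are hypothetical.

```python
import numpy as np


def toy_grid(n, lo, hi, surface=1.0, interior=-10.0):
    """Cube spanning [lo, hi) on each axis: outer shell = surface, inside = interior."""
    g = np.zeros((n, n, n))
    g[lo:hi, lo:hi, lo:hi] = surface
    g[lo + 1:hi - 1, lo + 1:hi - 1, lo + 1:hi - 1] = interior
    return g


def scores_fft(receptor, ligand):
    """Score for every circular translation t: sum_x receptor[x] * ligand[x - t]."""
    R = np.fft.fftn(receptor)
    L = np.fft.fftn(ligand)
    # Correlation theorem: one element-wise product replaces the loop over shifts.
    return np.real(np.fft.ifftn(R * np.conj(L)))


def score_direct(receptor, ligand, shift):
    """Same score for one translation, computed by multiplying and summing."""
    moved = np.roll(ligand, shift=shift, axis=(0, 1, 2))
    return float(np.sum(receptor * moved))


n = 8
receptor = toy_grid(n, 1, 5)
ligand = toy_grid(n, 0, 3)
fft_scores = scores_fft(receptor, ligand)  # shape (n, n, n), one score per shift

shift = (1, 2, 3)
print(fft_scores[shift], score_direct(receptor, ligand, shift))
```

The two printed numbers agree up to floating-point rounding, which is the point of the method: the FFT route gives the same scores as the direct sum while covering all n³ translations at once.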

The docking poses are scored based on a composite function integrating shape complementarity, electrostatics, and a pairwise atomic statistical potential derived from known protein interfaces. ZDOCK iteratively examines thousands of conformations, ranking them to identify the most probable binding modes. This methodology enables high predictive accuracy for protein interactions, especially for rigid-body docking cases, and has been validated on numerous benchmark datasets and blind challenges such as CAPRI. The server interface allows users to submit structures and obtain ranked predictions of complex conformations, offering valuable insights into the molecular basis of protein-protein interactions that can guide experimental design and drug development.

The first three columns represent the Euler angles (in radians) detailing the rotational orientations applied to the BC and FT peptides during the docking simulations, enabling exploration of a wide range of 3D orientations. The next three columns show the translational shifts that describe how BC and FT move relative to their initial positions, thus covering different spatial alignments. Collectively, these rotational and translational parameters constitute a comprehensive sampling of possible binding poses between BC and FT. The final column shows the ZDOCK scores, which quantify the favorability of each interaction based on shape complementarity, electrostatics, and statistical potentials.
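The seven-column layout described above is simple to read programmatically. The sketch below parses such rows and ranks them by score; it assumes whitespace-separated columns in the order given in the text, and the sample rows are invented for illustration, except for the score 1796.481, which is our top ZDOCK score.

```python
from typing import NamedTuple


class Pose(NamedTuple):
    rotation: tuple      # three Euler angles in radians
    translation: tuple   # three translational shifts
    score: float         # ZDOCK score


def parse_pose(line):
    """Parse one row: three angles, three translations, then the score."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    values = [float(f) for f in fields]
    return Pose(tuple(values[0:3]), tuple(values[3:6]), values[6])


def rank_poses(lines):
    """Return poses sorted from highest to lowest score, skipping blank lines."""
    poses = (parse_pose(line) for line in lines if line.strip())
    return sorted(poses, key=lambda p: p.score, reverse=True)


# Hypothetical rows: angles, shifts, and the 1650.000 score are invented.
sample = [
    "0.5236 1.0472 -2.0944 12 5 60 1650.000",
    "-1.5708 0.7854 2.3562 3 41 17 1796.481",
]
best = rank_poses(sample)[0]
print(best.score, best.rotation, best.translation)
```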

ZDOCK Results

Top 10 results of ZDOCK test for BC and FT

In summary, these results correspond to the top 10 docking poses generated by ZDOCK, representing the most plausible conformations within the searched space. The highest ZDOCK score observed is 1796.481, which is below the accepted threshold of around 2000 indicating meaningful interactions. This suggests that even the best-scoring configurations are unlikely to represent stable or biologically relevant protein-protein binding. The decreasing scores of subsequent poses further reinforce that all sampled conformations exhibit weak interaction potential. This outcome is encouraging as it implies that BC and FT fusion peptides do not strongly self-associate or aggregate, reducing risks of loss of drug activity, undesired precipitation, or immune system activation.

The low likelihood of peptide-peptide binding ensures that each fusion peptide retains its structural and functional independence within the drug formulation. This prevents self-interference, promotes stability and solubility, improves pharmacokinetic performance, and ultimately enhances the therapeutic efficacy and safety of the combined fusion peptide drug.

AlphaKnot 2.0 (Protein Knotting)

AlphaKnot 2.0 is a computational tool that analyzes protein structures to identify and characterize knots formed by the folding of the polypeptide chain. These knots influence protein stability, folding pathways, and function. The tool applies advanced algorithms to detect such topological features with high precision, providing insights into protein folding mechanisms that complement conventional structural analyses.

In our project, AlphaKnot 2.0 was used to assess the predicted fusion peptide structures for any knotting. This evaluation helped reveal important topological characteristics that could impact peptide stability and performance. By understanding and addressing potential knot-related constraints, we optimized the peptide designs to ensure favorable conformations and improved functional reliability. Integrating AlphaKnot 2.0 enabled us to enhance the accuracy and robustness of our fusion peptides’ structural models.

The principle of computational analysis

AlphaKnot 2.0 works by analyzing the protein backbone using the coordinates of alpha carbon (Cα) atoms obtained from standard protein structure files (PDB or CIF). It first evaluates the entire structure for knots using a probabilistic approach based on the HOMFLY-PT polynomial, where thousands of random chain closures are tested to determine if a knot is present with high confidence. When a knot is detected, the algorithm further refines the analysis by identifying the minimal segment of the protein chain that forms the knot, called the knot core, and produces detailed knot maps to localize and characterize these topological features precisely.

This topological analysis is coupled with advanced visualization tools, such as a customized PDBe Mol* viewer, which highlights knotted subchains and simplifies structures to reveal knot locations clearly. Accompanying interactive knot maps offer residue-level information including knot cores, tails, and lengths, helping users understand complex protein entanglements. The server infrastructure uses Python and the Flask framework, with asynchronous task management on Linux clusters to efficiently handle large-scale computations and user submissions. This robust computational setup enables comprehensive knot detection and topological validation of protein models, providing critical insights that complement traditional structural analysis.

AlphaKnot 2.0 Results

BC 3D model

FT 3D model

Topology type: UNKNOTTED

An unknotted protein structure is advantageous for fusion peptides like FT and BC, which engage different cellular receptors. Such unknotted conformations tend to fold more efficiently and consistently, minimizing the risk of misfolding or aggregation that could impair function. This structural simplicity allows the peptides to maintain the flexibility and independence necessary for effectively interacting with their respective receptors. Consequently, ensuring that these fusion peptides remain unknotted supports their stability, receptor specificity, and overall therapeutic efficacy, crucial for the success of the eyedrop treatment.

Step 3: Stability Predictions

Following structural analysis, further predicting the stability of our fusion peptides is essential to ensure they retain their intended structure and functionality in physiological environments. While structural analysis confirms the peptides adopt the correct conformations, stability prediction evaluates how well these conformations withstand factors such as temperature, pH, and enzymatic degradation. By linking these steps, we obtain a comprehensive understanding, from static shape to dynamic resilience, guiding us to the optimization of peptide candidates with desirable therapeutic properties.

To achieve a comprehensive stability evaluation, we employed multiple computational tools targeting different aspects of peptide behavior. CamSol was used to predict the intrinsic solubility of the peptides, identifying regions prone to low solubility and aggregation. Medusa assessed the flexibility of the protein structure, as excessive flexibility can compromise stability. DeepSTABp provided thermostability predictions and insights into how mutations might influence peptide stability. Lastly, ProtParam was utilized to calculate physicochemical properties such as hydropathicity, molecular weight, and isoelectric point, which contribute to the overall stability and behavior of the peptides. Combining these analyses offered a detailed understanding of the fusion peptides' stability, proving essential for their design refinement and therapeutic development.

CamSol (Intrinsic Solubility)

CamSol is a computational tool that predicts both the intrinsic solubility and aggregation propensity of proteins and peptides based on their amino acid sequences. By evaluating physicochemical properties such as hydrophobicity, charge, and secondary structure tendencies, CamSol assigns solubility scores to residues within the sequence. This enables identification of regions that may cause unwanted aggregation or be too soluble, as either extreme could negatively affect a peptide’s stability and therapeutic efficacy. Managing this balance is essential to optimize the drug-like properties of fusion peptides.

In our study, CamSol was employed to analyze the fusion peptides FT and BC, helping us pinpoint segments prone to aggregation or excessive solubility. This information guided modifications to reduce aggregation risks without compromising necessary solubility, ensuring the peptides maintain the desired balance for stability and function. By integrating CamSol’s predictions into our design process, we enhanced the developability and therapeutic potential of our fusion peptides.

The Mathematical Concept

CamSol uses three methods to calculate amino acid pKa values:

Using tabulated pKa values (taken from http://compbio.clemson.edu/pkad (SI)),
Using PROPKA,
Using IPC.

CamSol predictions are based on the Zyggregator method [Tartaglia GG, Pawar AP, Campioni S, et al. Prediction of aggregation-prone regions in structured proteins. J Mol Biol 2008;380:425–36]

In CamSol, four properties:

Charge
Hydrophobicity
α-helical propensity
β-sheet propensity

are combined to assign a score to each amino acid. It is then smoothed to account for the effect of neighboring residues, and corrected for hydrophobic-hydrophilic patterns and gatekeeper effects. An overall solubility score is calculated from this profile.

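The charges for the amine group at the N-terminus and the carboxylic acid at the C-terminus are calculated by using the Henderson–Hasselbalch equation:

$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A}^-]}{[\mathrm{HA}]}$$

where [HA] and [A⁻] are the concentrations of the protonated and deprotonated forms of the group.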

Therefore, CamSol relies on accurate pKa values (either from the updated table, or calculated with PROPKA or IPC), and employs partial charges when the pH is close to the pKa of a charged amino acid.

Using the ratio of charged to neutral species calculated with the above equation, the pH-dependent hydrophobicity log D_pH combines the partition coefficients of the neutral and ionized species:

$$\log D_{\mathrm{pH}} = \log_{10}\left(P_N + P_I \cdot 10^{\delta}\right) - \log_{10}\left(1 + 10^{\delta}\right)$$

Here P_N and P_I are the partition coefficients of the neutral and ionized species, and δ is the difference between pKa and pH (pKa − pH for basic residues and pH − pKa for acidic residues). We used the pH-dependent log D_pH calculations by Zamora and colleagues [Zamora WJ, Campanera JM, Luque FJ. Development of a structure-based, pH-dependent Lipophilicity scale of amino acids from continuum solvation calculations. J Phys Chem Lett 2019;10:883–9.] for neutral and ionized log P values for all standard amino acids.
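The two relations above can be evaluated directly. The sketch below computes the partial charge of a single ionizable group from the Henderson–Hasselbalch equation and the pH-dependent log D from the neutral and ionized log P values, using the same definition of δ. It is a worked illustration rather than CamSol's code; the function names are our own, and any pKa or log P values passed in are example inputs, not CamSol parameters.

```python
import math


def delta(pH, pKa, acidic):
    """pH - pKa for acidic groups, pKa - pH for basic groups."""
    return (pH - pKa) if acidic else (pKa - pH)


def partial_charge(pH, pKa, acidic):
    """Average charge of one ionizable group (Henderson-Hasselbalch)."""
    # Charged/neutral ratio is 10**delta, so charged fraction = 1 / (1 + 10**-delta).
    fraction = 1.0 / (1.0 + 10.0 ** (-delta(pH, pKa, acidic)))
    return -fraction if acidic else fraction


def log_d(pH, pKa, log_p_neutral, log_p_ionized, acidic):
    """log D_pH = log10(P_N + P_I * 10**delta) - log10(1 + 10**delta)."""
    d = delta(pH, pKa, acidic)
    p_n = 10.0 ** log_p_neutral
    p_i = 10.0 ** log_p_ionized
    return math.log10(p_n + p_i * 10.0 ** d) - math.log10(1.0 + 10.0 ** d)


# Example with made-up values: an acidic group with pKa 4.0 at pH 4.0 is half ionized.
print(partial_charge(4.0, 4.0, acidic=True))
```

When the pH is far from the pKa on the neutral side, `log_d` approaches the neutral log P; far on the ionized side, it approaches the ionized log P. Near the pKa the group carries a partial charge, which is why accurate pKa values matter.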

CamSol uses PROPKA, an open-source available pKa predictor [Søndergaard CR, Olsson MHM, Rostkowski M, et al. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput 2011;7:2284–95] [Olsson MHM, Søndergaard CR, Rostkowski M, et al. PROPKA3: consistent treatment of internal and surface residues in empirical pka predictions. J Chem Theory Comput 2011;7:525–37], to calculate accurate pKa values.

Samples of CamSol prediction model (excerpt from figure 2 of website [Marc Oeller, Ryan Kang, Rosie Bell, Hannes Ausserwöger, Pietro Sormanni, Michele Vendruscolo, Sequence-based prediction of pH-dependent protein solubility using CamSol, Briefings in Bioinformatics, Volume 24, Issue 2, March 2023, bbad004])

CamSol prediction model samples

“CamSol predicts solubility values that are highly correlated with experimental solubility values. Plots on the left-hand side in each column visualize how experimental and predicted values change over a range of pH values. The left axis and blue line report the predicted CamSol solubility score, the right axis and green markers the measured midpoints of PEG-precipitation, all as a function of pH (x-axis). The vertical yellow line is the theoretical isoelectric point. Plots on the right-hand side show the correlation between the predicted and measured relative solubility values. CamSol calculations were carried out using pKa values calculated by PROPKA for (A) DesAbO (nanobody), (B) bovine serum albumin (BSA), (C) hen egg white lysozyme and (D) human serum albumin (HSA), whereas for (E) α-synuclein and (F) insulin pKa values were calculated with IPC (framed in blue box). R is the Pearson’s coefficient of correlation.”

CamSol Results

pH Solubility

pH	Solubility score of BC	Solubility score of FT
1	0.992	1.085
1.5	0.991	1.085
2	0.990	1.085
2.5	0.987	1.083
3	0.974	1.072
3.5	0.945	1.047
4	0.899	1.010
4.5	0.846	0.967
5	0.798	0.928
5.5	0.749	0.898
6	0.698	0.881
6.5	0.649	0.873
7	0.598	0.868
7.5	0.534	0.868
8	0.458	0.881
8.5	0.394	0.907
9	0.356	0.937
9.5	0.363	0.970
10	0.415	0.990
10.5	0.520	1.014
11	0.630	1.008
11.5	0.736	0.952
12	0.816	0.932
12.5	0.881	0.930
13	0.911	0.914
13.5	0.923	0.858
14	0.935	0.804

Solubility of BC and FT in different pH

The differing solubility responses of BC and FT to pH changes have important practical implications for their formulation and use. FT's predicted solubility score stays between 0.80 and 1.09 across the whole range from pH 1 to pH 14 and is 0.868 at pH 7 to 7.5, which suggests it will remain soluble under diverse physiological and storage conditions, making it easier to handle and formulate. This stability reduces the risk of precipitation or loss of activity, supporting consistent therapeutic performance.

In contrast, BC's solubility varies much more with pH: its score falls from 0.992 at pH 1 to a minimum of 0.356 at pH 9, then recovers to 0.935 at pH 14. The low-solubility region therefore spans neutral to moderately alkaline conditions (roughly pH 7 to 10), which could increase the risk of aggregation or precipitation in this range. This sensitivity means that BC formulations may require careful pH optimization to maintain stability and efficacy. However, the recovery of solubility at more alkaline pH offers an opportunity to adjust formulation conditions accordingly. Understanding these characteristics allows for tailored strategies to maximize the stability and effectiveness of each peptide in its respective therapeutic environment.
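The trends described above can be read directly from the table. The following is a minimal sketch that transcribes the tabulated CamSol scores and reports, for each peptide, the lowest score, the pH at which it occurs, and the spread between the highest and lowest score. The helper name `summarize` is our own and is not part of CamSol.

```python
# CamSol pH-dependent solubility scores transcribed from the table above.
PH = [1 + 0.5 * i for i in range(27)]  # pH 1.0 to 14.0 in steps of 0.5
BC = [0.992, 0.991, 0.990, 0.987, 0.974, 0.945, 0.899, 0.846, 0.798,
      0.749, 0.698, 0.649, 0.598, 0.534, 0.458, 0.394, 0.356, 0.363,
      0.415, 0.520, 0.630, 0.736, 0.816, 0.881, 0.911, 0.923, 0.935]
FT = [1.085, 1.085, 1.085, 1.083, 1.072, 1.047, 1.010, 0.967, 0.928,
      0.898, 0.881, 0.873, 0.868, 0.868, 0.881, 0.907, 0.937, 0.970,
      0.990, 1.014, 1.008, 0.952, 0.932, 0.930, 0.914, 0.858, 0.804]


def summarize(scores):
    """Return (lowest score, pH of the lowest score, max-min spread)."""
    lowest = min(scores)
    spread = round(max(scores) - lowest, 3)
    return lowest, PH[scores.index(lowest)], spread


for name, scores in (("BC", BC), ("FT", FT)):
    lowest, ph, spread = summarize(scores)
    print(f"{name}: minimum {lowest} at pH {ph}, spread {spread}")
```

The spread is more than twice as large for BC as for FT, which is the quantitative basis for calling BC the more pH-sensitive of the two peptides.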

Intrinsic Solubility

The CamSol method yields a solubility profile (one score per residue in the protein sequence), where regions with scores below -1 are aggregation-promoting and regions with scores above 1 are solubility-promoting.

Intrinsic Solubility of BC (overall score of 0.562263)

Based on the CamSol solubility profile, the BC ligand protein exhibits a high predicted intrinsic solubility. The profile indicates a lack of significant aggregation-prone regions, with predominantly neutral to positive solubility scores throughout the sequence. This suggests that BC is unlikely to aggregate under physiological conditions, making it a stable candidate for therapeutic use.

Practically, this high solubility means that BC is well-suited for recombinant expression in systems such as E. coli. Its favorable solubility simplifies the purification process and allows it to remain in solution at higher concentrations, enhancing its developability and potential for effective drug formulation.

Intrinsic Solubility of FT (overall score of 0.923477)

From the results above, the FT protein demonstrates a very high predicted intrinsic solubility, with an overall solubility score of 0.923, clearly classifying it as a highly soluble protein. Its residue-level solubility profile supports this assessment, showing no regions prone to aggregation and several that enhance solubility. This favorable profile suggests that FT is unlikely to form insoluble inclusion bodies during recombinant expression in bacterial systems like E. coli.

From a practical standpoint, FT's high solubility greatly facilitates the purification process and increases its likelihood of remaining soluble at higher concentrations. This stability and solubility profile make FT an excellent candidate for efficient production and downstream drug formulation, enhancing its therapeutic potential.

Summary of the above results

In summary, the comprehensive evaluation of the fusion peptides BC and FT reveals important insights into their solubility and stability profiles, which are critical for their successful development as therapeutic agents. FT demonstrates consistently high solubility across a broad pH range, indicating robustness under various physiological and storage conditions, thereby facilitating easier handling and reliable performance. BC, while also intrinsically soluble, exhibits more pH-sensitive solubility, with reduced solubility around neutral to slightly alkaline pH, necessitating targeted formulation strategies to maintain stability in this range.

Both peptides show favorable intrinsic solubility profiles with minimal aggregation-prone regions, making them excellent candidates for recombinant expression in bacterial systems like E. coli and simplifying the purification process. These characteristics promote their stability at higher concentrations, which is essential for therapeutic efficacy and manufacturability. Understanding these differences allows for informed formulation adjustments—such as pH optimization and buffer selection—to maximize stability and solubility, ultimately enhancing the developability and clinical potential of each fusion peptide.

MEDUSA (Protein Flexibility)

Understanding protein flexibility is essential in drug development because it enables predictions of how a protein drug will interact with its target and perform biological functions within the body. Protein flexibility allows for conformational changes, enhancing the ability to recognize and bind various receptor sites, and contributing to higher binding affinity and specificity. Accounting for these dynamic movements is crucial, as they often underpin critical processes like signaling, activation, or catalysis. In the context of drug design, evaluating flexibility helps optimize interactions, anticipate structural changes during binding, and reduce unfavorable steric clashes, ultimately improving therapeutic efficacy and stability.

To assess the flexibility of the fusion peptides in our project, we utilized the computational tool MEDUSA. This platform quantitatively predicts the flexibility of protein structures by analyzing their dynamic properties and conformational behavior. By applying MEDUSA, we identified flexible regions and evaluated how these may influence peptide stability and function. These insights guided our rational optimization of the fusion sequences, ensuring that our protein drugs maintain both sufficient adaptability for target engagement and adequate structural stability for therapeutic use.

Principle of AI Analysis

The MEDUSA (Multiclass flexibility prediction from sequences of amino acids) server uses information from the multiple sequence alignment of homologous sequences and the physico-chemical properties of individual amino acids to assign a flexibility class to each residue using a deep convolutional neural network [7]. The flexibility of each residue is graded on a scale from 0 to 4, with 0 being the most rigid and 4 being the most flexible. Each prediction is also assigned a confidence score, which indicates the probability that the MEDUSA server has predicted the flexibility correctly. The score falls into one of three categories: score < 0.4, 0.4 ≤ score ≤ 0.5, and score > 0.5. We can therefore evaluate the reliability of the MEDUSA predictions based on the provided confidence scores.

The following is a summary of the process in MEDUSA:

1. Extract evolutionary information: MEDUSA finds homologs of the query sequence using an HHblits search.
2. MEDUSA filters the resulting multiple sequence alignment (MSA) file using HHfilter.
3. The final MSA is translated into a probability profile using a position-specific scoring matrix: each position of the sequence is thus encoded by 21 numerical values corresponding to the 20 amino acid types and gaps.
4. MEDUSA translates each amino acid into 58 numerical values, which encode its physico-chemical properties (using the AAindex scheme).
5. MEDUSA creates a one-hot encoding of each amino acid and adds a flag for the sequence terminus.
6. Using a sliding window of 15 amino acids, MEDUSA creates input vectors for each sequence position for all the considered features.
7. The different features are merged to create an input of dimensions 15 × 100 for the prediction (21 + 58 + 20 + 1 = 100 values per residue).
8. The neural network performs binary and multi-class predictions and provides the general summary as well as a flexibility prediction and confidence value for each amino acid.
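To make the 15 × 100 input shape concrete, the sketch below assembles a per-residue feature vector and slides a 15-residue window over a sequence. It is an illustration only and not MEDUSA's code: the profile and physico-chemical values are zero placeholders, and the zero-padding at the sequence ends and the exact meaning of the terminus flag are our assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_PROFILE = 21    # 20 amino acid types + gap, from the MSA-derived profile
N_PHYSCHEM = 58   # physico-chemical descriptors per residue
WINDOW = 15       # sliding window length


def residue_features(aa, is_terminus):
    """Build one 100-value vector: profile + properties + one-hot + flag."""
    profile = [0.0] * N_PROFILE      # placeholder: would come from the MSA
    physchem = [0.0] * N_PHYSCHEM    # placeholder: would come from AAindex
    one_hot = [1.0 if aa == x else 0.0 for x in AMINO_ACIDS]
    flag = [1.0 if is_terminus else 0.0]
    return profile + physchem + one_hot + flag


def sliding_windows(sequence):
    """Return one WINDOW x 100 matrix per residue (assumed zero-padded ends)."""
    last = len(sequence) - 1
    feats = [residue_features(aa, i in (0, last))
             for i, aa in enumerate(sequence)]
    width = len(feats[0])
    half = WINDOW // 2
    pad = [[0.0] * width for _ in range(half)]
    padded = pad + feats + pad
    return [padded[i:i + WINDOW] for i in range(len(sequence))]


windows = sliding_windows("SAMPLE")  # hypothetical six-residue sequence
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each residue therefore yields one matrix with 15 rows (window positions) and 100 columns (features), matching the dimensions stated in the list above.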

The accuracy of MEDUSA predictions depends on protein size: although the mean accuracy is almost the same across the range of sequence lengths considered, the deviation of the accuracy values increases for shorter proteins.

Diagram taken from MEDUSA website

MEDUSA Results

flexibility of BC

flexibility of FT

The observation that only 26% of BC and 23% of FT structures exhibit flexibility strongly suggests that these peptides predominantly maintain stable conformations with limited molecular motion. This reduced flexibility translates into higher structural integrity and makes these peptides less susceptible to unfolding or denaturation, which is a significant advantage for drug development. Stable proteins are better at retaining their active forms over time, ensuring reliable biological activity. They also demonstrate improved stability during formulation and storage, reducing the likelihood of aggregation and enhancing shelf-life—key considerations for therapeutic efficacy.

Furthermore, limited flexibility in crucial regions allows for more consistent and specific interactions with target receptors, supporting sustained and effective binding. This characteristic minimizes undesirable conformational changes that could compromise function or lead to off-target effects. Overall, the low flexibility of BC and FT supports their potential as robust drug candidates, allowing them to perform reliably under physiological conditions and contributing to efficient, long-lasting therapeutic outcomes.

CSM-Toxin (Toxicity Test)

Testing the toxicity of an ocular drug is a critical step in ensuring its safety and efficacy because the eye is a highly sensitive and complex organ responsible for vision, which is essential to quality of life. Ocular toxicity can lead to damage or loss of important structures such as the cornea, retina, or optic nerve, resulting in impaired vision or blindness. Common ocular adverse effects may include inflammation, irritation, corneal opacity, retinal damage, or even irreversible conditions like glaucoma or cataracts. As the eye is composed of many specialized tissues that must function harmoniously, any disruption caused by toxic effects can have widespread consequences. Therefore, rigorous toxicity testing helps identify potential harmful effects early in drug development, enabling modifications to drug formulation or dosing that minimize risks.

Principle of AI Analysis

CSM-Toxin is an in-silico protein toxicity classifier that relies only on the protein primary sequence. By treating residues as words and protein sequences as sentences, the protein sequence information can be processed using a deep learning natural language model. The model works with proteins of arbitrary length.

CSM-Toxin is trained using the largest and most up-to-date dataset of experimentally measured protein and peptide toxicities. The curated training dataset contains 2475 toxic sequences and 214,740 non-toxic sequences, a non-toxic to toxic ratio of approximately 87:1 that reflects the inherent bias of the available data. Using this data, ProteinBERT (Bidirectional Encoder Representations from Transformers) [Morozov, V.; Rodrigues, C.H.M.; Ascher, D.B. CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics 2023, 15, 431] was adapted to develop CSM-Toxin, a predictive model of protein toxicity that relies solely on the raw amino acid sequence, with no additional features extracted or generated. The ProteinBERT model was first fine-tuned for 20 epochs with the pretrained parameters kept non-trainable, using the Adam optimiser at a learning rate of 0.005. After this, ProteinBERT was trained for 15 more epochs at a learning rate of 0.0001 with weight decay. Class weights were used so that the model pays more attention to the positive (toxic) entries.
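The role of class weights can be illustrated with the class counts quoted above. The sketch below uses the common "balanced" heuristic, weight = total / (number of classes × class count). This is an assumption for illustration; the source does not state which weighting scheme CSM-Toxin actually used.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency (the 'balanced' heuristic)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts quoted in the text above.
counts = {"toxic": 2475, "non_toxic": 214740}
weights = balanced_class_weights(counts)

imbalance = counts["non_toxic"] / counts["toxic"]
print(f"non-toxic : toxic = {imbalance:.1f} : 1")
print({label: round(w, 3) for label, w in weights.items()})
```

Under this heuristic an error on a toxic sequence costs about 87 times as much as an error on a non-toxic one, which offsets the imbalance in the training data.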

Graph of AI learning process [Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110] [Morozov, V., Rodrigues, C. H. M., & Ascher, D. B. (2023). CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics, 15(2), 431]

CSM-Toxin reads the Global Representation outputs of each of the six Transformer layers and uses them for model output. To prevent overfitting, there is also a Dropout layer that discards each connection with a probability of 0.5. To determine toxicity, a fully connected layer with sigmoid activation produces a single output; if the output is below the threshold, the protein is classified as non-toxic.

The model outputs a value between 0 and 1 (the activation function after the last layer is sigmoid). To obtain a binary prediction, a threshold must be chosen: all values less than this threshold are treated as negative predictions and all values greater than or equal to it are treated as positive predictions. During cross-validation, the CSM-Toxin authors varied the threshold from 0.01 to 1.0 in steps of 0.01 and examined the corresponding changes in MCC (Matthews Correlation Coefficient), AUC (Area Under Curve), Precision and Recall on the validation sets. The main metric used to choose the final model architecture and hyperparameters was MCC. In the output, results are shown in a downloadable table with predictions alongside a set of general physicochemical properties calculated using the Peptides package [Osorio, D.; Rondón-Villarreal, P.; Torres, R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015, 7, 4–14] [Morozov, V., Rodrigues, C. H. M., & Ascher, D. B. (2023). CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics, 15(2), 431] [Jung, F., Frey, K., Zimmer, D., & Mühlhaus, T. (2023). DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. International Journal of Molecular Sciences, 24(8), 7444].
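The threshold scan described above can be sketched as follows. The scores and labels are hypothetical validation data invented for the example, and the function names are our own; this is not the CSM-Toxin implementation.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; taken as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scores, labels):
    """Scan thresholds 0.01 to 1.00 and keep the first one with the top MCC."""
    best_t, best_mcc = None, -2.0
    for step in range(1, 101):
        t = step / 100
        preds = [1 if s >= t else 0 for s in scores]  # >= threshold is positive
        pairs = list(zip(preds, labels))
        tp = sum(p == 1 and y == 1 for p, y in pairs)
        tn = sum(p == 0 and y == 0 for p, y in pairs)
        fp = sum(p == 1 and y == 0 for p, y in pairs)
        fn = sum(p == 0 and y == 1 for p, y in pairs)
        score = mcc(tp, tn, fp, fn)
        if score > best_mcc:
            best_t, best_mcc = t, score
    return best_t, best_mcc


# Hypothetical sigmoid outputs and true labels (1 = toxic), for illustration.
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1]
threshold, value = best_threshold(scores, labels)
print(threshold, round(value, 3))
```

MCC is a sensible selection metric here because it uses all four cells of the confusion matrix, so it stays informative when one class is much rarer than the other.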

CSM-Toxin Results

CSM-Toxin results of BC and FT

Based on their predicted properties, both the BC and FT ligands are anticipated to be non-toxic. This is primarily attributed to their strongly hydrophilic character, evidenced by negative hydrophobicity scores, which makes them unlikely to interact with or disrupt lipid membranes, a common mechanism of toxicity. This hydrophilic nature is reinforced by the large number of polar and charged residues in their amino acid composition, which enhances solubility and reduces membrane-binding potential. Although both proteins possess a strong positive net charge and a moderate number of aromatic and aliphatic residues, these are outweighed by their overall hydrophilic character, which prevents the dominant hydrophobic behavior that could pose a toxicity risk.

DeepSTABp (Thermostability)

Determining the melting temperature (Tm) of a protein-based ocular drug is important to ensure its stability and safety under physiological conditions. The Tm is the temperature at which half of the protein population is unfolded, so the drug should ideally have a Tm well above human body temperature (around 37 °C) to prevent it from denaturing upon administration or during storage. If the Tm is too low, the peptide may unfold, aggregate, or lose potency when exposed to the warm and moist environment of the eye, leading to inconsistent dosing and reduced therapeutic efficacy. A low Tm could also complicate manufacturing and handling, and denatured or aggregated protein may increase the risk of ocular irritation. Aiming for a Tm above physiological temperature therefore helps maintain the drug's integrity, ensures consistent delivery, and promotes patient safety during ocular application.

Principle of AI Analysis

Similar to CSM-Toxin, DeepSTABp uses a transformer-based protein language model for sequence embedding and state-of-the-art feature extraction in combination with other deep learning techniques for end-to-end protein melting temperature prediction.

Schematic overview of the deep learning architecture DeepSTABp

DeepSTABp is based on four different artificial neural network blocks. The first three blocks create an embedding of the protein query based on the input features:

1. The type of experimental condition used in the thermal proteome profiling experiment
2. The protein amino acid sequence
3. The organism's growth temperature

Block 1 and block 3 use small multilayer perceptrons (MLP). Block 2 consists of the pretrained transformer-based model ProtTrans-XL, followed by a mean pooling layer. The output vectors of the first three blocks are joined and passed into a final MLP block, which outputs the predicted protein melting temperature.

Datasets used for model training and evaluation in the DeepSTABp study were derived from high-throughput mass spectrometry-based thermal proteome profiling (TPP) assays. To achieve an extensive and homogeneous collection of experimentally determined protein melting temperatures (Tms), individual protein melting points were determined by fitting a non-linear model to each melting curve:

(Equation image: non-linear melting curve f(T) with parameters a, m and Tmid)

with a being the asymptote, m being the slope, and Tmid denoting the mid-temperature of the fitted curve. Tm was obtained by finding the temperature where the fitted function reaches a value of 0.5. In order to retrieve only reliable Tms, only model estimates with an R² score > 0.9 and a temperature variance of less than 2 °C were retained in the final data set.
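To illustrate why Tm is read off at a value of 0.5 rather than taken directly as Tmid, the sketch below uses a generic sigmoid with a lower asymptote a, slope m and mid-temperature Tmid. The functional form is our assumption, chosen to match the parameter names in the text, and may differ from the exact parameterisation used by the DeepSTABp authors. The parameter values are hypothetical.

```python
import math


def fraction_non_denatured(T, a, m, T_mid):
    """Assumed sigmoid: falls from 1 towards the asymptote a as T rises."""
    return a + (1.0 - a) / (1.0 + math.exp(m * (T - T_mid)))


def melting_temperature(a, m, T_mid):
    """Temperature at which the assumed curve equals 0.5 (requires a < 0.5)."""
    if a >= 0.5:
        raise ValueError("curve never reaches 0.5 when the asymptote is >= 0.5")
    return T_mid + math.log(0.5 / (0.5 - a)) / m


# Hypothetical fit parameters, for illustration only.
tm = melting_temperature(a=0.1, m=0.5, T_mid=50.0)
print(round(tm, 2), round(fraction_non_denatured(tm, 0.1, 0.5, 50.0), 3))
```

With a non-zero asymptote the curve's midpoint lies above 0.5, so under this assumed form Tm is slightly higher than Tmid. The two coincide only when a = 0.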

To validate the performances of models during training and testing and to allow for a fair comparison to alternative approaches, different commonly used evaluation metrics were computed. Each metric measures the discrepancy between vectors of N experimentally determined Tms (y) and predicted Tms (ŷ).

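The mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

The mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

The root mean squared error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

Sample Pearson correlation coefficient (PCC), where $\bar{y}$ and $\bar{\hat{y}}$ are the means of the experimental and predicted Tms:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$

And the coefficient of determination (R²):

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$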
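These metrics are straightforward to compute from paired experimental and predicted values. The sketch below uses their standard textbook definitions, with R² taken as one minus the ratio of residual to total sum of squares. The three temperature pairs are hypothetical and serve only to show the calculation.

```python
import math


def regression_metrics(y, y_hat):
    """Return MAE, MSE, RMSE, PCC and R2 for experimental y and predicted y_hat."""
    n = len(y)
    residuals = [a - b for a, b in zip(y, y_hat)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    y_mean = sum(y) / n
    p_mean = sum(y_hat) / n
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    ss_pred = sum((b - p_mean) ** 2 for b in y_hat)
    cov = sum((a - y_mean) * (b - p_mean) for a, b in zip(y, y_hat))
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "PCC": cov / math.sqrt(ss_tot * ss_pred),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical melting temperatures in degrees Celsius.
measured = [50.0, 55.0, 60.0]
predicted = [52.0, 54.0, 61.0]
metrics = regression_metrics(measured, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

MAE and RMSE are in the same unit as Tm (°C), which makes them the easiest of the five to interpret when judging how far a predicted Tm may be from the measured one.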

DeepSTABp Results

After providing the DeepSTABp server with the amino acid sequences of the peptides and proteins involved, we obtained the following predicted melting temperatures (Tm):

Protein	Predicted Tm [°C]
BC	49.31
FT	47.91

The predicted melting temperatures of BC (49.31 °C) and FT (47.91 °C) lie roughly 11 to 12 °C above body temperature (37 °C) and about 14 to 15 °C above the temperature of the ocular surface, approximately 34 °C. This suggests that both peptides should keep their folded structure, and therefore their function, at the temperatures they encounter in the eye. In the context of eye drops, this margin supports formulation stability and ocular safety: peptides that stay folded during storage at room temperature and after the bottle is opened are less prone to denaturation and aggregation, so the concentration of active ingredient is expected to remain consistent, and the irritation that aggregated protein might cause on sensitive tissues such as the conjunctiva or cornea is less likely. Thermal stability at ambient temperature may also simplify storage and handling and reduce waste, although whether refrigeration can be avoided would have to be confirmed experimentally. These properties are especially relevant for long-term treatments such as glaucoma management or dry eye relief, and they support BC and FT as candidate ligands for ocular drug formulations.

ProtParam (Protein Half-life, Stability)

A thorough understanding of a protein’s physicochemical properties is fundamental in both basic research and drug development. Key characteristics such as molecular weight, isoelectric point (pI), amino acid composition, extinction coefficient, hydropathicity, and predicted half-life all provide crucial insights into protein stability, solubility, and biochemical behavior in different environments. These parameters influence protein expression, purification, storage, and formulation efficacy, impacting not only stability and manufacturability but also the therapeutic potential and safety profile of candidate drugs. Accurate knowledge of these properties allows researchers to tailor conditions for optimal protein function and helps predict how modifications or environmental factors may influence activity and stability.

ProtParam is an indispensable tool for efficiently obtaining this comprehensive set of physicochemical data directly from protein or peptide sequences. By calculating molecular weight, pI, atomic composition, instability and aliphatic indices, extinction coefficient, GRAVY, and estimated half-life, ProtParam enables rapid assessment of key protein properties early in the design and optimization process. In our workflow, application of ProtParam to the fusion peptides BC and FT supported rational sequence design, guided formulation strategies, and informed candidate selection by predicting parameters relevant to expression stability and solubility. This approach maximizes the likelihood of developing robust, effective, and manufacturable protein therapeutics.

The Mathematical Concept

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. [Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607]

Extinction coefficients

The extinction coefficient indicates how much light a protein absorbs at a certain wavelength. It is useful to have an estimation of this coefficient for following a protein with a spectrophotometer when purifying it.

It is possible to estimate the molar extinction coefficient of a protein from knowledge of its amino acid composition [Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem. 182:319-326(1989).] From the molar extinction coefficient of tyrosine, tryptophan and cystine (cysteine does not absorb appreciably at wavelengths >260 nm, while cystine does) at a given wavelength, the extinction coefficient of the native protein in water can be computed using the following equation:

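$$E_{total} = N_{Tyr}\,E_{Tyr} + N_{Trp}\,E_{Trp} + N_{Cys}\,E_{Cys}$$

Ex = molar extinction coefficient of residue type x (M^-1 cm^-1), Nx = number of residues of that type; for Cys, N counts cystines (disulfide-bonded pairs), not individual cysteines

ETyr = 1490, ETrp = 5500, ECys = 125 (per cystine)

The absorbance (optical density) can be calculated using the following formula:

$$A = \frac{E_{total}}{M_w}$$

where Mw is the molecular weight of the protein, so that A corresponds to a 1 g/L (0.1%) solution.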

ProtParam produces two pairs of Etotal and A values from the above equations, both for proteins measured in water at 280 nm. The first pair assumes that all cysteine residues appear as half cystines (i.e. all pairs of Cys residues form cystines), and the second assumes that no cysteine appears as half cystine (i.e. all Cys residues are reduced). Experience shows that the computation is quite reliable for proteins containing Trp residues; however, there may be more than 10% error for proteins without Trp residues.

Note: Cystine is the amino acid formed when a pair of cysteine molecules are joined by a disulfide bond.
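As a worked example of the two equations above, the sketch below computes the extinction coefficient for the oxidised (cystine) and reduced cases. The residue counts (4 Tyr, 2 Trp, one cystine) are assumptions that we back-calculated so that the result agrees with the coefficients reported for BC in the results further down (17085 and 16960). They were not read from the sequence itself.

```python
# Molar extinction coefficients at 280 nm in water (M^-1 cm^-1), as given above.
E_TYR, E_TRP, E_CYSTINE = 1490, 5500, 125


def extinction_coefficient(n_tyr, n_trp, n_cystine):
    """Etotal = N_Tyr*E_Tyr + N_Trp*E_Trp + N_cystine*E_cystine."""
    return n_tyr * E_TYR + n_trp * E_TRP + n_cystine * E_CYSTINE


def absorbance(e_total, molecular_weight):
    """Absorbance of a 1 g/L solution: Etotal divided by molecular weight."""
    return e_total / molecular_weight


# Assumed residue counts (back-calculated, see the paragraph above).
oxidised = extinction_coefficient(n_tyr=4, n_trp=2, n_cystine=1)
reduced = extinction_coefficient(n_tyr=4, n_trp=2, n_cystine=0)
print(oxidised, reduced)
print(round(absorbance(oxidised, 4482.15), 3))  # 4482.15 Da is BC's reported mass
```

The two cases differ by only 125 M^-1 cm^-1 per cystine, so for a peptide with several Tyr and Trp residues the disulfide state has little effect on the estimated absorbance.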

In vivo / vitro half-life

The half-life is a prediction of the time it takes for half of the amount of protein in a cell to disappear after its synthesis in the cell. ProtParam relies on the "N-end rule" [Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.], which relates the half-life of a protein to its N-terminal residue; the prediction is given for 3 model organisms (human, yeast and E. coli).

The "N-end rule" was established from experiments [Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. J. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712.] that explored the metabolic fate of artificial beta-galactosidase proteins with different N-terminal amino acids engineered by site-directed mutagenesis. The beta-gal proteins thus designed have strikingly different half-lives in vivo, from more than 100 hours to less than 2 minutes, depending on the nature of the amino acid at the amino terminus and on the experimental model.

Amino acid	Mammalian	Yeast	E. coli
Ala	4.4 hour	>20 hour	>10 hour
Arg	1 hour	2 min	2 min
Asn	1.4 hour	3 min	>10 hour
Asp	1.1 hour	3 min	>10 hour
Cys	1.2 hour	>20 hour	>10 hour
Gln	0.8 hour	10 min	>10 hour
Glu	1 hour	30 min	>10 hour
Gly	30 hour	>20 hour	>10 hour
His	3.5 hour	10 min	>10 hour
Ile	20 hour	30 min	>10 hour
Leu	5.5 hour	3 min	2 min
Lys	1.3 hour	3 min	2 min
Met	30 hour	>20 hour	>10 hour
Phe	1.1 hour	3 min	2 min
Pro	>20 hour	>20 hour	?
Ser	1.9 hour	>20 hour	>10 hour
Thr	7.2 hour	>20 hour	>10 hour
Trp	2.8 hour	3 min	2 min
Tyr	2.8 hour	10 min	2 min
Val	100 hour	>20 hour	>10 hour

Table of the amino acids and the corresponding half-life [Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607]
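Because the N-end rule depends only on the first residue, the prediction amounts to a table lookup. The sketch below reproduces that lookup for the mammalian column of the table above; the function name and the example sequence are hypothetical, and only one column is included to keep the example short.

```python
# Mammalian column of the N-end rule table above, keyed by one-letter code.
MAMMALIAN_HALF_LIFE = {
    "A": "4.4 hour", "R": "1 hour", "N": "1.4 hour", "D": "1.1 hour",
    "C": "1.2 hour", "Q": "0.8 hour", "E": "1 hour", "G": "30 hour",
    "H": "3.5 hour", "I": "20 hour", "L": "5.5 hour", "K": "1.3 hour",
    "M": "30 hour", "F": "1.1 hour", "P": ">20 hour", "S": "1.9 hour",
    "T": "7.2 hour", "W": "2.8 hour", "Y": "2.8 hour", "V": "100 hour",
}


def estimated_half_life(sequence):
    """Look up the half-life from the N-terminal (first) residue only."""
    return MAMMALIAN_HALF_LIFE[sequence[0].upper()]


# Hypothetical sequence that starts with serine.
print(estimated_half_life("SAMPLE"))
```

Any two sequences that share an N-terminal residue receive the same estimate, which is why the value says nothing about the rest of the peptide.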

Instability index

The instability index provides an estimate of the stability of your protein. Statistical analysis of 12 unstable and 32 stable proteins has revealed [16] that there are certain dipeptides, the occurrence of which is significantly different in the unstable proteins compared with those in the stable ones. The authors of this method have assigned a weight value of instability to each of the 400 different dipeptides (DIWV). Using these weight values it is possible to compute an instability index (II) which is defined as:

$$II = \frac{10}{L}\sum_{i=1}^{L-1} \mathrm{DIWV}\left(x[i]\,x[i+1]\right)$$

where L is the length of the sequence and DIWV(x[i]x[i+1]) is the instability weight value for the dipeptide starting at position i.

A protein whose instability index is smaller than 40 is predicted as stable, a value above 40 predicts that the protein may be unstable.
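The calculation can be sketched in a few lines once a dipeptide weight table is available. The weights in `TOY_DIWV` below are made-up placeholders that cover only the two dipeptides in the example; they are not the published 400-value table, so the resulting number has no biological meaning.

```python
def instability_index(sequence, diwv):
    """II = (10 / L) * sum of DIWV over all consecutive dipeptides."""
    total = sum(diwv[sequence[i:i + 2]] for i in range(len(sequence) - 1))
    return 10.0 / len(sequence) * total


def classify(ii):
    """Apply the threshold of 40 described above."""
    return "stable" if ii < 40 else "unstable"


# Placeholder weights for illustration only (NOT the published DIWV values).
TOY_DIWV = {"AK": 20.0, "KA": -4.0}

ii = instability_index("AKAK", TOY_DIWV)  # dipeptides: AK, KA, AK
print(ii, classify(ii))
```

A sequence of length L contributes L - 1 overlapping dipeptides, and the factor 10 / L normalises the sum so that sequences of different lengths can be compared against the same threshold.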

Aliphatic index

The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability of globular proteins. The aliphatic index of a protein is calculated according to the following formula [Ikai, A.J. (1980) Thermostability and aliphatic index of globular proteins. J. Biochem. 88, 1895-1898. ]:

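$$\text{Aliphatic index} = X_{Ala} + a \cdot X_{Val} + b \cdot \left(X_{Ile} + X_{Leu}\right)$$

where XAla, XVal, XIle and XLeu are the mole percent (100 × mole fraction) of alanine, valine, isoleucine, and leucine. The coefficients a = 2.9 and b = 3.9 are the relative volumes of the valine side chain and of the Leu/Ile side chains, respectively, to the side chain of alanine.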
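The aliphatic index can be computed directly from a sequence. The sketch below uses the coefficients a = 2.9 for valine and b = 3.9 for isoleucine and leucine, as used by ProtParam following Ikai. The function name and the test sequences are hypothetical.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """X_Ala + a * X_Val + b * (X_Ile + X_Leu), with X in mole percent."""
    n = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / n

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Hypothetical sequences: one fully aliphatic, one with no aliphatic residues.
print(round(aliphatic_index("AVIL"), 1))
print(aliphatic_index("GSGS"))
```

Because the index is based on mole percent, it is independent of sequence length and depends only on the proportion of aliphatic residues.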

ProtParam Results

BC
1. Molecular weight: 4482.15
2. Total number of negatively charged residues (Asp + Glu): 1
3. Total number of positively charged residues (Arg + Lys): 5
4. Atomic composition:
  1. Formula: C₂₀₁H₂₉₉N₅₃O₅₆S₄
  2. Total number of atoms: 613
5. Extinction coefficients (M^-1 cm^-1, at 280 nm measured in water):
  - Ext. coefficient: 17085, assuming all pairs of Cys residues form cystines
  - Ext. coefficient: 16960, assuming all Cys residues are reduced
6. Estimated half-life:
  1. The N-terminal of the sequence considered is S (Ser).
  2. The estimated half-life is:
    1. 1.9 hours (mammalian reticulocytes, in vitro).
    2. >20 hours (yeast, in vivo).
    3. >10 hours (Escherichia coli, in vivo).
7. Instability index:
  - The instability index (II) is computed to be 38.80
  - This classifies the protein as stable.
8. Aliphatic index: 50.00

Conclusion of results of BC

The BC peptide is small, with a molecular weight of 4482.15 Da, and has a favorable charge distribution: one negatively charged residue and five positively charged residues, giving a net charge of +4 from these residues. This charge profile potentially enhances its ability to penetrate ocular tissues effectively. Its atomic composition (C₂₀₁H₂₉₉N₅₃O₅₆S₄) and relatively small total number of atoms (613) contribute to its manageable size for drug formulation. Additionally, BC is predicted to be stable: its instability index of 38.80 is below the threshold of 40, and its estimated half-life ranges from 1.9 hours in mammalian reticulocytes to over 20 hours in yeast and over 10 hours in E. coli.

Given these properties, BC’s stability and solubility profiles make it a strong candidate for ocular drug formulation. The extinction coefficient values indicate reliable detection and quantification potential, while the moderate aliphatic index of 50.00 suggests reasonable thermal stability. Together, these parameters highlight BC's suitability for recombinant expression and therapeutic use, offering promising penetration capabilities, structural stability, and ease of formulation for effective ocular drug delivery.

FT
1. Molecular weight: 9001.35
2. Total number of negatively charged residues (Asp + Glu): 6
3. Total number of positively charged residues (Arg + Lys): 12
4. Atomic composition:
  1. Formula: C₃₇₆H₅₉₇N₁₂₃O₁₁₁S₁₂
  2. Total number of atoms: 1219
5. Extinction coefficients (M^-1 cm^-1, at 280 nm measured in water):
  - Ext. coefficient: 10720, assuming all pairs of Cys residues form cystines
  - Ext. coefficient: 9970, assuming all Cys residues are reduced
6. Estimated half-life:
  1. The N-terminal of the sequence considered is S (Ser).
  2. The estimated half-life is:
    1. 1.9 hours (mammalian reticulocytes, in vitro).
    2. >20 hours (yeast, in vivo).
    3. >10 hours (Escherichia coli, in vivo).
7. Instability index:
  - The instability index (II) is computed to be 34.75
  - This classifies the protein as stable.
8. Aliphatic index: 55.85

Conclusion of results of FT

The FT peptide has a moderate molecular weight of 9001.35 Da and carries a higher net positive charge than BC, with six negatively charged and twelve positively charged residues (net +6 from these residues). This charge distribution is advantageous, as the increased positive charge can enhance membrane penetration by interacting with the negatively charged components of cellular membranes. With an atomic formula of C₃₇₆H₅₉₇N₁₂₃O₁₁₁S₁₂ and a total of 1219 atoms, FT maintains a manageable size for drug delivery applications. The peptide is predicted to be stable, with an instability index of 34.75, and has an estimated half-life of 1.9 hours in mammalian reticulocytes, with longer half-lives in yeast (>20 hours) and E. coli (>10 hours).

These properties collectively support FT’s candidacy as a viable ocular drug molecule. Its favorable charge profile suggests efficient cellular uptake, while predicted stability and solubility parameters promote robust performance during formulation and use. The extinction coefficients further assist in quantifying FT during manufacturing and quality control. Together, these characteristics underscore FT’s potential for effective membrane penetration, stability, and suitability for therapeutic development targeting ocular diseases.

Step 4: Functionality Predictions

Following stability predictions, evaluating the functionality of our fusion peptides is imperative to ensure they perform effectively within physiological environments. While stability assessments confirm that the peptides maintain their conformations under various conditions, functionality prediction examines their ability to interact efficiently and specifically with target molecules, such as receptors or other proteins. This combined approach transitions from ensuring structural robustness to confirming biological activity, guiding the rational optimization of peptide candidates for enhanced therapeutic potential. Given the complexity of fusing distinct peptide sequences in FT and BC, functionality prediction also helps reveal potential issues like impaired binding or off-target interactions that might not be evident from stability data alone, allowing for early design refinements.

To conduct a comprehensive functionality assessment, we employed a suite of complementary computational tools. DeepKa was used to predict the pKa values of ionizable residues and, from them, the pH-dependent charge of the peptides. CSM-Toxin evaluated the toxicity profile to ensure safety and minimize adverse effects, and AllerCatPro 2.0 assessed allergenic potential. PMIpred identified potential protein–membrane interaction sites, which is essential for evaluating cellular uptake. Lastly, HDOCK performed molecular docking simulations, modeling how the fusion peptides physically interact with their targets. Integrating these predictions gave a detailed map of functional attributes, which guided the refinement of the peptide design and increased the likelihood of successful therapeutic application.

DeepKa

Predicting protein pKa values is essential because these values influence protein structure, function, and interactions within physiological environments. The pKa of ionizable groups affects the protonation state of amino acids, which in turn determines protein charge distribution, stability, binding affinity, and enzymatic activity. Accurate pKa prediction helps elucidate how proteins respond to changes in pH, informing drug design by identifying critical residues involved in binding and catalysis. This understanding is crucial for optimizing therapeutic peptides and proteins, ensuring they maintain desired bioactivity and stability under physiological conditions.

DeepKa is a deep learning-based tool developed to predict protein pKa values accurately by leveraging extensive data from continuous constant-pH molecular dynamics simulations. Unlike traditional empirical methods, DeepKa uses a sophisticated grid charge representation of protein electrostatics and a deep neural network to achieve prediction accuracy comparable to computationally intensive molecular dynamics simulations but with greater speed and efficiency. The tool’s ability to predict pKa values supports related applications, including protein–ligand binding affinity prediction, making it a valuable asset in computational protein engineering and drug discovery. In our project, DeepKa enabled us to identify ionizable residues critical for peptide function and interaction, guiding design improvements that enhance biological performance and therapeutic potential.

Principle of AI Analysis

DeepKa utilizes an advanced deep learning algorithm for predicting protein pKa values, leveraging data obtained from continuous constant-pH molecular dynamics (CpHMD) simulations. Unlike traditional empirical or physics-based approaches, DeepKa employs a grid-based charge representation to model the protein’s electrostatic environment surrounding ionizable residues (Asp, Glu, His, Lys). This grid-based strategy smooths charge distributions and electrostatic energies, resolving discontinuities introduced by cutoff methods prevalent in earlier techniques. By modeling the protein within a defined cubic box or sphere, DeepKa reduces computational demand while maintaining accuracy through focused analysis of electrostatics within this spatial region.

Mathematically, DeepKa correlates the local electrostatic potential ϕ at each residue site with shifts in pKa values ΔpKa, relative to intrinsic reference pKa values. These shifts predominantly arise from desolvation effects and electrostatic interactions within the protein microenvironment. The model employs a trained deep neural network f(ϕ,X;θ) to map these electrostatic features and structural information X to pKa perturbations:

pKa_pred = pKa_ref + ΔpKa

In plain terms, this says: "Final Score = Default Score + Change".
pKa_ref is the reference ("default") pKa of that type of amino acid side chain when it is fully exposed to water.
ΔpKa (delta pKa) is the "change". It can be positive or negative, meaning the protein's environment can make it easier or harder for the amino acid to hold onto its proton compared to being in water. The change arises mostly from desolvation and from electrostatic interactions with neighboring groups inside the protein.

ΔpKa = f(φ, X; θ)

f: This is the AI model itself. It's a "function" or a "machine" that takes some inputs and gives you an output (the ΔpKa).
φ (phi): This is the electrostatic map. It's a measure of the electrical forces and "stickiness" surrounding the amino acid inside the protein. It's the most important clue for the AI.
X: This is any other structural information about the protein that might be helpful (like the positions of other atoms).
θ (theta): These are the AI's internal settings. They are like the "knowledge" or "experience" the AI has gained during its training. The AI uses this knowledge to interpret the map (φ) and the structure (X).

In simple terms: The AI (f) uses its trained knowledge (θ) to look at the electrostatic map (φ) and other data (X) and then guesses the "change" in the score (ΔpKa).

Here, θ denotes the network parameters optimized by minimizing the root mean square error (RMSE) loss between predicted pKa values and reference CpHMD-derived pKa values, expressed by:

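$$\mathcal{L}(\theta) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{p}K_{a,i}^{\mathrm{pred}} - \mathrm{p}K_{a,i}^{\mathrm{CpHMD}}\right)^{2}}$$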

This "Loss Function" is like the AI's exam score. A lower score is better.

pKa_CpHMD: This is the correct answer, obtained from a very accurate but slow simulation (CpHMD).
(pKa_pred - pKa_CpHMD): This is the error for one amino acid—how far the AI's guess was from the right answer.
Σ(...)²: We square all the errors for a bunch of amino acids and add them up. Squaring makes sure positive and negative errors don't cancel out and punishes big mistakes more.
(1/N) * Σ: This finds the average error across all the amino acids in the training set.
√[...]: Finally, we take the square root to get back to a number that makes sense in pKa units. This is the Root Mean Square Error (RMSE).

What this means: During training, the computer constantly tweaks the AI's internal settings (θ) to make this exam score (the RMSE Loss) as small as possible. It's essentially telling the AI: "Keep adjusting your knowledge until your guesses are, on average, as close as possible to the right answers."
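As a minimal sketch of the loss described above, the RMSE can be computed in a few lines of Python. The function name `rmse` and the numbers are illustrative assumptions; they are not DeepKa code, DeepKa output, or CpHMD data.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predicted and reference pKa values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each per-residue error so that signs do not cancel.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    # Average over the N residues, then take the root to return to pKa units.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for three ionizable residues (illustration only).
pka_pred = [4.1, 6.9, 10.2]
pka_cphmd = [3.8, 6.5, 10.4]
loss = rmse(pka_pred, pka_cphmd)
```

Training amounts to adjusting θ so that this number, evaluated over the whole training set, becomes as small as possible.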

This loss function directs model training across large datasets, enabling DeepKa to achieve accuracy comparable to CpHMD simulations while significantly enhancing computational efficiency. The grid-based electrostatic approach further improves robustness by smoothing charge distributions and removing cutoff artifacts. Together, these methodological advancements empower DeepKa to accurately and rapidly predict protein pKa values for high-throughput applications, including protein–ligand binding affinity assessment and protein engineering.

DeepKa Results

BC’s net charge according to pH

FT’s net charge according to pH

At the physiological pH of approximately 7.4 in the ocular environment, the charge properties of fusion peptides BC and FT are critical determinants of their therapeutic efficacy and safety. Both peptides carry a net positive charge at this pH because their isoelectric points (pI), 9.8 for BC and 10.48 for FT, are well above 7.4. This positive charge enhances electrostatic attraction to the predominantly negatively charged ocular surface and cellular membranes, which contain components such as glycosaminoglycans and phospholipids. These interactions promote adsorption to the cell surface, which is the first step toward efficient drug targeting and uptake.

The BC peptide’s moderate positive charge at pH 7.4 offers a balanced interaction profile: it is sufficient to promote effective receptor binding while minimizing undesired nonspecific interactions, such as chelation with metal ions present in tear fluid. Such controlled binding reduces peptide sequestration and preserves bioavailability. Moreover, the moderate charge supports peptide solubility and stability, reducing risks of aggregation or precipitation that could impair delivery and therapeutic function. In contrast, FT’s higher positive charge may increase binding efficacy but also raises the potential for nonspecific interactions. Therefore, its formulation must carefully balance these properties to maintain safety and stability. These charge-related insights guide the rational design and optimization of BC and FT as ocular drugs, ensuring they achieve targeted delivery, sustained activity, and minimal side effects within the complex biochemical milieu of the eye.
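The link between per-residue pKa values and the net charge curves can be sketched with the Henderson–Hasselbalch relation. The snippet below is an illustration under stated assumptions: the function name `net_charge`, the site list, and the pKa values are hypothetical placeholders, not the DeepKa predictions for BC or FT.

```python
def net_charge(sites, ph):
    """Net charge at a given pH from a list of (pKa, kind) ionizable sites.

    kind is "base" for groups that are +1 when protonated (Lys, Arg, His,
    N-terminus) and "acid" for groups that are -1 when deprotonated
    (Asp, Glu, C-terminus).
    """
    total = 0.0
    for pka, kind in sites:
        if kind == "base":
            # Fraction protonated (charged +1).
            total += 1.0 / (1.0 + 10 ** (ph - pka))
        elif kind == "acid":
            # Fraction deprotonated (charged -1).
            total -= 1.0 / (1.0 + 10 ** (pka - ph))
        else:
            raise ValueError(f"unknown site kind: {kind!r}")
    return total


# Hypothetical toy peptide: two basic sites and one acidic site.
toy_sites = [(10.4, "base"), (10.4, "base"), (4.0, "acid")]
charge_at_7_4 = net_charge(toy_sites, 7.4)
```

Evaluating the function over a range of pH values gives a charge-versus-pH curve of the kind shown for BC and FT; the pH at which the curve crosses zero corresponds to the isoelectric point.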

PMIpred

Predicting protein–membrane interactions is critical for designing fusion peptides and therapeutic proteins intended for cellular delivery. Membrane binding is often the first step before processes such as endocytosis, which determines how a peptide or protein enters the cell and exerts its biological effect. Accurately estimating binding strength and interaction sites enables rational improvements in drug design, enhancing uptake, bioavailability, and specificity. For engineered fusion constructs, experimental data on membrane association are often limited, making computational prediction an essential tool for evaluating and optimizing cellular delivery.

PMIpred is a physics-informed prediction method that quantifies protein–membrane interactions by estimating membrane-binding free energies from sequence or structure. Using a transformer neural network trained on over 50,000 peptides, it predicts both global binding affinity and residue-level contributions, distinguishing between nonspecific membrane association and curvature sensing. Results are mapped onto 3D structural models, allowing visualization of interaction regions and guiding mutational design. By combining reliable quantitative predictions with broad applicability across diverse protein types and membrane environments, PMIpred provides a powerful resource for optimizing cellular entry strategies and advancing research on membrane-associated biological processes.

Principle of AI Analysis

PMIpred evaluates protein–membrane interactions by combining machine learning predictions of thermodynamic favorability with structural accessibility calculations. The central principle is that successful endocytosis is triggered when regions of a protein bind strongly and specifically to lipid bilayers. The tool therefore needs to predict where these regions are and how energetically favorable their interactions will be.

The process begins with a sliding window approach. The protein is partitioned into overlapping short segments, allowing local sequence and structural features to be assessed without losing fine detail. For each segment, a neural network predicts the curvature-sensing free energy change:

$$\Delta\Delta F_w = F_{w,\text{membrane-bound}} - F_{w,\text{solution}}$$

The Core Free Energy Measurement:

This equation calculates the free energy change for a window segment w.

F_w, membrane-bound represents the free energy of the segment when it is bound to the membrane.
F_w, solution represents the free energy of the segment when it is dissolved in solution.
The result, ΔΔF_w, quantifies the energy difference for each window, so a negative value signifies that binding of segment w to the membrane is energetically favorable.

Since residues appear in multiple windows, an average ΔΔF is assigned back to each residue:

$$\Delta\Delta F_i = \frac{1}{N_i}\sum_{w \in W(i)} \Delta\Delta F_w$$

Calculating the Per-Residue Energy

This equation determines the binding affinity for a single amino acid i.

ΔΔF_i is the averaged free energy difference assigned to residue i.
Σ(ΔΔF_w) denotes the sum of the ΔΔF_w scores from every window that contains residue i.
N_i is the total number of windows that contain residue i.
The average is taken to produce a single, consolidated ΔΔF_i value for each residue, creating a detailed energy map across the entire protein.

Here, W(i) is the set of overlapping windows containing residue i, and N_i is their count. This yields a residue-level energy map.
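The averaging step can be sketched as follows. This is a minimal illustration, not PMIpred's own code: the function name `per_residue_ddf` is hypothetical, and the window length of 15 is taken from the 15-residue windows listed in the results tables. The four window scores are the first four FT windows from the results section.

```python
def per_residue_ddf(window_ddf, window_len, n_residues):
    """Average overlapping window scores back onto individual residues.

    window_ddf[k] is the score of the window starting at residue k
    (0-based). Returns one averaged value per residue, or None for
    residues that are not covered by any supplied window.
    """
    sums = [0.0] * n_residues
    counts = [0] * n_residues
    for start, ddf in enumerate(window_ddf):
        # Every residue inside this window receives the window's score.
        for i in range(start, min(start + window_len, n_residues)):
            sums[i] += ddf
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# First four FT windows (ΔΔF_adj) from the results table.
first_windows = [-7.87, -6.47, -6.55, -6.98]
averaged = per_residue_ddf(first_windows, window_len=15, n_residues=18)
```

With these inputs, the first four residues average to about -7.87, -7.17, -6.96 and -6.97, which agrees with the per-residue ΔΔF column reported for FT. Residues further along would need the later windows to be included as well.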

Membrane composition further refines the prediction. If the bilayer is negatively charged, ΔΔF values are corrected by including electrostatic effects:

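$$\Delta\Delta F_i^{\text{adj}} = \begin{cases} \Delta\Delta F_i + \Delta G_{\text{elec},i}, & \text{negatively charged membrane} \\ \Delta\Delta F_i, & \text{neutral membrane} \end{cases}$$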

The model accounts for the chemical properties of the target membrane:

Negatively Charged Membranes: The binding free energy is adjusted to include electrostatic contributions from charged residues. Here, ΔG_elec,i is the electrostatic correction term for residue i, which accounts for attraction or repulsion between charged residues and lipid headgroups.
Neutral Membranes: The original, unadjusted ΔΔF_i values (ΔΔF_i,L24) are used directly.

In parallel, PMIpred calculates the solvent-accessible surface area (SASA) of each residue using a probe-based geometric algorithm. A residue is considered sufficiently exposed—and thus available to interact with lipids—if its surface area satisfies:

$$\mathrm{SASA}_i > \theta$$

This rule defines the accessibility criterion:

SASA_i is the surface area of residue i that is exposed to the solvent and thus accessible for interaction.
θ (theta) is a predefined accessibility threshold. A residue is considered accessible for membrane binding only if its SASA_i value exceeds this threshold.

Finally, PMIpred produces three types of outputs:

Global membrane-binding free energy:

$$\Delta\Delta F_{\text{global}} = \frac{1}{N_{\text{acc}}}\sum_{i:\,\mathrm{SASA}_i > \theta} \Delta\Delta F_i$$

ΔΔF_global is the overall binding free energy of the whole fusion peptide.
Σ(ΔΔF_i) (the sum): the individual binding scores (ΔΔF_i) of all accessible residues are added together.
N_acc (the count): the total number of accessible residues, that is, the ones that passed the surface exposure check (SASA_i > θ).
1 / N_acc (the average): the sum is divided by the number of accessible residues, which gives the average binding score per accessible residue.

This value reflects the protein’s overall membrane-binding tendency.

Residue-level classifications: Each residue is tagged as binder (B), sensor (S), or non-binder (-) along with its ΔΔF contribution:

$$C_i = \begin{cases} \text{B}, & \Delta\Delta F_i < \alpha \\ \text{S}, & \alpha \le \Delta\Delta F_i < \beta \\ \text{-}, & \Delta\Delta F_i \ge \beta \end{cases}$$

This classification uses two energy cutoffs, α and β, to categorize accessible residues. Both are pre-set, optimized numbers chosen to make the model's final classifications as correct as possible.

C_i: The category for each amino acid.
B (Binder): Residues with a strongly favorable binding energy (ΔΔF_i is less than α).
S (Sensor): Residues with a moderately favorable binding energy (ΔΔF_i is between α and β).
- (Non-binder): Residues with an unfavorable binding energy (ΔΔF_i is greater than or equal to β).

This classification ensures that strongly favorable regions are marked as binders (B), intermediate ones as curvature sensors (S), and weak or unfavorable sites as non-binders (-).

3D structural mapping: The classification and energy scores are projected back onto the protein’s structural model, highlighting binding hot spots on the protein surface.
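The accessibility filter, the global average, and the B/S/- classification can be sketched together. The cutoff values below (α = -10, β = -6.4, θ = 0.8) are assumptions read off the boundaries visible in our results tables, not values taken from the PMIpred source. The function names are hypothetical, and treating inaccessible residues as non-binders is likewise an assumption based on the final column of those tables.

```python
ALPHA = -10.0  # assumed binder cutoff (ΔΔF below this -> B)
BETA = -6.4    # assumed sensor cutoff (ΔΔF below this -> S)
THETA = 0.8    # assumed relative SASA threshold for accessibility


def classify(ddf, sasa, alpha=ALPHA, beta=BETA, theta=THETA):
    """Return 'B', 'S' or '-' for one residue."""
    if sasa <= theta:
        return "-"  # buried residues are treated as unavailable for binding
    if ddf < alpha:
        return "B"
    if ddf < beta:
        return "S"
    return "-"


def global_ddf(residues, theta=THETA):
    """Mean ΔΔF over accessible residues; residues is a list of (ddf, sasa)."""
    accessible = [ddf for ddf, sasa in residues if sasa > theta]
    if not accessible:
        return None
    return sum(accessible) / len(accessible)


# Four FT residues (1, 14, 41 and 104) taken from the per-residue table.
sample = [(-7.87, 0.66), (-7.07, 0.87), (0.11, 1.26), (-10.10, 1.33)]
labels = [classify(ddf, sasa) for ddf, sasa in sample]
mean_accessible = global_ddf(sample)
```

For these four residues the labels come out as -, S, -, B, matching the final classification column of the FT table. The mean is taken only over the three residues that pass the SASA check.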

Hilten, N.; Verwei, N.; Methorst, J; Nase, C.; Bernatavicius, A.; Risselada, H.J., Bioinformatics, 2024, 40(2).

Through this workflow, PMIpred transforms raw sequence and structural properties into a quantitative and spatially resolved fingerprint of protein–membrane interaction. These outputs provide insight into how readily a drug candidate can initiate endocytosis and guide design modifications to enhance uptake and specificity.

PMIpred results

1. FT

Chain	#n	sequence	ΔΔF_adj	-/S/B
A	1	CKSGGAWCGFDPHGC	-7.87	S
A	2	KSGGAWCGFDPHGCC	-6.47	S
A	3	SGGAWCGFDPHGCCG	-6.55	S
A	4	GGAWCGFDPHGCCGN	-6.98	S
A	5	GAWCGFDPHGCCGNC	-5.43	-
A	6	AWCGFDPHGCCGNCG	-5.95	-
A	7	WCGFDPHGCCGNCGC	-7.08	S
A	8	CGFDPHGCCGNCGCL	-5.67	-
A	9	GFDPHGCCGNCGCLV	-5.58	-
A	10	FDPHGCCGNCGCLVG	-6.33	S
A	11	DPHGCCGNCGCLVGF	-6.25	S
A	12	PHGCCGNCGCLVGFC	-10.09	B
A	13	HGCCGNCGCLVGFCY	-9.74	S
A	14	GCCGNCGCLVGFCYG	-8.93	S
A	15	CCGNCGCLVGFCYGT	-8.78	S
A	16	CGNCGCLVGFCYGTG	-8.45	-
A	17	GNCGCLVGFCYGTGC	-9.66	S
A	18	NCGCLVGFCYGTGCC	-8.74	S
A	19	CGCLVGFCYGTGCCY	-10.17	B
A	20	GCLVGFCYGTGCCYG	-9.31	S
A	21	CLVGFCYGTGCCYGR	-11.82	B
A	22	LVGFCYGTGCCYGRK	-10.08	B
A	23	VGFCYGTGCCYGRKC	-9.77	S
A	24	GFCYGTGCCYGRKCD	-6.96	S
A	25	FCYGTGCCYGRKCDE	-4.17	-
A	26	CYGTGCCYGRKCDEV	-2.97	-
A	27	YGTGCCYGRKCDEVD	-1.61	-
A	28	GTGCCYGRKCDEVDS	-1.31	-
A	29	TGCCYGRKCDEVDSQ	-1.53	-
A	30	GCCYGRKCDEVDSQP	-1.62	-
A	31	CCYGRKCDEVDSQPE	-0.43	-
A	32	CYGRKCDEVDSQPET	-0.99	-
A	33	YGRKCDEVDSQPETR	-1.44	-
A	34	GRKCDEVDSQPETRT	-0.71	-
A	35	RKCDEVDSQPETRTG	-1.05	-
A	36	KCDEVDSQPETRTGD	0.34	-
A	37	CDEVDSQPETRTGDD	2.82	-
A	38	DEVDSQPETRTGDDD	3.72	-
A	39	EVDSQPETRTGDDDP	2.80	-
A	40	VDSQPETRTGDDDPH	1.72	-
A	41	DSQPETRTGDDDPHR	0.97	-
A	42	SQPETRTGDDDPHRL	-0.23	-
A	43	QPETRTGDDDPHRLL	-0.23	-
A	44	PETRTGDDDPHRLLQ	-0.07	-
A	45	ETRTGDDDPHRLLQQ	-0.45	-
A	46	TRTGDDDPHRLLQQL	-2.73	-
A	47	RTGDDDPHRLLQQLV	-5.10	-
A	48	TGDDDPHRLLQQLVL	-3.55	-
A	49	GDDDPHRLLQQLVLS	-2.97	-
A	50	DDDPHRLLQQLVLSG	-4.60	-
A	51	DDPHRLLQQLVLSGN	-6.41	S
A	52	DPHRLLQQLVLSGNL	-6.96	S
A	53	PHRLLQQLVLSGNLI	-11.70	B
A	54	HRLLQQLVLSGNLIK	-12.66	B
A	55	RLLQQLVLSGNLIKE	-9.03	S
A	56	LLQQLVLSGNLIKEA	-7.49	S
A	57	LQQLVLSGNLIKEAV	-8.84	S
A	58	QQLVLSGNLIKEAVR	-5.38	-
A	59	QLVLSGNLIKEAVRR	-6.69	S
A	60	LVLSGNLIKEAVRRL	-12.18	B
A	61	VLSGNLIKEAVRRLH	-11.60	B
A	62	LSGNLIKEAVRRLHS	-12.81	B
A	63	SGNLIKEAVRRLHSR	-8.57	S
A	64	GNLIKEAVRRLHSRR	-13.16	B
A	65	NLIKEAVRRLHSRRL	-9.32	S
A	66	LIKEAVRRLHSRRLQ	-9.53	S
A	67	IKEAVRRLHSRRLQD	-6.51	S
A	68	KEAVRRLHSRRLQDE	-4.27	-
A	69	EAVRRLHSRRLQDEV	-3.63	-
A	70	AVRRLHSRRLQDEVD	-3.73	-
A	71	VRRLHSRRLQDEVDP	-4.17	-
A	72	RRLHSRRLQDEVDPR	-4.60	-
A	73	RLHSRRLQDEVDPRC	-4.06	-
A	74	LHSRRLQDEVDPRCG	-3.34	-
A	75	HSRRLQDEVDPRCGV	-3.18	-
A	76	SRRLQDEVDPRCGVP	-2.98	-
A	77	RRLQDEVDPRCGVPD	-1.77	-
A	78	RLQDEVDPRCGVPDK	-1.64	-
A	79	LQDEVDPRCGVPDKE	0.24	-
A	80	QDEVDPRCGVPDKET	1.11	-
A	81	DEVDPRCGVPDKETW	0.36	-
A	82	EVDPRCGVPDKETWW	-3.61	-
A	83	VDPRCGVPDKETWWE	-3.56	-
A	84	DPRCGVPDKETWWET	-3.85	-
A	85	PRCGVPDKETWWETW	-6.17	-
A	86	RCGVPDKETWWETWW	-8.83	S
A	87	CGVPDKETWWETWWT	-7.15	S
A	88	GVPDKETWWETWWTE	-5.03	-
A	89	VPDKETWWETWWTEW	-6.34	-
A	90	PDKETWWETWWTEWS	-7.25	S
A	91	DKETWWETWWTEWSQ	-4.87	-
A	92	KETWWETWWTEWSQP	-11.45	B
A	93	ETWWETWWTEWSQPK	-9.98	S
A	94	TWWETWWTEWSQPKK	-11.10	B
A	95	WWETWWTEWSQPKKK	-14.36	B
A	96	WETWWTEWSQPKKKR	-12.90	B
A	97	ETWWTEWSQPKKKRK	-9.05	S
A	98	TWWTEWSQPKKKRKV	-9.92	S

Chain	#n	AA	SASA	ΔΔF	-/S/B	Accessible (A/.)	-/S/B (accessible only)
A	1	C	0.66	-7.87	S	.	-
A	2	K	0.59	-7.17	S	.	-
A	3	S	0.72	-6.96	S	.	-
A	4	G	0.66	-6.97	S	.	-
A	5	G	0.60	-6.66	S	.	-
A	6	A	0.69	-6.54	S	.	-
A	7	W	0.67	-6.62	S	.	-
A	8	C	0.71	-6.50	S	.	-
A	9	G	0.87	-6.40	-	A	-
A	10	F	0.83	-6.39	-	A	-
A	11	D	0.82	-6.38	-	A	-
A	12	P	0.69	-6.69	S	.	-
A	13	H	0.75	-6.92	S	.	-
A	14	G	0.87	-7.07	S	A	S
A	15	C	0.65	-7.18	S	.	-
A	16	C	0.61	-7.22	S	.	-
A	17	G	0.56	-7.43	S	.	-
A	18	N	0.51	-7.58	S	.	-
A	19	C	0.64	-7.79	S	.	-
A	20	G	0.63	-8.05	S	.	-
A	21	C	0.69	-8.44	S	.	-
A	22	L	0.61	-8.64	S	.	-
A	23	V	0.64	-8.91	S	.	-
A	24	G	0.66	-9.01	S	.	-
A	25	F	0.75	-8.86	S	.	-
A	26	C	0.76	-8.64	S	.	-
A	27	Y	0.64	-8.08	S	.	-
A	28	G	0.54	-7.52	S	.	-
A	29	T	0.61	-7.02	S	.	-
A	30	G	0.56	-6.54	S	.	-
A	31	C	0.84	-6.01	-	A	-
A	32	C	0.85	-5.43	-	A	-
A	33	Y	0.89	-4.95	-	A	-
A	34	G	0.93	-4.32	-	A	-
A	35	R	1.06	-3.76	-	A	-
A	36	K	1.15	-2.95	-	A	-
A	37	C	1.29	-2.09	-	A	-
A	38	D	1.28	-1.19	-	A	-
A	39	E	1.42	-0.54	-	A	-
A	40	V	1.26	-0.15	-	A	-
A	41	D	1.26	0.11	-	A	-
A	42	S	1.31	0.20	-	A	-
A	43	Q	1.39	0.28	-	A	-
A	44	P	1.35	0.37	-	A	-
A	45	E	1.30	0.45	-	A	-
A	46	T	1.31	0.30	-	A	-
A	47	R	1.36	0.02	-	A	-
A	48	T	1.26	-0.12	-	A	-
A	49	G	1.25	-0.27	-	A	-
A	50	D	1.21	-0.50	-	A	-
A	51	D	1.27	-0.95	-	A	-
A	52	D	1.08	-1.61	-	A	-
A	53	P	1.04	-2.63	-	A	-
A	54	H	1.12	-3.67	-	A	-
A	55	R	1.07	-4.38	-	A	-
A	56	L	0.98	-4.95	-	A	-
A	57	L	0.99	-5.52	-	A	-
A	58	Q	1.06	-5.86	-	A	-
A	59	Q	0.99	-6.30	-	A	-
A	60	L	0.87	-7.09	S	A	S
A	61	V	0.89	-7.68	S	A	S
A	62	L	0.86	-8.19	S	A	S
A	63	S	0.88	-8.53	S	A	S
A	64	G	0.91	-9.21	S	A	S
A	65	N	0.92	-9.52	S	A	S
A	66	L	0.87	-9.73	S	A	S
A	67	I	0.79	-9.70	S	-	-
A	68	K	0.87	-9.20	S	A	S
A	69	E	0.92	-8.60	S	A	S
A	70	A	0.98	-8.25	S	A	S
A	71	V	1.03	-8.03	S	A	S
A	72	R	0.93	-7.74	S	A	S
A	73	R	0.96	-7.65	S	A	S
A	74	L	1.13	-7.43	S	A	S
A	75	H	1.19	-6.83	S	A	S
A	76	S	1.22	-6.26	-	A	-
A	77	R	1.20	-5.52	-	A	-
A	78	R	1.26	-5.06	-	A	-
A	79	L	1.25	-4.17	-	A	-
A	80	Q	1.23	-3.47	-	A	-
A	81	D	1.27	-2.81	-	A	-
A	82	E	1.21	-2.62	-	A	-
A	83	V	1.09	-2.57	-	A	-
A	84	D	1.03	-2.59	-	A	-
A	85	P	1.03	-2.75	-	A	-
A	86	R	1.00	-3.06	-	A	-
A	87	C	0.91	-3.23	-	A	-
A	88	G	0.98	-3.29	-	A	-
A	89	V	1.01	-3.49	-	A	-
A	90	P	0.99	-3.77	-	A	-
A	91	D	1.04	-3.89	-	A	-
A	92	K	1.09	-4.54	-	A	-
A	93	E	1.15	-5.09	-	A	-
A	94	T	1.11	-5.85	-	A	-
A	95	W	1.17	-6.88	S	A	S
A	96	W	1.23	-7.76	S	A	S
A	97	E	1.13	-8.13	S	A	S
A	98	T	1.16	-8.55	S	A	S
A	99	W	1.27	-8.89	S	A	S
A	100	W	1.15	-9.10	S	A	S
A	101	T	1.16	-9.12	S	A	S
A	102	E	1.17	-9.30	S	A	S
A	103	W	1.27	-9.72	S	A	S
A	104	S	1.33	-10.10	B	A	B
A	105	Q	1.40	-10.46	B	A	B
A	106	P	1.56	-11.25	B	A	B
A	107	K	1.61	-11.22	B	A	B
A	108	K	1.65	-11.47	B	A	B
A	109	K	1.78	-11.56	B	A	B
A	110	R	1.85	-10.63	B	A	B
A	111	K	1.99	-9.49	S	A	S
A	112	V	2.01	-9.92	S	A	S

Summary of FT

The analysis suggests that FT has strong potential for membrane binding and endocytic uptake, driven by distinct favorable regions across its sequence. In the window table, windows 12–24 and 51–67 form clusters dominated by S and B classifications, and the C-terminal windows (92–98) include four B windows with ΔΔF_adj values below –10, marking this end of the peptide as the major binding hotspot. The per-residue table refines this picture. Residues 104–110 are classified B and are highly solvent-exposed (SASA of 1.33 or more), so the C-terminal hotspot is readily available for membrane contact, while most of residues 60–75 and all of residues 95–103 are accessible sensors. By contrast, most of the N-terminal residues (1–30) have favorable ΔΔF values but SASA values below about 0.8, so they are not counted as accessible and are unlikely to contribute directly to binding. Residues 31–59 and 76–94 have weak or unfavorable ΔΔF values and are classified as non-binders. This arrangement of accessible binding regions separated by non-binding segments suggests that FT engages the membrane mainly through its central and C-terminal regions, while the intervening segments retain the flexibility needed for productive membrane engagement and intracellular function.

2. BC

Chain	#n	sequence	ΔΔF_adj	-/S/B
A	1	HHHHHHENLYFQGKE	-2.99	-
A	2	HHHHHENLYFQGKET	-3.73	-
A	3	HHHHENLYFQGKETW	-6.76	S
A	4	HHHENLYFQGKETWW	-9.90	S
A	5	HHENLYFQGKETWWE	-9.87	S
A	6	HENLYFQGKETWWET	-9.01	S
A	7	ENLYFQGKETWWETW	-10.11	B
A	8	NLYFQGKETWWETWW	-14.16	B
A	9	LYFQGKETWWETWWT	-13.84	B
A	10	YFQGKETWWETWWTE	-8.50	S
A	11	FQGKETWWETWWTEW	-12.48	B
A	12	QGKETWWETWWTEWS	-8.63	S
A	13	GKETWWETWWTEWSQ	-7.36	S
A	14	KETWWETWWTEWSQP	-11.45	B
A	15	ETWWETWWTEWSQPK	-9.98	S
A	16	TWWETWWTEWSQPKK	-11.10	B
A	17	WWETWWTEWSQPKKK	-14.36	B
A	18	WETWWTEWSQPKKKR	-12.90	B
A	19	ETWWTEWSQPKKKRK	-9.05	S
A	20	TWWTEWSQPKKKRKV	-9.92	S
A	21	WWTEWSQPKKKRKVP	-14.50	B
A	22	WTEWSQPKKKRKVPR	-9.24	S
A	23	TEWSQPKKKRKVPRC	-6.92	S
A	24	EWSQPKKKRKVPRCG	-7.74	S
A	25	WSQPKKKRKVPRCGV	-11.05	B
A	26	SQPKKKRKVPRCGVP	-8.83	S
A	27	QPKKKRKVPRCGVPD	-8.36	S
A	28	PKKKRKVPRCGVPDH	-8.82	S
A	29	KKKRKVPRCGVPDHA	-8.33	S
A	30	KKRKVPRCGVPDHAW	-8.30	S
A	31	KRKVPRCGVPDHAWT	-9.44	S
A	32	RKVPRCGVPDHAWTL	-11.78	B
A	33	KVPRCGVPDHAWTLK	-10.88	B
A	34	VPRCGVPDHAWTLKQ	-9.60	S
A	35	PRCGVPDHAWTLKQI	-10.75	B
A	36	RCGVPDHAWTLKQIA	-9.09	S
A	37	CGVPDHAWTLKQIAK	-9.86	S
A	38	GVPDHAWTLKQIAKL	-9.49	S
A	39	VPDHAWTLKQIAKLF	-11.37	B
A	40	PDHAWTLKQIAKLFK	-13.05	B
A	41	DHAWTLKQIAKLFKP	-14.51	B
A	42	HAWTLKQIAKLFKPR	-13.67	B
A	43	AWTLKQIAKLFKPRC	-17.52	B
A	44	WTLKQIAKLFKPRCG	-14.84	B
A	45	TLKQIAKLFKPRCGV	-15.97	B
A	46	LKQIAKLFKPRCGVP	-15.45	B
A	47	KQIAKLFKPRCGVPD	-12.68	B
A	48	QIAKLFKPRCGVPDS	-11.21	B
A	49	IAKLFKPRCGVPDSC	-10.22	B
A	50	AKLFKPRCGVPDSCT	-10.74	B
A	51	KLFKPRCGVPDSCTG	-8.31	S
A	52	LFKPRCGVPDSCTGT	-8.81	S
A	53	FKPRCGVPDSCTGTS	-6.46	S
A	54	KPRCGVPDSCTGTSS	-4.69	-
A	55	PRCGVPDSCTGTSSD	-2.24	-
A	56	RCGVPDSCTGTSSDV	-2.13	-
A	57	CGVPDSCTGTSSDVG	-1.63	-
A	58	GVPDSCTGTSSDVGG	-1.25	-
A	59	VPDSCTGTSSDVGGY	-0.97	-
A	60	PDSCTGTSSDVGGYN	-0.71	-
A	61	DSCTGTSSDVGGYNY	-0.56	-
A	62	SCTGTSSDVGGYNYV	-2.55	-
A	63	CTGTSSDVGGYNYVS	-2.42	-
A	64	TGTSSDVGGYNYVSW	-3.25	-
A	65	GTSSDVGGYNYVSWY	-6.25	-
A	66	TSSDVGGYNYVSWYQ	-6.70	-

Chain	#n	AA	SASA	ΔΔF	-/S/B	Accessible (A/.)	-/S/B (accessible only)
A	1	H	1.63	-2.99	-	A	-
A	2	H	1.60	-3.36	-	A	-
A	3	H	1.53	-4.50	-	A	-
A	4	H	1.48	-5.85	-	A	-
A	5	H	1.45	-6.65	-	A	-
A	6	H	1.36	-7.04	-	A	S
A	7	E	1.32	-7.48	S	A	S
A	8	N	1.28	-8.32	S	A	S
A	9	L	1.16	-8.93	S	A	S
A	10	Y	1.13	-8.89	S	A	S
A	11	F	1.07	-9.21	S	A	S
A	12	Q	1.03	-9.16	S	A	S
A	13	G	1.09	-9.03	S	A	S
A	14	K	1.08	-9.20	S	A	S
A	15	E	1.03	-9.25	S	A	S
A	16	T	0.97	-9.79	S	A	S
A	17	W	1.00	-10.50	B	A	B
A	18	W	1.12	-10.91	B	A	B
A	19	E	1.09	-10.85	B	A	B
A	20	T	1.12	-10.86	B	A	B
A	21	W	1.21	-11.22	B	A	B
A	22	W	1.09	-11.16	B	A	B
A	23	T	1.09	-10.68	B	A	B
A	24	E	1.11	-10.28	B	A	B
A	25	W	1.20	-10.45	B	A	B
A	26	S	1.24	-10.20	B	A	B
A	27	Q	1.29	-10.19	B	A	B
A	28	P	1.45	-10.28	B	A	B
A	29	K	1.51	-10.07	B	A	B
A	30	K	1.45	-9.96	S	A	S
A	31	K	1.50	-9.85	S	A	S
A	32	R	1.62	-9.68	S	A	S
A	33	K	1.61	-9.54	S	A	S
A	34	V	1.51	-9.58	S	A	S
A	35	P	1.43	-9.64	S	A	S
A	36	R	1.31	-9.28	S	A	S
A	37	C	1.22	-9.32	S	A	S
A	38	G	1.16	-9.49	S	A	S
A	39	V	1.09	-9.73	S	A	S
A	40	P	1.19	-9.86	S	A	S
A	41	D	1.01	-10.24	B	A	B
A	42	H	1.04	-10.60	B	A	B
A	43	A	1.14	-11.18	B	A	B
A	44	W	1.12	-11.61	B	A	B
A	45	T	1.13	-12.12	B	A	B
A	46	L	1.06	-12.52	B	A	B
A	47	K	1.10	-12.58	B	A	B
A	48	Q	1.21	-12.60	B	A	B
A	49	I	1.20	-12.65	B	A	B
A	50	A	1.29	-12.64	B	A	B
A	51	K	1.26	-12.59	B	A	B
A	52	L	1.34	-12.52	B	A	B
A	53	F	1.35	-12.32	B	A	B
A	54	K	1.33	-11.88	B	A	B
A	55	P	1.37	-11.16	B	A	B
A	56	R	1.30	-10.33	B	A	B
A	57	C	1.29	-9.53	S	A	S
A	58	G	1.19	-8.44	S	A	S
A	59	V	1.16	-7.52	S	A	S
A	60	P	1.16	-6.50	S	A	S
A	61	D	0.99	-5.51	-	A	-
A	62	S	0.98	-4.83	-	A	-
A	63	C	1.01	-4.25	-	A	-
A	64	T	0.98	-3.78	-	A	-
A	65	G	1.01	-3.48	-	A	-
A	66	T	0.97	-3.38	-	A	-
A	67	S	0.94	-2.99	-	A	-
A	68	S	0.86	-2.72	-	A	-
A	69	D	0.97	-2.56	-	A	-
A	70	V	1.02	-2.58	-	A	-
A	71	G	1.08	-2.63	-	A	-
A	72	G	1.10	-2.74	-	A	-
A	73	Y	1.09	-2.93	-	A	-
A	74	N	1.19	-3.21	-	A	-
A	75	Y	1.29	-3.62	-	A	-
A	76	V	1.47	-4.23	-	A	-
A	77	S	1.59	-4.66	-	A	-
A	78	W	1.54	-5.40	-	A	-
A	79	Y	1.62	-6.48	S	A	S
A	80	Q	1.63	-6.70	S	A	S

Summary of BC

The PMIpred analysis of BC indicates a strong overall tendency for membrane binding and endocytic uptake. After the N-terminal His tag (residues 1–6, largely non-binding), residues 7–16 are classified as sensors and lead into a Trp-rich stretch (residues 17–29) classified as binders. A second, stronger hotspot spans residues 41–56, where every residue is classified B and the per-residue ΔΔF reaches –12.65; the corresponding windows (39–50) have ΔΔF_adj values between –10.22 and –17.52. The two hotspots are connected by sensor residues (30–40), creating one extended patch of curvature-sensing and binding-prone sites, and all of these residues are solvent-accessible. Toward the C-terminus the classifications weaken: residues 61–78 are non-binders, and only residues 79–80 return to sensor level. Together, this profile suggests that BC presents accessible, strategically positioned membrane-binding hotspots, supporting efficient endocytosis and enhanced therapeutic potential.

AllerCatPro 2.0

In developing therapeutic proteins and peptides, it is crucial to assess their potential to trigger allergic reactions or adverse immune responses. Such immunogenicity can compromise drug safety, reduce efficacy, and lead to undesired side effects that hinder clinical success. Early identification of allergenic regions within a drug candidate is therefore essential to avoid costly setbacks during development and to ensure patient safety. This functional aspect—predicting how a protein may be recognized by the immune system—is a key determinant of a drug’s viability and successful translation from design to therapy. Addressing allergenicity helps maintain the therapeutic’s functional integrity by preventing immune-mediated neutralization or hypersensitivity reactions.

AllerCatPro 2.0 is a cutting-edge computational tool specifically designed for predicting protein allergenicity as a critical component of functional assessment. Unlike tools focused on structural stability or folding efficiency, AllerCatPro 2.0 evaluates whether protein sequences and their three-dimensional conformations resemble known allergens, thereby estimating the likelihood of eliciting an immune response. It achieves this by integrating sequence motif analysis, structural similarity comparisons, and machine learning models that recognize subtle patterns associated with allergenic potential. By providing a detailed allergenicity profile and confidence scores, AllerCatPro 2.0 enables the identification and rational redesign of potentially problematic regions, ensuring that therapeutic candidates retain their intended biological function without compromising safety. This makes it an indispensable functionality prediction tool in drug development pipelines.

Principle of AI Analysis

AllerCatPro 2.0 predicts protein allergenicity by comparing the query sequence against a comprehensive dataset of known allergens using multiple sequential steps. First, it checks for gluten-like glutamine repeats, which serve as an independent allergen indicator but only lead to a strong allergenicity prediction if other similarities are present. Next, it performs a BLASTP search against a curated 3D structure database of 714 known allergens. If there is significant sequence similarity (E-value < 0.001), it evaluates the 3D surface epitope similarity to assign strong evidence if identity exceeds 92–93%, or weak evidence otherwise. If no structural matches are found, the tool uses a linear-window rule requiring 35% identity over 80 amino acids and, failing that, a hexamer hit approach requiring three hexamer matches to known allergens. If none of these tests succeed, the protein is predicted as having no evidence for allergenicity.

Decision workflow of AllerCatPro 2.0 from the query protein to a result of strong, weak, or no evidence for allergenic potential. AllerCatPro 2.0 checks the similarity of the query protein to 714 representatives in its 3D model/structure database of known allergens, as well as to a comprehensive dataset of reliable proteins associated with allergenicity (4979 protein allergens). Whereas AllerCatPro 1.7 only compared the query protein with the dataset of known allergens, AllerCatPro 2.0 also assesses the similarity of the query sequence to datasets of 165 autoimmune allergens and 162 low allergenic proteins separately. If a significant sequence similarity is found, AllerCatPro 2.0 identifies hits of similar proteins associated with autoimmune diseases and/or similar proteins of low allergenic potential and presents the sequence identity to the closest hit.

In addition to known allergens, AllerCatPro 2.0 separately checks for similarity to autoimmune allergens and low allergenic proteins, offering a nuanced assessment of functional immune risk. It assigns predictions of strong, weak, or no evidence for allergenicity accompanied by detailed similarity scores and comments clarifying the basis of the prediction. Compared to previous methods, AllerCatPro 2.0’s integration of 3D structural similarity significantly improves prediction accuracy and reduces false positives. This hierarchical workflow prioritizes the most biologically relevant information to deliver reliable allergenicity assessments vital for ensuring the safety and functional viability of therapeutic proteins.
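The order of the checks described above can be sketched as a simple decision cascade. This is a simplified illustration, not AllerCatPro code: the `QueryHits` fields and the function name are hypothetical, the similarity values are assumed to have been computed elsewhere, the gluten-like repeat check is omitted, and 93% is used as the epitope cutoff although the text gives it as 92–93%. Because the description does not say which evidence grade the linear-window and hexamer rules produce, the sketch only reports which rule fired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryHits:
    """Pre-computed similarity results for one query protein (hypothetical)."""
    blast_evalue_3d: Optional[float]       # best E-value vs 3D allergen set
    epitope_identity_pct: Optional[float]  # 3D surface epitope identity
    window_identity_pct: float             # best identity over 80 residues
    hexamer_hits: int                      # hexamers matching known allergens


def allergenicity_call(hits: QueryHits) -> str:
    # Step 1: significant sequence similarity to a 3D allergen structure.
    if hits.blast_evalue_3d is not None and hits.blast_evalue_3d < 0.001:
        identity = hits.epitope_identity_pct
        if identity is not None and identity > 93.0:
            return "strong evidence (3D epitope)"
        return "weak evidence (3D epitope)"
    # Step 2: linear-window rule, 35% identity over 80 amino acids.
    if hits.window_identity_pct >= 35.0:
        return "evidence (linear-window rule)"
    # Step 3: at least three hexamer matches to known allergens.
    if hits.hexamer_hits >= 3:
        return "evidence (hexamer rule)"
    return "no evidence"


# A query with no hits at any step, which is the outcome reported for FT and BC.
no_hit = QueryHits(None, None, 12.0, 0)
result = allergenicity_call(no_hit)
```

The cascade stops at the first rule that fires, which reflects the hierarchical design of the workflow: structure-based evidence is consulted before the purely sequence-based fallbacks.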

AllerCatPro 2.0 Results

Result of FT

Result of BC

Both FT and BC show no evidence of allergenicity according to AllerCatPro 2.0 predictions. This result is advantageous for their therapeutic use, as it suggests a low likelihood of triggering adverse immune or allergic reactions in patients. The absence of detected allergenic motifs or structural resemblance to known allergens supports a safer clinical profile, reducing risks related to immunogenicity that can compromise drug efficacy and patient safety.

A prediction of no evidence for allergenic potential may also facilitate smoother regulatory approval and broader applicability across diverse patient populations. It supports the expectation that the designed protein drugs can maintain their intended biological functions without unwanted immune system activation, improving their overall functional viability and chances of therapeutic success.

HDOCK (Docking and Confidence Score)

HDOCK is an advanced web server that facilitates molecular docking for protein-protein and protein-DNA/RNA interactions using a hybrid algorithm combining template-based modeling and free docking. It accepts both protein sequences and structures as input, making it accessible even when experimental structural data are unavailable. HDOCK efficiently integrates sequence similarity search, template selection, and docking simulations, providing rapid and accurate predictions of binding modes within about 10 to 20 minutes. The server leverages binding information from homologous complexes to improve the accuracy of docking results and supports flexible constraints such as user-provided binding site residues. Its versatility and computational efficiency have been validated on multiple benchmark datasets, demonstrating superior performance in predicting biologically relevant interactions compared to traditional methods.

The mathematical model

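A common way to check binding confidence is to use a prediction server that searches for the best protein-protein docking pose and converts its docking score into a confidence score. HDOCK calculates the confidence score using the following empirical formula.

$$\text{Confidence score} = \frac{1}{1 + e^{0.02\,(E_{docking} + 150)}}$$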

E_docking is the docking score; for protein-protein complexes in the PDB it is usually around -200 or better (more negative). Roughly, when the confidence score is above 0.7, the two molecules are very likely to bind; when the score is between 0.5 and 0.7, the molecules may bind; when the confidence score is below 0.5, the molecules are unlikely to bind. Nevertheless, the confidence score should be used cautiously due to its empirical nature.
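As a rough illustration of how a docking score maps onto the confidence score, the short Python sketch below evaluates the empirical HDOCK relation, confidence = 1 / (1 + e^(0.02 (E_docking + 150))). The function names and the example docking scores are illustrative assumptions, not values taken from our runs.

```python
import math

def hdock_confidence(docking_score: float) -> float:
    """Empirical HDOCK confidence score for a given docking score."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))

def interpret(confidence: float) -> str:
    """Map a confidence score onto the rough bands described by HDOCK."""
    if confidence > 0.7:
        return "very likely to bind"
    if confidence >= 0.5:
        return "may bind"
    return "unlikely to bind"

# Hypothetical docking scores, chosen only to show the shape of the relation.
for score in (-100.0, -150.0, -200.0, -250.0):
    conf = hdock_confidence(score)
    print(f"docking score {score:7.1f} -> confidence {conf:.3f} ({interpret(conf)})")
```

A docking score of -150 corresponds to a confidence of exactly 0.5, and a score of -200 gives roughly 0.73, which is why scores around -200 or better fall into the highest band.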

References for HDOCK (as per the server website hdock.phys.hust.edu.cn):

Yan Y, Tao H, He J, Huang S-Y.* The HDOCK server for integrated protein-protein docking. Nature Protocols, 2020;
Yan Y, Zhang D, Zhou P, Li B, Huang S-Y. HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy. Nucleic Acids Res. 2017;45(W1):W365-W373.
Yan Y, Wen Z, Wang X, Huang S-Y. Addressing recent docking challenges: A hybrid strategy to integrate template-based and free protein-protein docking. Proteins 2017;85:497-512.
Huang S-Y, Zou X. A knowledge-based scoring function for protein-RNA interactions derived from a statistical mechanics-based iterative method. Nucleic Acids Res. 2014;42:e55.
Huang S-Y, Zou X. An iterative knowledge-based scoring function for protein-protein recognition. Proteins 2008;72:557-579.

Docking results

Fig 1 TP1 binds with NOS

Fig 2 TP1 binds with AKT1

Fig 3 TP1 binds with PI3K

Fig 4 FRATtide binds with GSK3

Fig 5 TP1 CDR1 binds with ANP32a

Fig 6 BDNF binds with TrkB

Fig 7 FT binds with NOS

Fig 8 FT binds AKT1

Fig 9 FT binds with PI3K

Fig 10 FT binds with GSK3

Fig 11 BC binds with ANP32a

Fig 12 FT binds with PI3K

Fig 13 BC binds with TrkB

The docking results showed docking scores lower than -200, the range typical of protein-protein complexes in the PDB, which indicates a strong predicted interaction between the molecules. Additionally, the confidence scores were approximately 0.7, the level at which binding is considered likely. The combination of a low docking score and a high confidence score suggests that binding between each peptide and its target is favorable and that the predicted peptide-protein complexes are plausible. Because the docking score is an empirical quantity rather than a measured binding affinity, these predictions point to likely target engagement, which is a prerequisite for therapeutic efficacy.

ClusPro (Docking and Confidence Score)

The ClusPro server is a widely used tool for protein-protein docking. The server provides a simple home page for basic use, requiring only two files in Protein Data Bank format. However, ClusPro also offers a number of advanced options to modify the search that include the removal of unstructured protein regions, applying attraction or repulsion, accounting for pairwise distance restraints, constructing homo-multimers, considering small angle X-ray scattering (SAXS) data, and finding heparin binding sites. Six different energy functions can be used depending on the type of proteins. Docking with each energy parameter set results in ten models defined by centers of highly populated clusters of low energy docked structures. This protocol describes the use of the various options, the construction of auxiliary restraints files, the selection of the energy parameters, and the analysis of the results. Although the server is heavily used, runs are generally completed in < 4 hours.

The mathematical model

ClusPro calculates interaction energies between the docked protein molecules by the following formula.
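For reference, the ClusPro score is a weighted sum of a repulsive and an attractive van der Waals term, an electrostatic term, and the DARS pairwise structure-based potential. The coefficients below are those of the server's "balanced" set; it is an assumption that this set applies to the runs reported here, since the other sets change the weights.

$$E = 0.40\,E_{rep} - 0.40\,E_{att} + 600\,E_{elec} + 1.00\,E_{DARS}$$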

The weighted score is a measure of the quality of a docked complex, with lower scores indicating better quality. The ClusPro score is not a direct measure of binding affinity, though lower scores generally correlate with tighter binding.

References for ClusPro:

Jones G, Jindal A, Ghani U, Kotelnikov S, Egbert M, Hashemi N, Vajda S, Padhorny D, Kozakov D. Elucidation of protein function using computational docking and hotspot analysis by ClusPro and FTMap. Acta Crystallogr D Struct Biol. 2022 Jun 1;78(Pt 6):690-697.
Desta IT, Porter KA, Xia B, Kozakov D, Vajda S. Performance and Its Limits in Rigid Body Protein-Protein Docking. Structure. 2020 Sep; 28 (9):1071-1081.
Vajda S, Yueh C, Beglov D, Bohnuud T, Mottarella SE, Xia B, Hall DR, Kozakov D. New additions to the ClusPro server motivated by CAPRI. Proteins: Structure, Function, and Bioinformatics. 2017 Mar; 85(3):435-444.
Kozakov D, Hall DR, Xia B, Porter KA, Padhorny D, Yueh C, Beglov D, Vajda S. The ClusPro web server for protein-protein docking. Nature Protocols. 2017 Feb;12(2):255-278.
Kozakov D, Beglov D, Bohnuud T, Mottarella S, Xia B, Hall DR, Vajda, S. How good is automated protein docking? Proteins: Structure, Function, and Bioinformatics. 2013 Dec; 81(12):2159-66.

Docking results

BC binds with ANP32A:

Cluster	Members	Representative	Weighted Score
0	57	Center	-942.5
		Lowest Energy	-1055.8
1	44	Center	-1068.6
		Lowest Energy	-1117.9
2	30	Center	-1022.4
		Lowest Energy	-1030.8
3	28	Center	-974.9
		Lowest Energy	-1060.7
4	28	Center	-990.7
		Lowest Energy	-1056.6
5	27	Center	-1053.1
		Lowest Energy	-1173.2
6	26	Center	-979.3
		Lowest Energy	-1186.8
7	21	Center	-1046.9
		Lowest Energy	-1177.8
8	21	Center	-989.1
		Lowest Energy	-1085.8
9	20	Center	-935.7
		Lowest Energy	-984.7
10	20	Center	-927.6
		Lowest Energy	-1127.6
11	20	Center	-1024.8
		Lowest Energy	-1055.5
12	19	Center	-965.6
		Lowest Energy	-1008.9
13	18	Center	-960.0
		Lowest Energy	-993.7
14	17	Center	-945.8
		Lowest Energy	-1090.1
15	16	Center	-1126.4
		Lowest Energy	-1126.4
16	16	Center	-1077.1
		Lowest Energy	-1077.1
17	15	Center	-947.7
		Lowest Energy	-1060.3
18	15	Center	-958.0
		Lowest Energy	-985.1
19	15	Center	-1122.9
		Lowest Energy	-1182.8
20	13	Center	-929.2
		Lowest Energy	-965.3
21	13	Center	-1083.8
		Lowest Energy	-1100.8
22	13	Center	-1053.4
		Lowest Energy	-1053.4
23	12	Center	-989.2
		Lowest Energy	-992.7
24	12	Center	-1019.2
		Lowest Energy	-1019.2
25	12	Center	-1008.5
		Lowest Energy	-1008.5
26	11	Center	-1026.6
		Lowest Energy	-1033.6
27	11	Center	-994.8
		Lowest Energy	-1075.7
28	11	Center	-987.2
		Lowest Energy	-987.2
29	10	Center	-944.2
		Lowest Energy	-1048.1

BC binds with TrkB:

Cluster	Members	Representative	Weighted Score
0	115	Center	-1354.8
		Lowest Energy	-1421.0
1	54	Center	-1272.9
		Lowest Energy	-1439.4
2	35	Center	-1385.9
		Lowest Energy	-1485.2
3	29	Center	-1253.6
		Lowest Energy	-1366.0
4	23	Center	-1468.0
		Lowest Energy	-1528.7
5	22	Center	-1343.5
		Lowest Energy	-1469.6
6	21	Center	-1378.2
		Lowest Energy	-1378.2
7	21	Center	-1289.2
		Lowest Energy	-1407.6
8	18	Center	-1378.2
		Lowest Energy	-1439.0
9	16	Center	-1330.5
		Lowest Energy	-1473.7
10	13	Center	-1339.8
		Lowest Energy	-1599.3
11	13	Center	-1297.3
		Lowest Energy	-1405.2
12	13	Center	-1284.6
		Lowest Energy	-1512.4
13	11	Center	-1281.8
		Lowest Energy	-1281.8
14	10	Center	-1420.9
		Lowest Energy	-1420.9
15	10	Center	-1386.9
		Lowest Energy	-1386.9
16	10	Center	-1385.9
		Lowest Energy	-1385.9
17	9	Center	-1213.2
		Lowest Energy	-1349.6
18	9	Center	-1270.1
		Lowest Energy	-1348.4
19	5	Center	-1215.5
		Lowest Energy	-1332.4
20	5	Center	-1232.7
		Lowest Energy	-1305.0
21	5	Center	-1221.0
		Lowest Energy	-1291.5
22	4	Center	-1236.1
		Lowest Energy	-1236.1
23	4	Center	-1218.5
		Lowest Energy	-1230.4
24	2	Center	-1249.6
		Lowest Energy	-1264.4

FT binds with NOS:

Cluster	Members	Representative	Weighted Score
0	72	Center	-1250.9
		Lowest Energy	-1541.4
1	61	Center	-1400.7
		Lowest Energy	-1507.7
2	40	Center	-1200.1
		Lowest Energy	-1492.3
3	36	Center	-1253.2
		Lowest Energy	-1356.5
4	28	Center	-1301.4
		Lowest Energy	-1535.7
5	28	Center	-1213.1
		Lowest Energy	-1351.3
6	27	Center	-1573.8
		Lowest Energy	-1573.8
7	25	Center	-1483.9
		Lowest Energy	-1483.9
8	23	Center	-1200.3
		Lowest Energy	-1351.8
9	23	Center	-1310.4
		Lowest Energy	-1384.8
10	21	Center	-1272.8
		Lowest Energy	-1307.4
11	19	Center	-1312.6
		Lowest Energy	-1484.8
12	19	Center	-1360.1
		Lowest Energy	-1372.8
13	18	Center	-1294.2
		Lowest Energy	-1425.8
14	18	Center	-1345.1
		Lowest Energy	-1345.1
15	18	Center	-1466.3
		Lowest Energy	-1466.3
16	17	Center	-1392.0
		Lowest Energy	-1495.0
17	16	Center	-1300.8
		Lowest Energy	-1486.3
18	16	Center	-1224.5
		Lowest Energy	-1467.0
19	16	Center	-1236.2
		Lowest Energy	-1407.0
20	14	Center	-1323.3
		Lowest Energy	-1334.3
21	14	Center	-1236.4
		Lowest Energy	-1423.1
22	14	Center	-1243.2
		Lowest Energy	-1243.2
23	13	Center	-1281.6
		Lowest Energy	-1382.5
24	12	Center	-1446.8
		Lowest Energy	-1446.8
25	11	Center	-1221.5
		Lowest Energy	-1370.1
26	11	Center	-1195.3
		Lowest Energy	-1316.0
27	10	Center	-1309.8
		Lowest Energy	-1435.7
28	10	Center	-1337.6
		Lowest Energy	-1337.6
29	10	Center	-1254.8
		Lowest Energy	-1257.3

FT binds with AKT1:

Cluster	Members	Representative	Weighted Score
0	73	Center	-1042.5
		Lowest Energy	-1174.1
1	63	Center	-1336.7
		Lowest Energy	-1407.0
2	52	Center	-1145.4
		Lowest Energy	-1364.5
3	48	Center	-1221.1
		Lowest Energy	-1221.1
4	41	Center	-1106.1
		Lowest Energy	-1318.6
5	36	Center	-1021.0
		Lowest Energy	-1163.1
6	35	Center	-1095.9
		Lowest Energy	-1245.6
7	31	Center	-1247.7
		Lowest Energy	-1270.2
8	29	Center	-1125.5
		Lowest Energy	-1221.7
9	27	Center	-1230.6
		Lowest Energy	-1230.6
10	26	Center	-1139.7
		Lowest Energy	-1155.1
11	24	Center	-1103.1
		Lowest Energy	-1314.6
12	24	Center	-1031.2
		Lowest Energy	-1208.6
13	23	Center	-1065.9
		Lowest Energy	-1143.5
14	21	Center	-1156.7
		Lowest Energy	-1156.7
15	20	Center	-1218.3
		Lowest Energy	-1218.3
16	20	Center	-1027.9
		Lowest Energy	-1120.0
17	18	Center	-1064.5
		Lowest Energy	-1154.7
18	16	Center	-1018.3
		Lowest Energy	-1126.4
19	16	Center	-1179.5
		Lowest Energy	-1179.5
20	16	Center	-1169.6
		Lowest Energy	-1169.6
21	16	Center	-1109.0
		Lowest Energy	-1121.5
22	15	Center	-1185.3
		Lowest Energy	-1185.3
23	15	Center	-1117.4
		Lowest Energy	-1124.1
24	14	Center	-1065.6
		Lowest Energy	-1186.9
25	13	Center	-1018.6
		Lowest Energy	-1114.1
26	11	Center	-1052.4
		Lowest Energy	-1158.8
27	11	Center	-1129.1
		Lowest Energy	-1129.1
28	11	Center	-1085.7
		Lowest Energy	-1211.9
29	10	Center	-1096.3
		Lowest Energy	-1096.3

FT binds with PI3K:

Cluster	Members	Representative	Weighted Score
0	55	Center	-954.8
		Lowest Energy	-1085.4
1	50	Center	-1084.4
		Lowest Energy	-1160.2
2	47	Center	-944.4
		Lowest Energy	-1017.8
3	45	Center	-945.2
		Lowest Energy	-1033.9
4	44	Center	-1111.7
		Lowest Energy	-1111.7
5	37	Center	-951.3
		Lowest Energy	-1040.7
6	34	Center	-986.1
		Lowest Energy	-999.4
7	31	Center	-924.6
		Lowest Energy	-1247.1
8	31	Center	-1055.0
		Lowest Energy	-1055.0
9	30	Center	-1011.7
		Lowest Energy	-1011.7
10	30	Center	-925.2
		Lowest Energy	-1111.0
11	29	Center	-940.6
		Lowest Energy	-1086.3
12	28	Center	-903.0
		Lowest Energy	-968.0
13	26	Center	-1120.1
		Lowest Energy	-1120.1
14	24	Center	-955.8
		Lowest Energy	-1129.1
15	22	Center	-960.4
		Lowest Energy	-970.2
16	21	Center	-1194.9
		Lowest Energy	-1194.9
17	20	Center	-909.3
		Lowest Energy	-1013.5
18	20	Center	-993.1
		Lowest Energy	-993.1
19	19	Center	-959.9
		Lowest Energy	-975.7
20	15	Center	-1067.6
		Lowest Energy	-1067.6
21	14	Center	-994.5
		Lowest Energy	-994.5
22	11	Center	-987.0
		Lowest Energy	-1074.7
23	10	Center	-992.5
		Lowest Energy	-992.5
24	10	Center	-952.8
		Lowest Energy	-952.8
25	10	Center	-906.3
		Lowest Energy	-1042.7
26	9	Center	-922.3
		Lowest Energy	-989.8
27	9	Center	-902.5
		Lowest Energy	-1056.2
28	8	Center	-910.3
		Lowest Energy	-987.6
29	7	Center	-916.7
		Lowest Energy	-988.5

FT binds with GSK3:

Cluster	Members	Representative	Weighted Score
0	84	Center	-958.9
		Lowest Energy	-1282.7
1	59	Center	-1135.2
		Lowest Energy	-1174.2
2	54	Center	-963.3
		Lowest Energy	-1157.6
3	37	Center	-999.0
		Lowest Energy	-1091.5
4	32	Center	-944.9
		Lowest Energy	-1155.5
5	30	Center	-995.8
		Lowest Energy	-1181.4
6	29	Center	-1056.8
		Lowest Energy	-1134.7
7	24	Center	-1052.5
		Lowest Energy	-1166.8
8	23	Center	-972.7
		Lowest Energy	-1082.7
9	23	Center	-1035.6
		Lowest Energy	-1035.6
10	22	Center	-946.5
		Lowest Energy	-1119.6
11	22	Center	-1048.9
		Lowest Energy	-1048.9
12	22	Center	-967.4
		Lowest Energy	-1058.9
13	21	Center	-941.9
		Lowest Energy	-1007.1
14	19	Center	-954.8
		Lowest Energy	-1034.4
15	17	Center	-966.3
		Lowest Energy	-1091.2
16	17	Center	-1012.8
		Lowest Energy	-1054.7
17	17	Center	-1220.4
		Lowest Energy	-1220.4
18	17	Center	-1173.9
		Lowest Energy	-1173.9
19	16	Center	-957.0
		Lowest Energy	-1042.2
20	16	Center	-940.0
		Lowest Energy	-1018.8
21	15	Center	-963.6
		Lowest Energy	-1050.8
22	14	Center	-1005.2
		Lowest Energy	-1045.4
23	14	Center	-1002.4
		Lowest Energy	-1033.2
24	14	Center	-973.3
		Lowest Energy	-1029.9
25	13	Center	-954.5
		Lowest Energy	-1112.3
26	12	Center	-963.1
		Lowest Energy	-1118.2
27	12	Center	-941.0
		Lowest Energy	-1032.8
28	11	Center	-944.4
		Lowest Energy	-1126.4
29	11	Center	-1007.0
		Lowest Energy	-1007.0

The docking results revealed weighted scores below -900 for every cluster of all six peptide-target pairs. Because the ClusPro score is not a direct measure of binding affinity, these values are best read as evidence of energetically favorable, well-populated docking poses rather than as measured binding strengths. Consistently low scores nonetheless suggest that the fusion peptides can form stable complexes with their target proteins and engage their binding sites, which is essential for specific and sustained molecular recognition. Such complexes would give the peptides the potential to modulate the activity of their targets and so achieve the desired therapeutic effect. For drug development, these findings matter because stable binding is a prerequisite for efficacy and in vivo functionality. Overall, the favorable docking scores indicate that the fusion peptides are promising candidates for further biological validation and optimization as drug molecules.
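To make the comparison across clusters concrete, the sketch below parses a tab-separated ClusPro score table of the form shown above and reports the most populated cluster and the cluster with the lowest weighted score. The function names are hypothetical, and the three-cluster excerpt is copied from the BC-ANP32A table for illustration only. ClusPro's authors recommend ranking models by cluster size rather than by energy, so both are reported.

```python
# First three clusters of the BC-ANP32A table, tab-separated as on this page.
EXCERPT = (
    "0\t57\tCenter\t-942.5\n"
    "\t\tLowest Energy\t-1055.8\n"
    "1\t44\tCenter\t-1068.6\n"
    "\t\tLowest Energy\t-1117.9\n"
    "2\t30\tCenter\t-1022.4\n"
    "\t\tLowest Energy\t-1030.8\n"
)

def parse_cluspro_table(text):
    """Return one dict per cluster with its size, center score and lowest score."""
    clusters = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cluster_id, members, kind, score = line.split("\t")
        if kind == "Center":
            clusters.append({
                "cluster": int(cluster_id),
                "members": int(members),
                "center": float(score),
                "lowest": None,
            })
        elif kind == "Lowest Energy":
            # Continuation row: belongs to the most recently opened cluster.
            clusters[-1]["lowest"] = float(score)
    return clusters

def summarize_clusters(clusters):
    """Return (largest cluster id, lowest-score cluster id, lowest score)."""
    largest = max(clusters, key=lambda c: c["members"])
    best = min(clusters, key=lambda c: c["lowest"])
    return largest["cluster"], best["cluster"], best["lowest"]

clusters = parse_cluspro_table(EXCERPT)
print(summarize_clusters(clusters))
```

In this excerpt the most populated cluster (cluster 0) is not the one with the lowest score (cluster 1), which illustrates why the two criteria should be inspected together.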

Step 5: Pharmacokinetics Calculations

The step following functionality predictions is pharmacokinetics testing. After assessing the predicted biological activity and functional properties of our fusion peptides, it is critical to evaluate how these molecules behave in a physiological environment. Pharmacokinetics studies provide vital information on the absorption, distribution, metabolism, and excretion of the peptides, which influence their therapeutic effectiveness and safety. This step helps determine optimal dosing, duration of action, and potential systemic exposure or clearance mechanisms. For our innovative fusion peptides FT and BC, pharmacokinetics testing is essential to bridge the gap between predicted functionality and real-world biological performance, thereby guiding further development toward safe and efficacious eyedrop treatments.

To evaluate the absorption of our fusion peptides in the ocular environment, we have developed a quantitative model based on key pharmacokinetic equations. This model calculates the amount of protein available for absorption using predicted drug concentration changes over time, incorporating parameters such as clearance, volume of distribution, and drug diffusion coefficients in the vitreous, aqueous humor, and retina. By integrating biophysical constants like the Boltzmann constant, temperature, and drug molecule size, alongside experimentally derived permeability values, this approach offers a detailed understanding of how the peptides distribute and persist in the eye. Obtaining this absorption data is critical for optimizing dosage and delivery strategies to ensure therapeutic efficacy, minimize clearance, and achieve sustained drug presence at target sites within the retina.

Ocular Protein Absorption Prediction Model

The "Ocular Protein Absorption Prediction Model" mathematically describes the temporal kinetics of fusion peptide concentration in ocular compartments, emphasizing absorption dynamics crucial for therapeutic efficacy. The model starts with the core equation:

where C_pop.pred represents the predicted drug concentration over time, governed by:

$$C_{pop.pred}(t) = C_0 \, e^{-\frac{CL_{pop}}{V_{pop}}\, t}$$

Here, C_0 is the initial drug concentration introduced into the ocular compartment, CL_pop represents the clearance rate of the peptide from the compartment, and V_pop is the volume of distribution, the apparent volume in which the peptide is distributed. This exponential decay function models the reduction in drug concentration due to clearance mechanisms, with the clearance normalized by the volume of distribution.
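A minimal Python sketch of the exponential decay described above, written as C(t) = C0 · exp(-(CL/V) · t), may help to show how clearance and volume of distribution set the half-life. The parameter names `cl` and `v` and all numeric values are hypothetical placeholders, not our fitted parameters.

```python
import math

def concentration(t, c0, cl, v):
    """Concentration at time t for first-order elimination with rate constant cl / v."""
    return c0 * math.exp(-(cl / v) * t)

def half_life(cl, v):
    """Time for the concentration to fall to half its initial value."""
    return math.log(2.0) * v / cl

# Hypothetical values: c0 in ug/mL, clearance in mL/h, volume in mL.
c0, cl, v = 100.0, 0.5, 4.0
t_half = half_life(cl, v)
for t in (0.0, t_half, 2.0 * t_half):
    print(f"t = {t:6.2f} h -> C = {concentration(t, c0, cl, v):6.2f} ug/mL")
```

Doubling the clearance, or halving the volume of distribution, halves the half-life in this simple one-compartment picture.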

The diffusion coefficient in the vitreous and aqueous humor, D_Vitreous, is calculated through the Stokes-Einstein relation:

$$D_{Vitreous} = \frac{k_b T}{6 \pi \eta \, r_H}$$

where k_b is the Boltzmann constant (1.381 × 10^-23 J K^-1), T the absolute temperature (308.15 K), η the dynamic viscosity of the ocular fluids (η = 0.00069 kg m^-1 s^-1), and r_H the hydrodynamic radius of the drug molecule. This equation quantifies how molecular size, temperature, and viscosity limit drug diffusion in the ocular media.
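The Stokes-Einstein relation, D = k_b T / (6 π η r_H), can be evaluated directly with the constants quoted above. In this sketch the hydrodynamic radius of 2 nm is a hypothetical placeholder, since the radii of FT and BC are not stated here.

```python
import math

K_B = 1.381e-23   # Boltzmann constant, J/K
T = 308.15        # absolute temperature, K
ETA = 0.00069     # dynamic viscosity of the ocular fluids, kg m^-1 s^-1

def stokes_einstein(r_h_m, temperature=T, viscosity=ETA):
    """Diffusion coefficient (m^2/s) of a sphere with hydrodynamic radius r_h_m (m)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * r_h_m)

# Hypothetical hydrodynamic radius of 2 nm.
d = stokes_einstein(2.0e-9)
print(f"D = {d:.3e} m^2/s")
```

Because D is inversely proportional to r_H, a molecule with twice the hydrodynamic radius diffuses half as fast under the same conditions.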

For the retina, the diffusion coefficient D_Retina integrates permeability and barrier thickness:

$$D_{Retina} = \frac{P_{app} \, h}{K}$$

where P_app, the effective permeability (m s^-1), is derived from CellPM simulations and approximated as [insert permeability value], h is the retinal layer thickness (h = 0.0005 m), and K is the retina-vitreous partition coefficient (K = 0.5).
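Assuming the standard membrane relation P_app = K · D / h, the retinal diffusion coefficient follows as D_Retina = P_app · h / K. The sketch below applies this with the thickness and partition coefficient quoted above; the permeability of 1e-7 m/s is a purely hypothetical placeholder, because the actual value is not reported here.

```python
H_RETINA = 0.0005   # retinal layer thickness, m
K_PARTITION = 0.5   # retina-vitreous partition coefficient

def retina_diffusion(p_app, thickness=H_RETINA, partition=K_PARTITION):
    """Diffusion coefficient (m^2/s) from permeability (m/s), thickness (m) and partition coefficient."""
    return p_app * thickness / partition

# Hypothetical permeability, for illustration only.
d_retina = retina_diffusion(1.0e-7)
print(f"D_Retina = {d_retina:.2e} m^2/s")
```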

Graphically, the predicted concentration-time profile C_pop.pred is plotted against time to visualize drug absorption kinetics. The curve displays an initial concentration C₀, followed by an exponential decline modulated by clearance and distribution volume parameters. Overlaying diffusion coefficients and retinal permeability helps interpret how fast and efficiently the peptides penetrate and persist within ocular tissues. This aids in identifying the time windows of optimal therapeutic concentration, signifying when peptide levels remain effective before clearance diminishes bioavailability.

The utility of this model lies in its comprehensive quantitative framework, combining classical pharmacokinetics with molecular diffusion and tissue-specific permeability. This enables precise prediction of fusion peptide ocular bioavailability, informing dosing regimens and formulation strategies to maximize retinal drug delivery while minimizing systemic exposure. Consequently, it serves as a critical tool for advancing the development of targeted eyedrop therapeutics, guiding experimental design, and reducing reliance on extensive in vivo testing in early development stages.

ELISA Analysis with Sigmoid Curves

The sigmoid, or S-shaped, curve is foundational to quantitative biological assays such as ELISA. This curve succinctly characterizes how the response to an input, such as analyte concentration, transitions from minimal to maximal output. On a typical plot, the x-axis represents the logarithm of the analyte concentration, allowing a broad range of values to be visualized with even spacing. The y-axis denotes the observed response, which, in the case of ELISA, is the absorbance value proportional to antigen-antibody complex formation. The logarithmic scaling of the x-axis is critical, as it transforms the steep, exponential rise seen in binding phenomena into a symmetrical, easily interpreted S-curve.

Interpreting the axes of the ELISA sigmoid curve is vital for understanding both assay sensitivity and dynamic range. The log concentration axis spreads out low concentration points, preventing clustering and thereby improving resolution where the assay is most sensitive. The absorbance axis details signal intensity, mirroring the degree of analyte detected. As concentration increases from left to right, the curve begins with a low, nearly flat region corresponding to minimal binding. In the central, steep region—the assay’s dynamic range—small changes in concentration lead to sharp increases in signal. This is where ELISA readings are most accurate for quantitation. At the far right, the curve plateaus as antibody binding sites saturate; here, additional increases in analytes yield little change in absorbance. The sigmoid model thus enables robust estimation and interpolation of concentrations within this informative mid-region, while recognizing the limitations near the plateaus.

The Mathematical Model: Four-Parameter Logistic Equation

The sigmoidal response observed in ELISA is typically captured using the four-parameter logistic (4PL) equation:

$$y = d + \frac{a - d}{1 + (x/c)^{b}}$$

y: the response variable, which is the observed absorbance in the ELISA assay.
x: the independent variable, representing the concentration of the analyte (plotted on a log scale, e.g., log(ng/ml)).
a: the minimum asymptote, corresponding to the lower plateau of the curve (minimal response or background absorbance when analyte concentration is very low).
d: the maximum asymptote, representing the upper plateau of the curve (maximum absorbance when antibody binding is saturated).
c: the inflection point or EC50/IC50, which is the concentration value at the midpoint of the curve where the response is halfway between a and d.
b: the Hill slope or steepness factor, describing how sharply the curve rises near the inflection point.

In this model, y is the observed absorbance, x is the analyte concentration (plotted on a log axis), a and d represent the lower and upper asymptotes (the plateaus), c is the inflection point corresponding to 50% maximal response, and b adjusts the steepness near the midpoint. This model is well suited to fitting the experimental ELISA data and extracting the parameters needed for quantitative comparison.
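The 4PL curve and its inverse can be sketched in a few lines of Python, using the common form y = d + (a - d) / (1 + (x/c)^b) with x and c in concentration units. The parameter values below are hypothetical and chosen only to illustrate the behaviour; they are not our fitted values.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic response at concentration x (a: lower, d: upper plateau)."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Concentration that produces response y; valid only for y strictly between a and d."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical parameters: background 0.05, slope 1.5, midpoint 2 ng/ml, plateau 0.9.
a, b, c, d = 0.05, 1.5, 2.0, 0.9

midpoint_response = four_pl(c, a, b, c, d)   # equals (a + d) / 2
print(f"response at x = c: {midpoint_response:.3f}")

# Interpolating an unknown sample from its absorbance.
absorbance = 0.60
print(f"concentration for absorbance {absorbance}: {inverse_four_pl(absorbance, a, b, c, d):.3f} ng/ml")
```

The inverse function is what turns a measured absorbance into a concentration, and it is only reliable in the steep central region, since near the plateaus small absorbance errors translate into large concentration errors.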

The sigmoid curve provides a critical link between the raw ELISA signal and the actual analyte concentration. Since ELISA measures how antibodies bind to antigens as analyte concentration changes, and because the biochemical binding process is conserved, the resulting curve is inherently sigmoidal regardless of the specific peptide or protein being detected when using the same kit system. Particularly when absorbance values are normalized to the maximum for each assay, the general shape and interpretation of the curve remain valid across different peptides. This universal applicability streamlines analysis, as a single standard mathematical form can be fitted and used for multiple targets—only the scaling parameters require minor adjustment.

ELISA Curves

For a practical illustration, consider the standard ELISA curves generated for PI3K, ANP32A, and TrkB. Each follows the expected sigmoid pattern: minimal absorbance at low concentrations, a sharp rise through the mid-concentration range, and a plateau at saturation. The inflection point in each curve pinpoints the concentration at which 50% of all binding sites are occupied (often referred to as the IC50), enabling direct comparison of assay sensitivity and analyte abundance between different samples. In the PI3K graph:

this inflection occurs at approximately 0.298 log(ng/ml) (or 1.986 ng/ml) with a normalized absorbance near 0.447.

For the ANP32A graph:

the inflection point occurs at approximately 2.77509 ng/ml, with a normalized absorbance of 0.44893682.

For the TrkB graph:

we see an inflection point at 0.300 log(ng/ml) (or 1.995 ng/ml), very similar to that of PI3K, though the corresponding absorbance value is much higher due to a greater maximum signal. These results highlight the suitability of the sigmoid model for accurately interpreting ELISA data and quantifying protein concentrations in a consistent fashion across experiments.
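The conversion between the log-scale inflection points and concentrations in ng/ml is a simple power of ten, as this short Python check shows. Only the two log values quoted above are used; the function name is illustrative.

```python
def log_to_concentration(log_value):
    """Convert a base-10 log concentration, log(ng/ml), back to ng/ml."""
    return 10.0 ** log_value

for target, log_value in (("PI3K", 0.298), ("TrkB", 0.300)):
    print(f"{target}: {log_to_concentration(log_value):.3f} ng/ml")
# PI3K: 1.986 ng/ml
# TrkB: 1.995 ng/ml
```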

IC50 Calculations

The IC50 value calculated from ELISA sigmoid curves is a fundamental parameter in quantitative bioassays, representing the concentration of the protein analyte needed to achieve 50% of maximal antibody binding under standardized conditions. This is significant because it provides a direct means of quantifying the binding affinity of the target molecule: a lower IC50 indicates higher affinity, as less analyte is required to reach half-maximal binding, while a higher IC50 suggests weaker binding interactions.

IC50 determination relies on mathematical modeling that fits the experimental absorbance data to a four-parameter logistic equation, which describes the classic S-shaped curve observed in ELISA assays. The model uses four parameters—top and bottom plateaus (representing maximal and minimal responses), the Hill slope (describing steepness), and the inflection point (IC50)—to accurately interpolate this critical midpoint between the two asymptotes. By working with the full curve rather than estimating from selected data points, the four-parameter logistic fit enhances precision and reduces subjective error in identifying the IC50.

This calculated IC50 connects raw absorbance readings to meaningful biological concentration values, allowing researchers to assess how strongly their protein binds within the standardized assay setting. The extracted midpoint is visually evident as the point of steepest ascent on the sigmoid curve, where small changes in analyte concentration yield rapid changes in signal. Thus, the IC50 serves as a robust metric for comparing antibody affinity, characterizing different proteins, or evaluating experimental conditions—making it essential for ELISA data analysis and interpretation.

In conclusion, the IC50 values are 7.14824 ng/ml for TrkB with BC, 6.945194872 ng/ml for PI3K with FT, 6.80399 ng/ml for PI3K with BC, and 2.77509 ng/ml for ANP32A with BC.

Limitations

While the sigmoid curve and four-parameter logistic model provide a robust framework for standardizing ELISA results, it is important to recognize some inherent limitations in their application. The ELISA kit setting assumes a constant amount of antibody is present in each assay well, which creates a controlled and reproducible environment for constructing the standard curve and calculating parameters such as the IC50—the concentration at which 50% of antibodies are bound. However, this scenario does not fully reflect biological systems such as the eye, where the abundance and density of target analytes can vary widely. Hence, the IC50 values derived from ELISA standard curves primarily characterize assay conditions rather than the actual physiological concentrations in tissue samples. For meaningful translation to in vivo contexts, such as evaluating target receptor density in the eyeball, additional considerations including tissue-specific concentrations, distribution, and binding dynamics must be incorporated beyond the scope of the ELISA kit’s standardized antibody-analyte interactions. Therefore, while sigmoid modeling is invaluable for assay calibration and relative quantification, its parameters should be interpreted cautiously when extrapolating to complex biological environments.

Results

The use of the sigmoid mathematical model in ELISA analysis supports reliability, comparability, and precision. By leveraging the robust fit provided by the four-parameter logistic equation, it is straightforward to interpolate unknown concentrations, establish assay limits, and directly compare results between different targets and conditions. The graphs for both PI3K and TrkB closely follow the idealized S-curve, with well-defined inflection points and distinct plateau regions, indicating that our ELISA assays are performing as intended. The data suggest good reproducibility and sensitivity, supporting the effectiveness of the standardized ELISA approach in quantifying diverse targets within the same experimental platform.

References

Bechtel, T. J., & Weerapana, E. (2017). From structure to redox: The diverse functional roles of disulfides and implications in disease. Proteomics, 17(6), 1600391. https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/pmic.201600391
Tartaglia GG, Pawar AP, Campioni S, et al. Prediction of aggregation-prone regions in structured proteins. J Mol Biol 2008;380:425–36
Zamora WJ, Campanera JM, Luque FJ. Development of a structure-based, pH-dependent Lipophilicity scale of amino acids from continuum solvation calculations. J Phys Chem Lett 2019;10:883–9.
Søndergaard CR, Olsson MHM, Rostkowski M, et al. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput 2011;7:2284–95.
Olsson MHM, Søndergaard CR, Rostkowski M, et al. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput 2011;7:525–37.
Marc Oeller, Ryan Kang, Rosie Bell, Hannes Ausserwöger, Pietro Sormanni, Michele Vendruscolo, Sequence-based prediction of pH-dependent protein solubility using CamSol, Briefings in Bioinformatics, Volume 24, Issue 2, March 2023, bbad004
Vander Meersche, Y., Cretin, G., de Brevern, A. G., Gelly, J. C., & Galochkina, T. (2021). MEDUSA: Prediction of protein flexibility from sequence. Journal of Molecular Biology, 166882. https://doi.org/10.1016/j.jmb.2021.166882
Morozov, V.; Rodrigues, C.H.M.; Ascher, D.B. CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics 2023, 15, 431. https://doi.org/10.3390/pharmaceutics15020431
Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110.
Osorio, D.; Rondón-Villarreal, P.; Torres, R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015, 7, 4–14.
Jung, F., Frey, K., Zimmer, D., & Mühlhaus, T. (2023). DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. International Journal of Molecular Sciences, 24(8), 7444. https://doi.org/10.3390/ijms24087444
Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R., Appel R.D., Bairoch A. (2005). Protein Identification and Analysis Tools on the Expasy Server. (In) John M. Walker (ed): The Proteomics Protocols Handbook, Humana Press. pp. 571-607
Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-186.
Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and Varshavsky, A. J. (1989) Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700-16712.
Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 4,155-161.
Van Hilten, N.; Verwei, N.; Methorst, J; Nase, C.; Bernatavicius, A.; Risselada, H.J., Bioinformatics, 2024, 40(2).
