Modeling

Overview

In the AGEs Robber project, sRAGE or N-glycan-depleted sRAGE is fused to a collagen-binding domain (CBD), enabling its immobilization on the collagen layer of the AGEs-Thwart Patch (see Design for details).

Initially, sRAGE and the CBD were connected via a peptide linker. Further brainstorming led to the idea of a switchable molecule based on an intein splicing system, so the linker was replaced by the Npu DnaE split intein.

For both the initial fusion protein and the subsequent switchable intein splicing system, proper protein folding is essential for protein function. Accordingly, we modeled the structures of the sRAGE-CBD and switchable sRAGE-CBD fusion proteins with AlphaFold3.

However, AlphaFold3 assigned relatively low confidence scores to the inter-domain connecting regions, likely due to the lack of homologous structures in its training databases. To further evaluate the reliability of the predicted structures, Ramachandran plots were generated with PyMOL to examine the distribution of backbone dihedral angles; they showed that only a few residues adopt disallowed conformations.

Preparation for Protein Modeling

Sequences

The first step of modeling is collecting the protein sequences. In our project, the protein sequences are listed below:

  • Protein sequences
    sRAGE AQNITARIGE PLVLKCKGAP KKPPQRLEWK LNTGRTEAWK VLSPQGGGPW
    DSVARVLPNG SLFLPAVGIQ DEGIFRCQAM NRNGKETKSN YRVRVYQIPG
    KPEIVDSASE LTAGVPNKVG TCVSEGSYPA GTLSWHLDGK PLVPNEKGVS
    VKEQTRRHPE TGLFTLQSEL MVTPARGGDP RPTFSCSFSP GLPRHRALRT
    APIQPRVWEP VPLEEVQLVV EPEGGAVAPG GTVTLTCEVP AQPSPQIHWM
    KDGVPLPLPP SPVLILPEIG PQDQGTYSCV ATHSSHGPQE SRAVSISIIE P
    Mutated sRAGE AQQITARIGE PLVLKCKGAP KKPPQRLEWK LNTGRTEAWK VLSPQGGGPW
    DSVARVLPQG SLFLPAVGIQ DEGIFRCQAM NRNGKETKSN YRVRVYQIPG
    KPEIVDSASE LTAGVPNKVG TCVSEGSYPA GTLSWHLDGK PLVPNEKGVS
    VKEQTRRHPE TGLFTLQSEL MVTPARGGDP RPTFSCSFSP GLPRHRALRT
    APIQPRVWEP VPLEEVQLVV EPEGGAVAPG GTVTLTCEVP AQPSPQIHWM
    KDGVPLPLPP SPVLILPEIG PQDQGTYSCV ATHSSHGPQE SRAVSISIIE P
  • Linker sequences
    Linkers are short peptide sequences used to connect different domains within fusion proteins. Initially, two types of rigid linkers were selected for structural modeling. Their rigidity helps minimize unwanted interactions between adjacent protein domains. (Arai et al., 2001; Bhandari et al., 1986)
    A(EAAAK)nA (n = 2) AEAAAKEAAA KA
    (AP)7 APAPAPAPAP APAP
  • CBD sequences
    A collagen-binding domain (CBD) is a protein domain that specifically binds to collagen. In our project, the CBD is incorporated to anchor the fusion protein onto collagen-containing hydrogels. We selected the LRR 5-7 regions of lumican and fibromodulin because previous studies reported low dissociation constants (Kd) for them, indicating high binding affinity toward collagen. (Kalamajski & Oldberg, 2009)
    Lumican LRR 5-7 NLTFIHLQHN RLKEDAVSAA FKGLKSLEYL DLSFNQIARL PSGLPVSLLT
    LYLDNNKISN IPDEYFKR
    Fibromodulin LRR 5-7 NLTALYLQHN EIQEVGSSMR GLSLILLDLS YNHLRKVPDG LPSALEQLYM
    EHNNVYTVPD SYFRG
  • Npu DnaE split intein sequences
    Inteins are self-splicing protein elements that catalyze protein splicing, excising themselves while ligating the flanking N- and C-exteins through a native peptide bond. We selected the Npu DnaE intein system because it exhibits exceptionally fast splicing kinetics (short half-life). (Iwai et al., 2006)
    RGK-(Npu-DnaEN) CLSYETEILT VEYGLLPIGK IVEKRIECTV YSVDNNGNIY TQPVAQWHDR
    GEQEVFEYCL EDGSLIRATK DHKFMTVDGQ MLPIDEIFER ELDLMRVDNL PN
    (Npu-DnaEC)-CWE MIKIATRKYL GKQNVYDIGV ERDHNFALKN GFIASN
  • Enhanced Green Fluorescent Protein
    Enhanced Green Fluorescent Protein (EGFP) is used as a fluorescent reporter. In our project, we used EGFP to verify successful transfection of mammalian cells and to visually confirm surface attachment of our fusion proteins to collagen hydrogels.
    EGFP MVSKGEELFT GVVPILVELD GDVNGHKFSV SGEGEGDATY GKLTLKFICT
    TGKLPVPWPT LVTTLTYGVQ CFSRYPDHMK QHDFFKSAMP EGYVQERTIF
    FKDDGNYKTR AEVKFEGDTL VNRIELKGID FKEDGNILGH KLEYNYNSHN
    VYIMADKQKN GIKVNFKIRH NIEDGSVQLA DHYQQNTPIG DGPVLLPDNH
    YLSTQSALSK DPNEKRDHMV LLEFVTAAGI TLGMDELYK
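
The listed parts can be joined in N- to C-terminal order to obtain the sequences submitted for structure prediction. The following is a minimal illustrative sketch in Python; the helper names (`helical_linker`, `proline_linker`, `assemble`) are hypothetical, and only the last 11 residues of sRAGE are used here as a short stand-in for the full sequence.

```python
def helical_linker(n: int) -> str:
    """Rigid alpha-helical linker of the form A(EAAAK)nA."""
    return "A" + "EAAAK" * n + "A"


def proline_linker(n: int) -> str:
    """Rigid proline-rich linker of the form (AP)n."""
    return "AP" * n


def assemble(*parts: str) -> str:
    """Join parts from N- to C-terminus, dropping the display spaces."""
    return "".join(part.replace(" ", "") for part in parts)


# Lumican LRR 5-7, copied from the list above.
LUMICAN_LRR5_7 = (
    "NLTFIHLQHN RLKEDAVSAA FKGLKSLEYL DLSFNQIARL PSGLPVSLLT "
    "LYLDNNKISN IPDEYFKR"
)

# Assumption: only the C-terminal excerpt of sRAGE, to keep the example short.
SRAGE_C_TERMINUS = "SRAVSISIIE P"

linkers = {
    "no_linker_EAK": "EAK",
    "AP7": proline_linker(7),
    "EAAAK2": helical_linker(2),
}

# One fusion sequence per linker choice.
constructs = {
    name: assemble(SRAGE_C_TERMINUS, linker, LUMICAN_LRR5_7)
    for name, linker in linkers.items()
}

for name, sequence in constructs.items():
    print(name, len(sequence), sequence)
```

Replacing `SRAGE_C_TERMINUS` with the full sRAGE or mutated sRAGE sequence, and `LUMICAN_LRR5_7` with the fibromodulin region, would give the remaining combinations.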

Interpretation of AlphaFold Confidence Metrics

pLDDT value

The predicted Local Distance Difference Test (pLDDT) is a per-residue confidence score, on a scale from 0 to 100, that AlphaFold uses to indicate how reliable the predicted structure is at each amino acid residue. A higher score means AlphaFold has greater confidence in the accuracy of that region. In AlphaFold's colored structural models, residues are colored according to their pLDDT values:

  • 90–100 (blue): Very high confidence; atomic coordinates can be trusted.
  • 70–90 (cyan/green): High confidence; overall reliable with minor possible errors.
  • 50–70 (yellow): Low confidence; structural prediction may be inaccurate and should be interpreted with caution.
  • <50 (orange–red): Very low confidence; often indicates intrinsically disordered regions where AlphaFold cannot provide an accurate prediction.
When analyzing predicted structures, the blue and green regions usually represent reliable folded cores, while yellow and red regions often correspond to flexible or disordered parts that may require external factors (e.g., ligands or binding partners) to stabilize.
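
The bands above can be turned into a small helper that summarizes how much of a model falls into each confidence class. This is a minimal sketch: the function names and the example scores are hypothetical, and assigning the boundary values (exactly 50, 70 and 90) to the higher band is an assumption, since the list above does not specify it.

```python
from collections import Counter

# Band names follow the list above; boundary handling is an assumption.
BANDS = ("very high", "high", "low", "very low")


def plddt_band(score: float) -> str:
    """Map one pLDDT score (0-100) to its confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie in [0, 100], got {score}")
    if score >= 90.0:
        return "very high"
    if score >= 70.0:
        return "high"
    if score >= 50.0:
        return "low"
    return "very low"


def band_fractions(scores):
    """Return the fraction of residues that fall in each band."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


# Hypothetical per-residue scores, for illustration only.
example_scores = [95.2, 88.0, 71.5, 64.3, 42.0, 91.0, 55.5, 30.1]
fractions = band_fractions(example_scores)
print(fractions)
```

A model dominated by the two upper bands would correspond to the mostly blue structures described above, while a large "very low" fraction would point to disordered or poorly predicted regions.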

Predicted Aligned Error plot

The Predicted Aligned Error (PAE) is an AlphaFold output that estimates, for every pair of residues, the expected positional error (in Å) of one residue when the predicted and true structures are aligned on the other. Unlike pLDDT, which evaluates individual residues, the PAE matrix describes the relative accuracy between all residue pairs, making it particularly useful for assessing domain packing, flexible regions, and inter-domain orientations.

In a PAE plot:

  • X-axis (Scored Residue): Amino acid sequence index (residue 1 to the last).
  • Y-axis (Aligned Residue): Amino acid sequence index.
  • Color scale:
    • Dark green (< 5 Å): High confidence in relative positioning.
    • Light green to white (> 20 Å): Low confidence, indicating flexibility or uncertainty.

Build

Build the structure of mutated sRAGE

We first modeled the mutated sRAGE using AlphaFold to check whether its fold is maintained after mutation. Based on the pLDDT color scheme, the folding of each domain appeared to be well maintained.

  • Mutated sRAGE
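
The substitutions that distinguish mutated sRAGE from sRAGE can be read directly from the two sequences listed under Sequences. The sketch below compares the first 60 residues of each; the function name `substitutions` is hypothetical, and positions are numbered from the first residue of the listed sequences, which is an assumption about numbering rather than a standard RAGE residue number.

```python
# First 60 residues of each listed sequence (display spaces removed).
WT_EXCERPT = (
    "AQNITARIGE PLVLKCKGAP KKPPQRLEWK LNTGRTEAWK VLSPQGGGPW "
    "DSVARVLPNG"
).replace(" ", "")
MUT_EXCERPT = (
    "AQQITARIGE PLVLKCKGAP KKPPQRLEWK LNTGRTEAWK VLSPQGGGPW "
    "DSVARVLPQG"
).replace(" ", "")


def substitutions(wild_type: str, mutant: str) -> list:
    """List point substitutions as <wild type><position><mutant>, 1-based."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        f"{w}{position}{m}"
        for position, (w, m) in enumerate(zip(wild_type, mutant), start=1)
        if w != m
    ]


print(substitutions(WT_EXCERPT, MUT_EXCERPT))  # ['N3Q', 'N59Q']
```

The remaining residues of the two listed sequences are identical, so these two asparagine-to-glutamine substitutions are the only differences between them.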

Build the structure of collagen binding domains

We used LRR5–7 domains from lumican and fibromodulin for modeling. From the pLDDT color scheme, we observed that the folding of each domain appeared to be well maintained.

  • Lumican (Domain: LRR5-7)
  • Fibromodulin (Domain: LRR5-7)

Build the fusion protein with linker

To identify a suitable CBD for our design and to determine whether a linker was required to connect the CBD and sRAGE, we predicted the fusion protein structures with AlphaFold. We tested the following combinations, named in the order sRAGE_linker_CBD:

  1. sRAGE_No linker (EAK)_Lumican
  2. sRAGE_No linker (EAK)_Fibromodulin
  3. sRAGE_(AP)7_Lumican
  4. sRAGE_(AP)7_Fibromodulin
  5. sRAGE_A(EAAAK)2A_Fibromodulin
  6. sRAGE_A(EAAAK)2A_Lumican
In the no-linker constructs, three amino acids (EAK) were inserted between sRAGE and the CBD so that the folding of the two proteins would not interfere with each other.
▲ sRAGE_No linker_Lumican
▲ sRAGE_No linker_Fibromodulin
▲ sRAGE_(AP)7_Lumican
▲ sRAGE_(AP)7_Fibromodulin
▲ sRAGE_A(EAAAK)2A_Lumican
▲ sRAGE_A(EAAAK)2A_Fibromodulin
In the Predicted Aligned Error plot, three dark green blocks can be observed, corresponding to the V and C1 domains of sRAGE, the C2 domain of sRAGE, and the collagen-binding domain. The greater the amount of surrounding white space outside these dark green blocks, the higher the uncertainty in the relative positioning among these domains. Under such conditions, there is a possibility that two domains may be positioned too closely, leading to unwanted interactions. Therefore, it is necessary to identify the combination in which the dark green blocks are surrounded by the least amount of white space.
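
The visual criterion of "least white space" can also be expressed as a number: the mean PAE over the off-diagonal blocks that pair one domain with another. The following is a minimal sketch under stated assumptions; the function names are hypothetical, the toy matrix is invented for illustration, and real domain boundaries (V-C1, C2 and CBD) would have to be supplied by the user. Because the PAE matrix is not symmetric, both orientations of each block are averaged.

```python
def mean_block_pae(pae, rows, cols):
    """Mean PAE over the block defined by row and column index ranges."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def inter_domain_pae(pae, domains):
    """Mean PAE for every domain pair, averaged over both orientations."""
    names = list(domains)
    result = {}
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            forward = mean_block_pae(pae, domains[first], domains[second])
            reverse = mean_block_pae(pae, domains[second], domains[first])
            result[(first, second)] = (forward + reverse) / 2
    return result


# Toy 6x6 PAE matrix in Angstrom: two confident 3-residue domains (2 A)
# with uncertain relative placement (20 A one way, 24 A the other).
toy_pae = [
    [2.0 if (i < 3) == (j < 3) else (20.0 if i < 3 else 24.0) for j in range(6)]
    for i in range(6)
]

# Hypothetical domain boundaries as 0-based index ranges.
domains = {"A": range(0, 3), "B": range(3, 6)}

print(inter_domain_pae(toy_pae, domains))  # {('A', 'B'): 22.0}
```

Under this measure, the construct with the lowest inter-domain values would correspond to the one whose dark green blocks are surrounded by the least white space.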

Among the six tested combinations, lumican showed better performance than fibromodulin, and the no-linker construct best met our criteria. While these comparisons allowed us to identify relatively more favorable constructs, all six models still exhibited a substantial number of white regions. To further assess the reliability of the predicted structures, Ramachandran plots of all six models were generated using PyMOL.

Model Validation through Ramachandran Plot Analysis

The Ramachandran plot is an analytical tool used to assess the stereochemical quality of protein structures. It plots the backbone dihedral angles φ (phi) and ψ (psi) on the x- and y-axes, respectively, with each point representing the angle combination of a single amino acid residue. Due to steric hindrance and chemical bond constraints, only certain regions are considered “allowed regions,” typically corresponding to α-helices, β-sheets, and left-handed helices.

In the plot, red and yellow contours indicate the most favored and allowed dihedral angle regions, respectively, while black dots represent the actual φ/ψ angles of residues in the protein. If most residues fall within the allowed regions, the model is considered stereochemically reasonable. Conversely, a significant number of points in disallowed regions may indicate conformational issues or prediction errors. Therefore, the Ramachandran plot is widely used to validate protein models and serves as an important reliability metric, particularly in molecular modeling and structure prediction studies.
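
Each point in a Ramachandran plot comes from two dihedral angles computed from backbone atom coordinates: φ is defined by C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). The sketch below shows the underlying calculation in plain Python; it is an illustration of the geometry, not the routine PyMOL uses, and the function names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of one residue from five atom positions."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Simple geometric check: a quarter turn around the z-axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `phi_psi` to every residue that has both neighbors gives the set of points that is then compared against the favored and allowed regions.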

To further evaluate the structural reliability of the predicted models, Ramachandran plots were generated for all six structures.

▲ sRAGE_No linker_Lumican
▲ sRAGE_No linker_Fibromodulin
▲ sRAGE_(AP)7_Lumican
▲ sRAGE_(AP)7_Fibromodulin
▲ sRAGE_A(EAAAK)2A_Lumican
▲ sRAGE_A(EAAAK)2A_Fibromodulin
Across the six Ramachandran plots, the majority of backbone dihedral angles fall within the allowed regions, indicating that the predicted structures are generally reasonable. Each plot shows approximately five residues with dihedral angles in disallowed regions, but despite these few outliers the overall structural quality of the models remains acceptable, supporting their suitability for subsequent analyses. Taken together with the PAE comparison, the lumican-based fusion protein without a linker appears to be the most favorable.

Build the fusion protein with Npu DnaE split intein

Although previous modeling indicated that the fusion protein without a linker using CBD from lumican is the most favorable, we adapted the patch for broader clinical applications by introducing the Npu DnaE split intein system, enabling modular assembly of sRAGE or other therapeutic proteins such as antimicrobial peptides with CBD. Therefore, we modeled the following fusion protein constructs:

  1. (Npu-DnaEC)-CWE-6xHis-CBD(Lumican)
    A His-tag was introduced into the sequence to facilitate protein purification.
  2. 6xHis-(Npu-DnaEC)-CWE-CBD(Lumican)
    A His-tag was introduced into the sequence to facilitate protein purification.
  3. sRAGE-eGFP-6xHis-RGK-(Npu-DnaEN)
    A His-tag was introduced into the sequence to facilitate protein purification, while EGFP was incorporated to verify successful transfection in mammalian cells and to visually confirm the surface attachment of our fusion proteins onto collagen hydrogels.
  4. sRAGE-eGFP-6xHis-RGK-CWE-6xHis-CBD
    Structural model of the intein-spliced fusion protein, corresponding to the product expected after trans-splicing of constructs 3 and 1.
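
The relationship between the precursors and the spliced product can be illustrated at the sequence level: the N-terminal intein fragment is removed from the end of one precursor, the C-terminal fragment from the start of the other, and the two exteins are joined. The sketch below is a simplified string model, not a simulation of the splicing chemistry; the function name `trans_splice` is hypothetical, and the exteins are shortened to the tags and junction residues (sRAGE-eGFP and the CBD are omitted for brevity).

```python
# Npu DnaE split intein fragments, copied from the Sequences section.
INT_N = (
    "CLSYETEILT VEYGLLPIGK IVEKRIECTV YSVDNNGNIY TQPVAQWHDR "
    "GEQEVFEYCL EDGSLIRATK DHKFMTVDGQ MLPIDEIFER ELDLMRVDNL PN"
).replace(" ", "")
INT_C = "MIKIATRKYL GKQNVYDIGV ERDHNFALKN GFIASN".replace(" ", "")


def trans_splice(n_precursor, c_precursor, int_n, int_c):
    """Return the ligated exteins after removing both intein fragments."""
    if not n_precursor.endswith(int_n):
        raise ValueError("N-precursor must end with the N-intein fragment")
    if not c_precursor.startswith(int_c):
        raise ValueError("C-precursor must start with the C-intein fragment")
    n_extein = n_precursor[: -len(int_n)]
    c_extein = c_precursor[len(int_c):]
    return n_extein + c_extein


# Assumption: shortened exteins standing in for constructs 3 and 1.
N_PRECURSOR = "HHHHHH" + "RGK" + INT_N   # ...-6xHis-RGK-(Npu-DnaEN)
C_PRECURSOR = INT_C + "CWE" + "HHHHHH"   # (Npu-DnaEC)-CWE-6xHis-...

product = trans_splice(N_PRECURSOR, C_PRECURSOR, INT_N, INT_C)
print(product)  # HHHHHHRGKCWEHHHHHH
```

With the full precursor sequences, the same operation yields the 6xHis-RGK-CWE-6xHis junction seen in construct 4.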

Conclusion

Through structural modeling with AlphaFold3, we built structures of sRAGE and its fusion proteins, in particular the lumican-based, no-linker design. The predicted structures suggest proper folding and minimal inter-domain interference. By integrating the Npu DnaE split intein system into our structural modeling, we believe that a modular and switchable platform for therapeutic proteins is possible.

Since there are many types of AGEs with distinct molecular structures, future work could involve modeling the interactions between various AGEs and our fusion proteins, further optimizing binding specificity and therapeutic performance.

References

  1. Arai, R., Ueda, H., Kitayama, A., Kamiya, N., & Nagamune, T. (2001). Design of the linkers which effectively separate domains of a bifunctional fusion protein. Protein Engineering, 14(8), 529-532.
  2. Bhandari, D. G., Levine, B. A., Trayer, I. P., & Yeadon, M. E. (1986). 1H‐NMR study of mobility and conformational constraints within the proline‐rich N‐terminal of the LC1 alkali light chain of skeletal myosin: Correlation with similar segments in other protein systems. European Journal of Biochemistry, 160(2), 349-356.
  3. Kalamajski, S., & Oldberg, Å. (2009). Homologous sequence in lumican and fibromodulin leucine-rich repeat 5-7 competes for collagen binding. Journal of Biological Chemistry, 284(1), 534-539.
  4. Iwai, H., Züger, S., Jin, J., & Tam, P. H. (2006). Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Letters, 580(7), 1853-1858.