Model


To assess the compatibility and interaction strength between chitosan and the candidate siRNAs designed to silence Phytophthora capsici, we used two bioinformatics approaches: docking simulations (HDOCK and Maestro Glide) and machine learning-based modeling. Docking allowed us to evaluate the spontaneity and binding affinity of siRNA-nanoparticle complexes, while predictive modeling let us generalize interaction patterns without repeatedly running computationally intensive docking. Together, these methods gave us a framework to screen and prioritize siRNA candidates before testing them in the lab.

Docking

1. HDOCK

Docking helps us check the binding affinities between the ligand and the receptor. Since chitosan is our nanoparticle carrier, it is essential to determine which siRNA can bind to our chitosan nanoparticle spontaneously without requiring high energy.

  • The siRNA candidates designed to silence P. capsici were docked against the chitosan monomer obtained from PubChem, using HDOCK to check compatibility and interaction affinity.
  • In HDOCK, the chitosan monomer is supplied as the ligand and the designed siRNA as the receptor.
Fig 1. Docking of siRNA candidate 1 with chitosan
Fig 2. Docking of siRNA Candidate 2 with chitosan
Table 1. Docking Scores using HDOCK for siRNA Candidate 1
Rank    Docking Score    Confidence Score    Ligand RMSD (Å)
1       -327.07          0.9718              17.78
2       -322.18          0.9690              20.11
3       -301.02          0.9535              17.31
4       -296.22          0.9490              20.57
5       -292.75          0.9456              19.95
6       -281.01          0.9322              20.21
7       -280.65          0.9317              21.87
8       -269.61          0.9162              21.16
9       -253.23          0.8874              20.08
10      -250.97          0.8828              19.88
Table 2. Docking Scores using HDOCK for siRNA Candidate 2
Docking Score    Confidence Score    Ligand RMSD (Å)
-269.15          0.9155              16.93
-266.94          0.912               18.05
-254.32          0.8896              21.71
-253.95          0.8888              19.83
-249.05          0.8788              21.02
-239.54          0.857               20.03
-231.79          0.837               20.77
-228.62          0.8281              19.91
-222.11          0.8088              25.25
-221.94          0.8083              24.58
Table 3. Docking Scores using HDOCK for siRNA Candidate 3
Docking Score    Confidence Score    Ligand RMSD (Å)
-245.17          0.8703              18.63
-230.71          0.834               22
-222.74          0.8107              20.6
-217.21          0.7932              20.34
-213.11          0.7794              20.8
-211.74          0.7747              63.09
-205.98          0.7539              55.58
-201.5           0.7369              18.41
-201.32          0.7362              19.77
-190.58          0.6925              58.99
Table 4. Docking Scores using HDOCK for siRNA Candidate 4
Docking Score    Confidence Score    Ligand RMSD (Å)
-184.19          0.6646              16.67
-183.8           0.6628              45.43
-183.78          0.6628              47.96
-182.15          0.6554              37.07
-181.72          0.6535              45.49
-180.18          0.6465              22.94
-177.94          0.6362              20.23
-177.69          0.635               46.94
-177.35          0.6334              30.05
-176.57          0.6298              43.13

The docking scores above show that siRNA Candidate 1 and siRNA Candidate 2 have the most negative scores (best poses of -327.07 and -269.15, respectively), implying that these two siRNAs bind chitosan more favorably than the others. This supports validating these siRNAs in the wet lab to check their silencing efficacy against P. capsici.
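The comparison across Tables 1 to 4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the scores are copied from the tables, while the `summarize` helper and the choice to report the best pose alongside the mean of the ten listed poses are our assumptions, not part of HDOCK.

```python
from statistics import mean

# HDOCK docking scores for the ten listed poses, copied from Tables 1-4.
hdock_scores = {
    "siRNA Candidate 1": [-327.07, -322.18, -301.02, -296.22, -292.75,
                          -281.01, -280.65, -269.61, -253.23, -250.97],
    "siRNA Candidate 2": [-269.15, -266.94, -254.32, -253.95, -249.05,
                          -239.54, -231.79, -228.62, -222.11, -221.94],
    "siRNA Candidate 3": [-245.17, -230.71, -222.74, -217.21, -213.11,
                          -211.74, -205.98, -201.5, -201.32, -190.58],
    "siRNA Candidate 4": [-184.19, -183.8, -183.78, -182.15, -181.72,
                          -180.18, -177.94, -177.69, -177.35, -176.57],
}


def summarize(scores_by_candidate):
    """Return (name, best score, mean score) rows, most negative best score first."""
    rows = [(name, min(scores), mean(scores))
            for name, scores in scores_by_candidate.items()]
    return sorted(rows, key=lambda row: row[1])


summary = summarize(hdock_scores)
for name, best, avg in summary:
    print(f"{name}: best pose = {best:.2f}, mean of listed poses = {avg:.2f}")
```

Sorting by the best pose and by the mean gives the same order here, so the ranking does not depend on a single outlying pose.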

2. Maestro Glide

We also ran Glide ligand docking in Maestro to obtain additional parameters for feature selection, from which we picked the ones best suited to the model.

The procedure is as follows:

  1. Start with the Protein Preparation Wizard.
  2. Load the siRNA PDB file.
  3. Select the cap termini option.
  4. Load the prepared siRNA structure.
  5. Open the "Receptor Grid Generation" panel.
  6. Deselect "Pick to identify ligand".
  7. Go to the Site tab.
  8. Set the center to the centroid of selected residues.
  9. Select all residues for centering.
  10. Click Add and then OK.
  11. The selected residues define the active site.
  12. The grid box appears, centered on these residues.
  13. Submit the job to generate the receptor grid file.
  14. Load the chitosan PDB file.
  15. Prepare it with the LigPrep module.
  16. Select Epik to generate possible ionization/protonation states.
  17. Go to the Ligand Docking panel. Input the ligands from the LigPrep output and the receptor grid from the grid generation step.
  18. Submit the docking job.
 Fig 3. Docking of siRNAs with chitosan using Maestro Glide

This procedure gave the following docking scores in our simulation:

siRNA1: Docking = -8.234 (Rank 1 pose)
siRNA2: Docking = -7.933 (Rank 1 pose)
siRNA3: Docking = -7.517 (Rank 1 pose)
siRNA4: Docking = -6.189 (Rank 1 pose)

siRNA candidate 1 had the lowest (most negative) score, indicating the strongest predicted interaction with chitosan. The Glide ranking of the four candidates matches the HDOCK ranking, which supports our choice of siRNA candidates for the wet lab experiments.
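As a quick consistency check, the best-pose scores from the two docking tools can be ranked side by side. The sketch below uses the rank 1 scores reported on this page; the `rank_by_score` helper is a hypothetical name introduced for illustration.

```python
# Best-pose scores per candidate, copied from the HDOCK tables and the Glide results.
hdock_best = {"siRNA1": -327.07, "siRNA2": -269.15, "siRNA3": -245.17, "siRNA4": -184.19}
glide_best = {"siRNA1": -8.234, "siRNA2": -7.933, "siRNA3": -7.517, "siRNA4": -6.189}


def rank_by_score(scores):
    """Order candidates from most negative (strongest predicted binding) to least."""
    return sorted(scores, key=scores.get)


hdock_order = rank_by_score(hdock_best)
glide_order = rank_by_score(glide_best)
rankings_agree = hdock_order == glide_order

print("HDOCK order:", hdock_order)
print("Glide order:", glide_order)
print("Rankings agree:", rankings_agree)
```

The two tools use different scoring scales, so only the order of the candidates is compared, not the magnitudes.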

Our Stability Model (S.E.N.S.E)

We developed a software tool, S.E.N.S.E, that predicts a docking score for a given siRNA-nanoparticle pair as a proxy for the stability of the complex. The user inputs only the siRNA sequence and the chosen nanoparticle, and the trained model returns the score without any actual docking run.

The results of the approach and analysis are provided below:

Fig 4. Output of the Model when inputted with random siRNA sequences not used in the training data
Fig 5. Docking score distribution of training data

Model Selection

The model comparison indicated that XGBoost outperformed the other models, achieving an RMSE of 33.30 and an R² of 0.7459, which means it explains approximately 75% of the variance in the docking scores. Random Forest performed almost identically (RMSE 33.38, R² 0.7448), while Lasso regression ranked third with RMSE 33.78 and R² 0.7386. Because both tree-based models performed slightly better than the linear approach, we inferred that the relationship between siRNA sequences and their binding stability is not purely linear, although the small gap between the models means this evidence is modest. There are likely interactions between nucleotide positions and nanoparticle properties that the non-linear models capture better.
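For readers unfamiliar with the two metrics, here is a minimal sketch of how RMSE and R² are computed from true and predicted docking scores. The numbers in `y_true` and `y_pred` are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from our dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in score units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out docking scores and model predictions (illustration only).
y_true = [-300.0, -250.0, -200.0, -150.0]
y_pred = [-290.0, -260.0, -190.0, -160.0]

print(f"RMSE = {rmse(y_true, y_pred):.2f}")
print(f"R^2  = {r_squared(y_true, y_pred):.4f}")
```

An R² of 0.7459 therefore means the residual sum of squares is about a quarter of the total variance in the docking scores, and an RMSE of 33.30 means a typical prediction is off by roughly 33 score units.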

Feature Engineering

Through feature engineering, we generated 217 features that integrated sequence characteristics such as AU and GC content, sequence entropy, and dinucleotide frequencies, together with nanoparticle type and positional nucleotide encoding. This hybrid strategy proved effective as it captured both the chemical attributes of the sequences and the positional details that influence binding.

  1. One-hot encoded sequences (215 features):
    An siRNA sequence can be up to 43 nucleotides long, including the guide and passenger strands. For each of the 43 positions, we used five binary variables (0 or 1) representing whether that position contains A, U, G, C, or is empty (for shorter sequences). Exactly one of the five is set to 1; the rest are 0. This gives 43 positions × 5 options = 215 features, which tell the model exactly which nucleotide appears at each location in the sequence.
  2. Sequence composition features (19 features):
    GC content: 1 feature measuring the percentage of G and C nucleotides.
    AU content: 1 feature measuring the percentage of A and U nucleotides.
    Dinucleotide counts: 16 features counting how often each pair of adjacent nucleotides appears (AA, AU, AG, AC, UA, UU, UG, UC, GA, GU, GG, GC, CA, CU, CG, CC).
    Sequence entropy: 1 feature summarizing how varied the base frequencies are in the input siRNA sequence.
  3. Nanoparticle type (2 features):
    We assigned two binary variables (0 or 1), one for chitosan-siRNA complexes and one for lipid-siRNA complexes, to distinguish between the two nanoparticles when binding with the siRNA sequences.

Total: 215 + 1 + 1 + 16 + 1 + 2 = 236 features as listed. The figure of 217 features used by the model suggests that some were removed during preprocessing, possibly redundant padding positions. Together, these features give the model both precise position-specific details and broader sequence characteristics that influence binding stability.
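The feature groups listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's actual preprocessing code: the function names, the "-" padding symbol, the use of Shannon entropy over base frequencies for the entropy feature, and the feature ordering are all our choices for the example, and no preprocessing step that reduces the count to 217 is included.

```python
import math
from itertools import product

BASES = "AUGC"
MAX_LEN = 43                      # maximum siRNA length stated in the text
SYMBOLS = BASES + "-"             # "-" marks an empty (padded) position; assumed symbol
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AU, ..., CC
NANOPARTICLES = ["chitosan", "lipid"]


def one_hot(seq):
    """43 positions x 5 symbols = 215 binary features."""
    padded = seq.ljust(MAX_LEN, "-")
    return [1 if base == symbol else 0 for base in padded for symbol in SYMBOLS]


def composition(seq):
    """GC %, AU %, 16 dinucleotide counts, and Shannon entropy (assumed definition)."""
    n = len(seq)
    gc = 100.0 * (seq.count("G") + seq.count("C")) / n
    au = 100.0 * (seq.count("A") + seq.count("U")) / n
    pairs = [seq[i:i + 2] for i in range(n - 1)]          # overlapping adjacent pairs
    dinucleotide_counts = [pairs.count(d) for d in DINUCLEOTIDES]
    freqs = [seq.count(b) / n for b in BASES]
    entropy = -sum(f * math.log2(f) for f in freqs if f > 0)
    return [gc, au] + dinucleotide_counts + [entropy]


def featurize(seq, nanoparticle):
    """Concatenate all feature groups for one siRNA-nanoparticle pair."""
    seq = seq.upper().replace("T", "U")
    if not 0 < len(seq) <= MAX_LEN or set(seq) - set(BASES):
        raise ValueError("sequence must be 1-43 nucleotides of A, U, G, C")
    if nanoparticle not in NANOPARTICLES:
        raise ValueError(f"nanoparticle must be one of {NANOPARTICLES}")
    nanoparticle_flags = [1 if nanoparticle == name else 0 for name in NANOPARTICLES]
    return one_hot(seq) + composition(seq) + nanoparticle_flags


# Hypothetical 21-nucleotide example sequence, not one of our candidates.
features = featurize("AUGCAUGCAUGCAUGCAUGCA", "chitosan")
print(len(features))
```

For any valid input, the vector has 215 + 19 + 2 = 236 entries, matching the total of the groups as listed.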

The most striking observation was that the nanoparticle type had the most significant influence; specifically, using chitosan increased the docking score by about 68.4 points compared to lipid.

This makes sense given that chitosan and lipid nanoparticles have fundamentally different surface chemistries and charge properties. AU content came in second with a coefficient of 37.1, followed by GC content at 31.0, and CG dinucleotide frequency at 1.5.

These findings are consistent with what is known about RNA structure. AU-rich regions in RNA are more flexible and can adopt different conformations, which affects how they interact with the nanoparticle surface. GC content influences the overall stability and rigidity of the RNA structure, since GC base pairs are stronger than AU pairs. The CG dinucleotide frequency is interesting because CG steps in nucleic acids have distinctive structural properties that could influence binding interfaces. The substantial impact of the nanoparticle type highlights that different delivery vehicles cannot be used interchangeably; the carrier's chemistry is just as critical as the sequence itself.