Model | MIT-MAHE - iGEM 2025

Docking

1. HDOCK

Docking lets us estimate the binding affinity between a ligand and a receptor. Since chitosan is our nanoparticle carrier, it is essential to determine which siRNA candidates bind to chitosan spontaneously, without requiring a large energy input.

The siRNA candidates designed to silence P. capsici were docked with the chitosan monomer obtained from PubChem, using HDOCK, to check compatibility and interaction affinity.
In HDOCK, the chitosan monomer is supplied as the ligand and the designed siRNA as the receptor.

Fig 1. Docking of siRNA Candidate 1 with chitosan

Fig 2. Docking of siRNA Candidate 2 with chitosan

Table 1. Docking Scores using HDOCK for siRNA Candidate 1

Rank	Docking Score	Confidence Score	Ligand rmsd (Å)
1	-327.07	0.9718	17.78
2	-322.18	0.9690	20.11
3	-301.02	0.9535	17.31
4	-296.22	0.9490	20.57
5	-292.75	0.9456	19.95
6	-281.01	0.9322	20.21
7	-280.65	0.9317	21.87
8	-269.61	0.9162	21.16
9	-253.23	0.8874	20.08
10	-250.97	0.8828	19.88

Table 2. Docking Scores using HDOCK for siRNA Candidate 2

Rank	Docking Score	Confidence Score	Ligand rmsd (Å)
1	-269.15	0.9155	16.93
2	-266.94	0.912	18.05
3	-254.32	0.8896	21.71
4	-253.95	0.8888	19.83
5	-249.05	0.8788	21.02
6	-239.54	0.857	20.03
7	-231.79	0.837	20.77
8	-228.62	0.8281	19.91
9	-222.11	0.8088	25.25
10	-221.94	0.8083	24.58

Table 3. Docking Scores using HDOCK for siRNA Candidate 3

Rank	Docking Score	Confidence Score	Ligand rmsd (Å)
1	-245.17	0.8703	18.63
2	-230.71	0.834	22
3	-222.74	0.8107	20.6
4	-217.21	0.7932	20.34
5	-213.11	0.7794	20.8
6	-211.74	0.7747	63.09
7	-205.98	0.7539	55.58
8	-201.5	0.7369	18.41
9	-201.32	0.7362	19.77
10	-190.58	0.6925	58.99

Table 4. Docking Scores using HDOCK for siRNA Candidate 4

Rank	Docking Score	Confidence Score	Ligand rmsd (Å)
1	-184.19	0.6646	16.67
2	-183.8	0.6628	45.43
3	-183.78	0.6628	47.96
4	-182.15	0.6554	37.07
5	-181.72	0.6535	45.49
6	-180.18	0.6465	22.94
7	-177.94	0.6362	20.23
8	-177.69	0.635	46.94
9	-177.35	0.6334	30.05
10	-176.57	0.6298	43.13

From the docking scores above, siRNA Candidate 1 and siRNA Candidate 2 show the most negative scores (top-ranked poses of -327.07 and -269.15, respectively), implying that these two siRNAs bind to chitosan more favourably than the others. This supports validating these siRNAs in the wet lab to check their silencing efficacy against P. capsici.
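The confidence scores in Tables 1 to 4 can be reproduced from the docking scores with the empirical logistic relation given in the HDOCK server documentation, confidence = 1 / (1 + e^(0.02 × (docking score + 150))). The sketch below is a minimal consistency check rather than part of our pipeline; the names `confidence_score` and `top_scores` are illustrative, and the values are the rank 1 docking scores copied from the tables above.

```python
import math

def confidence_score(docking_score: float) -> float:
    """Empirical HDOCK confidence score for a given docking score."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))

# Rank 1 docking scores copied from Tables 1 to 4.
top_scores = {
    "siRNA Candidate 1": -327.07,
    "siRNA Candidate 2": -269.15,
    "siRNA Candidate 3": -245.17,
    "siRNA Candidate 4": -184.19,
}

for name, score in top_scores.items():
    print(f"{name}: docking score {score:.2f}, "
          f"confidence {confidence_score(score):.4f}")
```

The HDOCK help page suggests reading a confidence score above 0.7 as "very likely to bind" and one between 0.5 and 0.7 as "possible to bind". By that rough guide, the top poses of Candidates 1 to 3 fall in the first band and that of Candidate 4 in the second.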

2. Maestro Glide

We also ran Glide Ligand Docking in Maestro to obtain additional docking parameters for feature selection, from which we picked those best suited to the model.

The procedure is as follows:

Start with the Protein Preparation Wizard.
Load the siRNA PDB file.
Select the cap termini option.
Load the prepared siRNA structure.
Open the "Receptor Grid Generation" panel.
Deselect "Pick to identify ligand".
Go to the site tab.
Set the center to the centroid of selected residues.
Select all residues for centering.
Click Add and then OK.
The selected residues will define the Active Site.
The grid box appears and gets centered around these residues.
Submit and generate the receptor grid file.
Load the chitosan PDB file.
Use LigPrep module.
Select Epik to generate possible ionization/protonation states.
Go to the Ligand Docking panel. Input the ligands from the LigPrep output and the receptor grid from the grid generation step.
Submit Docking.

Fig 3. Docking of siRNAs with chitosan using Maestro Glide

The procedure above gave the following docking scores in our simulation:

siRNA1: Docking = -8.234 (Rank 1 pose)
siRNA2: Docking = -7.933 (Rank 1 pose)
siRNA3: Docking = -7.517 (Rank 1 pose)
siRNA4: Docking = -6.189 (Rank 1 pose)

siRNA Candidate 1 had the lowest (most negative) score, indicating the strongest predicted interaction with chitosan and supporting our choice of siRNA candidates for the wet lab experiments.
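HDOCK and Glide use different scoring functions on different scales, so their raw values cannot be compared directly, but the rank order of the candidates can. The sketch below checks whether the two tools order the four candidates identically. The dictionary and function names are illustrative, and the values are the rank 1 scores reported above.

```python
# Rank 1 scores copied from the HDOCK tables and the Glide results above.
hdock = {"siRNA1": -327.07, "siRNA2": -269.15, "siRNA3": -245.17, "siRNA4": -184.19}
glide = {"siRNA1": -8.234, "siRNA2": -7.933, "siRNA3": -7.517, "siRNA4": -6.189}

def ranking(scores):
    """Return candidate names ordered from most to least negative score."""
    return sorted(scores, key=scores.get)

hdock_order = ranking(hdock)
glide_order = ranking(glide)
same_order = hdock_order == glide_order

print("HDOCK order:", hdock_order)
print("Glide order:", glide_order)
print("Same ranking:", same_order)
```

With these values, both tools place the candidates in the order siRNA1, siRNA2, siRNA3, siRNA4.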

Our Stability Model (S.E.N.S.E)

We have developed a software tool that predicts docking scores, which we use as a proxy for the stability of the siRNA-nanoparticle complex, directly from the siRNA sequence and the chosen nanoparticle, without the need to perform an actual docking run.

The results of the approach and analysis are provided below:

Fig 4. Output of the model for random siRNA sequences not used in the training data

Fig 5. Docking score distribution of training data

Model Selection

The model comparison indicated that XGBoost outperformed the other models, achieving an RMSE of 33.30 and an R² of 0.7459, that is, explaining approximately 75% of the variance in the docking scores. Random Forest performed almost identically (RMSE 33.38, R² 0.7448), while Lasso regression ranked third with RMSE 33.78 and R² 0.7386. Since both tree-based models performed slightly better than the linear approach, we inferred that the relationship between siRNA sequence and binding stability is not purely linear. There are likely interactions between nucleotide positions and nanoparticle properties that the non-linear models capture better.
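For reference, the two metrics used in this comparison can be computed as in the minimal sketch below. The function names and the toy values are illustrative assumptions; they are not our training data or our evaluation code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted scores."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy docking scores for illustration only.
y_true = [-300.0, -250.0, -200.0, -150.0]
y_pred = [-290.0, -260.0, -190.0, -160.0]

print(f"RMSE = {rmse(y_true, y_pred):.2f}")
print(f"R^2  = {r_squared(y_true, y_pred):.4f}")
```

RMSE is expressed in the same units as the docking score, so an RMSE of about 33 means a typical prediction error of about 33 score units, while R² is unitless and compares the model against always predicting the mean score.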

Feature Engineering

Through feature engineering, we generated 217 features that integrated sequence characteristics such as AU and GC content, sequence entropy, and dinucleotide frequencies, together with nanoparticle type and positional nucleotide encoding. This hybrid strategy proved effective as it captured both the chemical attributes of the sequences and the positional details that influence binding.

One-hot encoded sequences (215 features):
An siRNA sequence can be up to 43 nucleotides long, including the guide and passenger strands. For each of the 43 positions, we used five binary variables (0 or 1) indicating whether that position contains A, U, G, C, or is empty (padding for shorter sequences). Exactly one of the five is set to 1; the rest are 0. So 43 positions × 5 options = 215 features. This tells the model exactly which nucleotide appears at each position in the sequence.
Sequence composition features (19 features):
GC content: 1 feature measuring the percentage of G and C nucleotides.
AU content: 1 feature measuring the percentage of A and U nucleotides.
Dinucleotide counts: 16 features counting how often each pair of adjacent nucleotides appears (AA, AU, AG, AC, UA, UU, UG, UC, GA, GU, GG, GC, CA, CU, CG, CC).
Sequence entropy: 1 feature summarizing how varied the base composition of the input siRNA sequence is.
Nanoparticle type (2 features):
Two binary variables (0 or 1) flag chitosan-siRNA complexes and lipid-siRNA complexes, so that the model can distinguish between the two nanoparticles when they bind the siRNA sequences.

Total: 215 + 1 + 1 + 16 + 1 + 2 = 236 features. The final count of 217 suggests that some features were removed during preprocessing, possibly redundant padding positions. These features provide the model with precise position-specific details and broader sequence characteristics that influence binding stability.
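The feature layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our production code: the function and constant names, the "-" padding symbol, the feature order, the use of Shannon entropy over base frequencies, and the example sequence are all hypothetical choices made for the sketch.

```python
import math
from itertools import product

MAX_LEN = 43                     # maximum siRNA length stated above
ALPHABET = "AUGC"
SYMBOLS = ALPHABET + "-"         # "-" marks an empty (padding) position
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]  # 16 pairs
NANOPARTICLES = ["chitosan", "lipid"]

def one_hot(seq):
    """43 positions x 5 symbols = 215 binary features."""
    padded = seq.ljust(MAX_LEN, "-")
    return [1 if padded[i] == s else 0
            for i in range(MAX_LEN) for s in SYMBOLS]

def composition(seq):
    """GC %, AU %, 16 dinucleotide counts, and entropy (19 features)."""
    n = len(seq)
    gc = 100.0 * sum(seq.count(b) for b in "GC") / n
    au = 100.0 * sum(seq.count(b) for b in "AU") / n
    pairs = [seq[i:i + 2] for i in range(n - 1)]   # overlapping adjacent pairs
    dinuc = [pairs.count(d) for d in DINUCLEOTIDES]
    # Assumption: entropy is the Shannon entropy of the base frequencies.
    freqs = [seq.count(b) / n for b in ALPHABET if seq.count(b)]
    entropy = -sum(f * math.log2(f) for f in freqs)
    return [gc, au] + dinuc + [entropy]

def featurize(seq, nanoparticle):
    """Build the full vector: 215 + 19 + 2 = 236 features."""
    seq = seq.upper().replace("T", "U")
    if not 0 < len(seq) <= MAX_LEN or set(seq) - set(ALPHABET):
        raise ValueError("expected 1 to 43 nucleotides from A, U, G, C")
    if nanoparticle not in NANOPARTICLES:
        raise ValueError("nanoparticle must be 'chitosan' or 'lipid'")
    carrier = [1 if nanoparticle == n else 0 for n in NANOPARTICLES]
    return one_hot(seq) + composition(seq) + carrier

# Hypothetical 21-nucleotide sequence, used only to show the vector length.
features = featurize("AUGGCUAGCUAGGCUAACGUU", "chitosan")
print(len(features))
```

Padding every sequence to 43 positions keeps the vector length fixed regardless of the input length, which is what allows a single model to score siRNAs of different sizes.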

The most striking observation was that the nanoparticle type had the most significant influence; specifically, using chitosan increased the docking score by about 68.4 points compared to lipid.

This makes sense given that chitosan and lipid nanoparticles have fundamentally different surface chemistries and charge properties. AU content came in second with a coefficient of 37.1, followed by GC content at 31.0, and CG dinucleotide frequency at 1.5.

These findings align well with the scientific literature. AU-rich regions in RNA are more flexible and can adopt different conformations, which affects how they interact with the nanoparticle surface. GC content influences the overall stability and rigidity of the RNA structure, since GC base pairs are stronger than AU pairs. The CG dinucleotide frequency is interesting because CG steps in nucleic acids have distinct structural properties that could influence binding interfaces. The substantial impact of the nanoparticle type highlights that different delivery vehicles cannot be used interchangeably; the carrier’s chemistry is just as critical as the sequence itself.