Engineering Success | MIT-MAHE

LEARN

TEST

DESIGN

BUILD

The Design-Build-Test-Learn (DBTL) cycle was foundational to the development of our project. While developing each component, we fed the learnings from every experimental outcome back into the next round of procedures.

This process allowed us to continually improve the design and functionality of our system, while also reframing our approach to scientific problem solving.

For us, DBTL was more than an engineering framework: it shaped how we think and experiment as young researchers.

Project DBTL

Iteration 1

Project Iteration 1

Design

We surveyed literature on genes involved in the pathogenicity and survival of Phytophthora spp. and shortlisted targets with roles in virulence and signaling. Since several genes were initially annotated only in related Phytophthora species, their orthologous sequences were retrieved, and a BLAST search was conducted against the P. capsici genome to confirm the presence of homologs. However, preliminary testing revealed high conservation and extensive off-target risks, making many of these targets unsuitable. To refine our approach, we used the transcriptomic data of P. capsici to validate gene expression during infection, which led us to bZIP, a transcription factor critical for pathogenicity.

Build

The complete mRNA transcript was retrieved from curated databases, and siRNA candidates were generated using siDirect and siRNAPred. These were then filtered using established design rules such as URA criteria, GC content distribution, and nucleotide preferences at key positions.

The shortlisted siRNA candidates underwent rigorous structural validation to ensure functional stability and specificity. Target mRNA secondary structures were predicted using RNAfold to confirm binding accessibility, while DuplexFold and MaxExpect evaluated duplex stability and removed unstable constructs. To further strengthen our process, off-target risks were assessed through BLAST searches against the NCBI RefSeq RNA, RefSeq Select RNA, and CORE nucleotide databases. Sequences showing significant similarity to humans, black pepper (P. nigrum), or other local crops were excluded. After two full Design–Build–Test–Learn iterations, four siRNA candidates were finalized as stable, target-specific, and URA-compliant.

Building on these computationally validated candidates, we designed our experiments with input from iHP and reviewed the literature for assays to gauge the role of bZIP in pathogenicity, the efficacy of silencing, and possible delivery strategies. Chitosan nanoparticles were synthesized and optimized in the lab for our application. The siRNA was then encapsulated in the nanoparticles to produce the implementable nanoformulation.

Test

Characterization of every nanoparticle and nanoformulation was done using the Particle Size Analyzer. Encapsulation was validated visually using gel retardation assays, and entrapment efficiencies were calculated. A cytotoxicity assay was performed using naked siRNA (ranging from 1.5 nM to 100 μM) on Piper nigrum leaves.
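Entrapment efficiency is commonly computed from the amount of siRNA added and the amount left unencapsulated (for example, in the supernatant). The sketch below assumes that definition; the exact formula and measurement method used in our runs are not stated here, and the numbers are hypothetical.

```python
def entrapment_efficiency(total_sirna: float, free_sirna: float) -> float:
    """Percentage of the added siRNA that was entrapped in the nanoparticles.

    Assumed definition: (total - free) / total * 100, with both amounts
    expressed in the same units.
    """
    if total_sirna <= 0:
        raise ValueError("total_sirna must be positive")
    if not 0 <= free_sirna <= total_sirna:
        raise ValueError("free_sirna must lie between 0 and total_sirna")
    return (total_sirna - free_sirna) / total_sirna * 100.0


# Hypothetical amounts, for illustration only.
ee = entrapment_efficiency(total_sirna=10.0, free_sirna=2.5)
```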

Fluorescence microscopy helped visualize the uptake and internalization of the siRNA, thereby helping us test the release mechanism of the nanoparticle. Detached Leaf Assays were performed both prophylactically and therapeutically, with various controls, to observe the direct effect of the nanoformulation, naked siRNA, and chitosan nanoparticles on the leaves.

Learn

The nanoformulation was optimized over several runs. However, complete encapsulation of the siRNA was not observed, which we attribute to variation in pH, fluctuations in flow rate, and stirring speed. The siRNA showed 0% cytotoxicity towards the plant at all concentrations tested. The detached leaf assays showed a reduction in lesion formation on application of the solution, which gave us a direct measure of its efficacy. Complementing this, zoospore motility was observed to decrease upon application of our solution. The siRNA design process was lengthy and iterative, prompting us to develop a pipeline integrating sequence retrieval, candidate generation, filtering based on the URA (Ui-Tei, Reynolds, and Amarzguioui) rules, and structural validation.

Iteration 2

Project Iteration 2

Design

We developed an siRNA design automation pipeline that mirrors the workflow of Design-Build-Test-Learn (DBTL) cycle 1 by integrating multiple siRNA design tools through a Selenium-based automation bot. The system accepts a target gene as its initial input, processes it through each tool in sequence, and passes each output to the next stage, enabling end-to-end automated design and evaluation of siRNA sequences.

Build

Since siRNA is often delivered via various nanoparticle carriers such as chitosan and lipids, we built a predictive model to select the most suitable siRNA candidates based on their interaction profiles with these nanoparticles. Using docking simulation data, we trained a machine learning model to predict the interaction of siRNA molecules with their nanoparticle carriers. This addition helps tailor siRNA design not only for gene silencing efficiency but also for optimal compatibility and stability with delivery vehicles used in therapeutics and other applications.

Test

These optimized siRNA–nanoparticle pairs will undergo validation through transcriptomic analysis and RT-qPCR to measure gene silencing efficacy and specificity. In parallel, we are exploring cell-based siRNA synthesis to enable upscaling, while discussions have also begun for controlled field trials to test the formulated solution under real-world conditions.

Learn

The results of transcriptomic profiling and RT-qPCR will provide feedback on silencing efficiency and off-target effects, which will feed back into the pipeline to refine future designs. RT-qPCR will also quantify the reduction in bZIP1 transcript levels, giving a direct measure of the specificity and efficacy of the RNAi system. Cell-based siRNA synthesis will scale up production and allow us to bring our solution to more farmers. The controlled field trials will let us compare performance across environments and identify further biosafety considerations, demonstrating the robustness of the solution in the target environment and gauging its real-world efficacy.

Dry Lab

siRNA Design

Iteration 1

siRNA Design Iteration 1

Design

Through a literature review, we identified multiple papers discussing genes responsible for pathogenicity and survival of Phytophthora spp., and we shortlisted a set of genes with significant roles; silencing these targets could interfere with critical signaling pathways essential for infection.

Build

We began by examining genes across the Phytophthora genus to identify those implicated in virulence and signaling. If a candidate gene was not directly annotated in P. capsici, the corresponding orthologous sequence from other Phytophthora species was retrieved and subjected to BLAST analysis against the P. capsici genome. This approach confirmed the presence of homologous genes in P. capsici and ensured that the sequence data used for further analysis were accurate, complete, and directly relevant to the pathogen.

Test

To evaluate whether the genes were suitable targets, the full-length gene sequence was subjected to BLAST searches against multiple reference databases. This was done to check the level of sequence conservation and detect potential similarities with homologous genes in non-target organisms. This step was crucial because excessive conservation across species (crops or beneficial plants) increases the risk of unintended off-target silencing.

Learn

The BLAST results showed too many potential off-target hits. Our advisor suggested an additional screening step: using P. capsici transcriptomic data to validate the gene targets that had been selected from other Phytophthora spp.

Table 1. Summary of potential gene targets

Gene	Pros	Cons	Results and learning
bZIP1 (Basic Leucine Zipper TF) (Blanco & Judelson, 2005)	Silencing Pibzp1 in P. infestans abolished infection. Pc ortholog sequence available.	Family conserved; the specific virulence role was not clear.	Advanced. Strong virulence evidence; high-value target.
PcCHS (Chitin Synthase) (Cheng et al., 2019)	PcCHS knockout reduced growth, spore release, and virulence. Pc sequence available.	Conserved enzyme family; risk of fungal off-targets.	Advanced. Robust Pc evidence; requires careful design.
GPA1 (G-protein α) (Latijnhouwers et al., 2003)	Critical for motility/chemotaxis in Ps/Pi; single copy, essential.	Highly conserved across eukaryotes; broad off-targets; Pc sequence context is uncertain.	Eliminated. Conservation makes it unsuitable.
PcRXLR effectors (family) (Cheng et al., 2022)	Key virulence effectors; RNAi against RXLR1/4 reduces infection; Pc transcripts are available.	A very large multigene family, which leads to a redundancy problem.	Eliminated. Redundancy lowers the impact of targeting single members.
PcNLPs (e.g., PcNLP6) (Park et al., 2023)	Encode necrosis-inducing proteins; dsRNA reduces infection and transcript levels.	A multigene family with overlapping roles, which leads to functional redundancy that may mask the silencing of one gene.	Eliminated. Redundancy limits effectiveness unless multiple family members are targeted simultaneously.
PiCAT2 (Catalase) (Wang et al., 2020)	In Pi, it affects growth, reproduction, stress tolerance, and virulence.	Very high off-target hits in diverse species; Pc link unclear.	Eliminated. Extensive off-targets; weak Pc evidence.
PiCDC14 (Cdc14 Phosphatase) (Fong & Judelson, 2003)	In Pi, required for sporulation and development.	Stage-specific function; weak infection link in Pc.	Eliminated. Not directly tied to pathogenicity.
PsMPK7 (stress-associated MAPK) (Gao et al., 2014)	In Ps, required for stress tolerance, germination, and infection.	MAPKs are highly conserved; no Pc data.	Eliminated. Conservation and species gap.
PsMPK1 (SLT2-type MAPK) (Li et al., 2014)	In Ps, silencing abolished pathogenicity.	Strong conservation with plants/fungi; off-target risk.	Eliminated. Unsuitable for specific silencing.
PcCutinases (e.g., PcCut, PiCut3) (Muñoz & Bailey, 1998)	Early penetration factors; dsRNA against cutinases impaired growth and pathogenicity.	A multigene family with overlapping roles causes a redundancy problem.	Eliminated. Silencing a single gene was unlikely to yield a meaningful effect.

Blanco, F. A., & Judelson, H. S. (2005). A bZIP transcription factor from Phytophthora interacts with a protein kinase and is required for zoospore motility and plant infection. Molecular Microbiology, 56(3), 638–648. https://doi.org/10.1111/j.1365-2958.2005.04575.x

Cheng, W., Lin, M., Qiu, M., Kong, L., Xu, Y., Li, Y., Wang, Y., Ye, W., Dong, S., He, S., & Wang, Y. (2019). Chitin synthase is involved in vegetative growth, asexual reproduction, and pathogenesis of Phytophthora capsici and Phytophthora sojae. Environmental Microbiology, 21(12), 4537–4547. https://doi.org/10.1111/1462-2920.14744

Cheng, W., Lin, M., Chu, M., Xiang, G., Guo, J., Jiang, Y., Guan, D., & He, S. (2022). RNAi-Based Gene Silencing of RXLR Effectors Protects Plants Against the Oomycete Pathogen Phytophthora capsici. Molecular Plant-Microbe Interactions, 35(6), 440–449. https://doi.org/10.1094/mpmi-12-21-0295-r

Fong, A. M. V. A., & Judelson, H. S. (2003). Cell cycle regulator Cdc14 is expressed during sporulation but not hyphal growth in the fungus‐like oomycete Phytophthora infestans. Molecular Microbiology, 50(2), 487–494. https://doi.org/10.1046/j.1365-2958.2003.03735.x

Gao, J., Cao, M., Ye, W., Li, H., Kong, L., Zheng, X., & Wang, Y. (2014). PsMPK7, a stress‐associated mitogen‐activated protein kinase (MAPK) in Phytophthora sojae, is required for stress tolerance, reactive oxygenated species detoxification, cyst germination, sexual reproduction and infection of soybean. Molecular Plant Pathology, 16(1), 61–70. https://doi.org/10.1111/mpp.12163

Latijnhouwers, M., Ligterink, W., Vleeshouwers, V. G., Van West, P., & Govers, F. (2003). A Gα subunit controls zoospore motility and virulence in the potato late blight pathogen Phytophthora infestans. Molecular Microbiology, 51(4), 925–936. https://doi.org/10.1046/j.1365-2958.2003.03893.x

Li, A., Zhang, M., Wang, Y., Li, D., Liu, X., Tao, K., Ye, W., & Wang, Y. (2014). PsMPK1, an SLT2-type mitogen-activated protein kinase, is required for hyphal growth, zoosporogenesis, cell wall integrity, and pathogenicity in Phytophthora sojae. Fungal Genetics and Biology, 65, 14–24. https://doi.org/10.1016/j.fgb.2014.01.003

Muñoz, C. I., & Bailey, A. M. (1998). A cutinase-encoding gene from Phytophthora capsici isolated by differential-display RT-PCR. Current Genetics, 33(3), 225–230. https://doi.org/10.1007/s002940050330

Park, M., Kweon, Y., Lee, D., & Shin, C. (2023). Suppression of Phytophthora capsici using double-stranded RNAs targeting NLP effector genes in Nicotiana benthamiana. Applied Biological Chemistry, 66(1). https://doi.org/10.1186/s13765-023-00768-4

Wang, T., Wang, X., Zhu, X., He, Q., & Guo, L. (2020). A proper PiCAT2 level is critical for sporulation, sporangium function, and pathogenicity of Phytophthora infestans. Molecular Plant Pathology, 21(4), 460–474. https://doi.org/10.1111/mpp.12907

Iteration 2

siRNA Design Iteration 2

Design

The learnings accumulated thus far allowed us to identify bZIP, a transcription factor implicated in the virulence of P. capsici, for siRNA targeting. The complete mRNA sequence was retrieved from curated databases.

Build

Multiple siRNA prediction tools, including siDirect and siRNAPred, were used to generate an initial pool of candidate sequences. Design rules such as the URA criteria (low seed-target duplex stability, A/U enrichment in positions 15–19, and absence of internal repeats), GC content distribution, and nucleotide preferences at specific positions were applied to refine the pool.
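A rule-based filter of this kind can be sketched in a few lines. The thresholds below (GC content of 30-52%, at least three A/U bases in sense-strand positions 15-19, no run of more than three identical bases as a stand-in for the internal-repeat check) are illustrative assumptions, not the exact settings applied by siDirect, siRNAPred, or our manual screening.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def passes_basic_rules(sense: str, gc_range=(0.30, 0.52),
                       min_au_15_19=3, max_run=3) -> bool:
    """Apply simplified composition rules to a sense-strand sequence.

    All thresholds are assumed values chosen for illustration.
    """
    s = sense.upper().replace("T", "U")
    if len(s) < 19:
        return False
    if not gc_range[0] <= gc_content(s) <= gc_range[1]:
        return False
    # A/U enrichment in positions 15-19 (1-based) of the sense strand.
    if sum(base in "AU" for base in s[14:19]) < min_au_15_19:
        return False
    # Reject long runs of a single nucleotide.
    run = 1
    for prev, cur in zip(s, s[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    return True


# Hypothetical candidates, not sequences from our bZIP screen.
candidates = ["GCAUGCUAGCUAGCAUAUA", "GGGGCCCCGGGGCCCCGGG", "GCAUGCUAGCUAGCAAAAA"]
kept = [c for c in candidates if passes_basic_rules(c)]
```

Only the first hypothetical candidate survives: the second fails the GC range and the third contains a run of five identical bases.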

Test

The shortlisted siRNA candidates were subjected to structural validation. Secondary structure predictions of the target mRNA were performed using RNAfold, ensuring that the candidate binding sites were located in accessible regions. Duplex stability of the siRNA molecules was assessed using DuplexFold and MaxExpect, which enabled the elimination of unstable constructs.

Potential off-target effects were evaluated through BLAST searches against the NCBI RefSeq RNA, RefSeq Select RNA, and CORE nucleotide databases. Any siRNA showing significant similarity to sequences in humans, black pepper (P. nigrum), or crops grown in close proximity was excluded.

Learn

The results were analyzed to identify recurrent features of the most promising siRNAs, such as AU-rich seed regions and stable duplex formation. Insights from this cycle were integrated into the next design iteration, resulting in refined and more reliable siRNA candidates.

This iterative process ensured that the siRNA design was robust, reproducible, and systematically validated.

Software

siUltimate

Iteration 1

siUltimate Iteration 1

Design

The objective of building siUltimate was to develop an end-to-end pipeline for siRNA design by integrating the existing siRNA design software, significantly reducing the time required for it.

Build

Software Used

The initial designs of siUltimate involved automating the use of three websites: siDirect, siRNApred, and DuplexFold. siDirect and siRNApred are used to design siRNAs, while DuplexFold predicts the secondary structure of a given siRNA. The TP53 tumor suppressor gene was chosen as an initial test sequence, as it returned good results on all three of these tools when used with default settings.

Browser Automation

Selenium, a browser automation framework with Python bindings, was used to automate interaction with these websites. Selenium simulates human-like interaction with a web page, and page elements can be located using CSS selectors. It was used to input mRNA sequences, change settings, and retrieve results from the websites.

Data Formatting

The resulting data from each website was converted into a pandas DataFrame, which made it much easier to work with in code. Formatting the siDirect output required splitting the guide strand and passenger strand into separate columns.

Fig 2. First 8 formatted siDirect results for iteration 1

siRNApred had a longer formatting process, requiring formatting of the position and sequence columns. The extra hyphens (-) and the text 3’ and 5’ had to be removed from all of the values in the sequence column, and the data in the position column also required removal of extra hyphens.

Fig 3. Formatted siRNApred results for iteration 1
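The siRNApred clean-up described above amounts to simple string normalization. The sketch below assumes raw cells that look like `5'-GCAU...-3'` for sequences and a single hyphen-padded number for positions; the real page layout may differ, and the function names are hypothetical.

```python
import re


def clean_sequence(raw: str) -> str:
    """Strip 5'/3' end labels, hyphens, and whitespace from a sequence cell."""
    # Remove end labels written with straight or typographic prime marks.
    no_labels = re.sub(r"[35]\s*['’′]", "", raw)
    # Keep only nucleotide letters.
    return re.sub(r"[^ACGUTacgut]", "", no_labels).upper()


def clean_position(raw: str) -> int:
    """Parse a position cell padded with hyphens (assumed to hold one number)."""
    return int(raw.replace("-", "").strip())


# Hypothetical raw cells for illustration.
sequence = clean_sequence("5'-GCAUGCUAGCUAGCAUAUA-3'")
position = clean_position("--123-")
```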

Pooling of Results

The common siRNA sequences from the siDirect and siRNApred output data were retrieved. This new set of siRNAs was ranked according to the following parameters in a decreasing order of priority:

siRNAs following the Ui-Tei rules (these rules act as a filter for siRNAs, and the siRNAs that do not follow them are ranked very low).
siRNAs following the Reynolds and Amarzguioui rules (these rules act as a way of ranking siRNAs, and siRNAs that follow these, in addition to the Ui-Tei rules, are the best).
The siRNApred score (this was the lowest priority of ranking, and it served as a tiebreaker for siRNAs ranked equally by the previous two parameters).

The top 15 siRNA sequences were chosen to continue on to the next round of the pipeline.
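The pooling and ranking step can be expressed as a set intersection followed by a sort on a priority tuple. This is a minimal sketch: the input shapes (a mapping from sequence to rule flags, and a mapping from sequence to siRNApred score) and the way the Reynolds and Amarzguioui flags are combined into a count are assumptions for illustration.

```python
def pool_and_rank(sidirect_flags, sirnapred_scores, top_n=15):
    """Return sequences found by both tools, best first.

    sidirect_flags: {sequence: (ui_tei, reynolds, amarzguioui)} as booleans
    sirnapred_scores: {sequence: score}
    """
    common = sidirect_flags.keys() & sirnapred_scores.keys()

    def priority(seq):
        ui_tei, reynolds, amarzguioui = sidirect_flags[seq]
        # 1) Ui-Tei compliance, 2) number of additional rule sets satisfied,
        # 3) siRNApred score as the tiebreaker.
        return (ui_tei, int(reynolds) + int(amarzguioui), sirnapred_scores[seq])

    return sorted(common, key=priority, reverse=True)[:top_n]


# Hypothetical placeholder sequences and scores.
flags = {
    "AAA": (True, True, True),
    "CCC": (True, True, False),
    "GGG": (False, True, True),
    "UUU": (True, True, True),
    "XXX": (True, True, True),   # only reported by siDirect
}
scores = {"AAA": 0.7, "CCC": 0.9, "GGG": 0.99, "UUU": 0.8, "YYY": 1.0}
ranked = pool_and_rank(flags, scores)
```

Because the sort key is a tuple, the siRNApred score only matters between candidates that tie on the rule-based criteria.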

Fig 4. Pooled siRNA results for iteration 1

Filtering via DuplexFold

DuplexFold was used to predict the secondary structure of each siRNA duplex. It outputs a CT (connect table) file, which siUltimate then analyzes to check whether the predicted structure has a misregister, which would indicate an unstable secondary structure. siRNAs failing this check were eliminated from the data, and the remaining siRNAs formed the final output of the pipeline.
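One way to read a CT file and test for a misregister is sketched below. It assumes the standard CT layout (a header line starting with the base count, then one line per base whose fifth field is the pairing partner, 0 if unpaired) and treats a duplex as in register when every base pair lies on the same anti-diagonal, that is, when i + j is constant. The exact criterion used inside siUltimate is not specified here, so this is only one plausible interpretation; the toy file, including its three-base "I" linker between the strands, is made up for illustration.

```python
def parse_ct_pairs(ct_text: str) -> list:
    """Extract (i, j) base pairs with i < j from CT-format text."""
    lines = [ln for ln in ct_text.strip().splitlines() if ln.strip()]
    n_bases = int(lines[0].split()[0])
    pairs = []
    for line in lines[1:n_bases + 1]:
        fields = line.split()
        i, j = int(fields[0]), int(fields[4])
        if j > i:  # j == 0 means unpaired; j < i was already recorded
            pairs.append((i, j))
    return pairs


def is_in_register(pairs, min_pairs=1) -> bool:
    """True if all pairs share one i + j value (no slipped or bulged pairing)."""
    if len(pairs) < min_pairs:
        return False
    return len({i + j for i, j in pairs}) == 1


# Toy duplex: strand 1 (bases 1-4), assumed linker (5-7), strand 2 (8-11).
TOY_CT = """11  toy duplex
1 G 0 2 11 1
2 C 1 3 10 2
3 A 2 4 9 3
4 U 3 5 8 4
5 I 4 6 0 5
6 I 5 7 0 6
7 I 6 8 0 7
8 A 7 9 4 8
9 U 8 10 3 9
10 G 9 11 2 10
11 C 10 0 1 11
"""
toy_pairs = parse_ct_pairs(TOY_CT)
toy_ok = is_in_register(toy_pairs)
```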

Fig 5. Final list of siRNAs for TP53 gene outputted by siUltimate

Test

The pipeline was tested with the TP53 gene sequence, and it worked as expected. Following this, it was tested with the bZIP gene, which returned no results.

Learn

The bZIP gene returned no results as siDirect did not provide any siRNAs for this gene. This was because the default settings of siDirect have a very high siRNA screening threshold, which was not met by the siRNAs for the bZIP gene. Fallback settings needed to be added for these cases, progressively lowering the threshold if the number of siRNAs outputted by siDirect was too small.

Iteration 2

Adding Progressive Fallback Settings

Design

Two levels of fallback settings for siDirect were selected:

Unchecking the “Hide less-specific siRNAs” checkbox: This filter was not important for our use case, as we were not designing siRNAs for humans, and this option checks for siRNA specificity in humans.
Increasing the seed duplex stability max temperature to 30°C: This setting is important for reducing the number of off-target effects of the siRNA. siRNAs with a higher seed duplex stability temperature will have more off-target effects.

Build

The browser automation code was modified to use these fallback settings for siDirect. If the output did not have enough siRNAs, the next fallback setting would be used.
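The fallback logic can be sketched independently of the browser automation. Here `run_sidirect` is a hypothetical callable standing in for the Selenium routine, the setting names are invented for illustration, and the sketch assumes the two fallback levels are applied cumulatively.

```python
# Settings tried in order; an empty dict means "use siDirect defaults".
FALLBACK_LEVELS = [
    {},
    {"hide_less_specific": False},
    {"hide_less_specific": False, "seed_tm_max_c": 30.0},
]


def run_with_fallback(run_sidirect, sequence, min_results=1):
    """Try each settings level until enough siRNAs are returned.

    Returns (level_used, results); if no level yields enough results,
    the output of the last level is returned.
    """
    results = []
    for level, settings in enumerate(FALLBACK_LEVELS):
        results = run_sidirect(sequence, settings)
        if len(results) >= min_results:
            return level, results
    return len(FALLBACK_LEVELS) - 1, results


def fake_runner(sequence, settings):
    """Stand-in for the browser automation: only the loosest level succeeds."""
    if settings.get("seed_tm_max_c") == 30.0:
        return ["GCAUGCUAGCUAGCAUAUA"]
    return []


level_used, found = run_with_fallback(fake_runner, "AUGGCC")
```

Returning the level that was used makes it possible to report how far the thresholds had to be relaxed for a given gene.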

Test

Fig 6. First 8 formatted siDirect results for iteration 2

Fig 7. Formatted siRNApred results for iteration 2

Fig 9. Final list of siRNAs outputted for the bZIP gene by siUltimate

The software returned good results for the bZIP gene. These matched up with the siRNAs designed by the team earlier. The software, while functional, was written entirely in a Jupyter Notebook. This was a convenient interface for development, but did not provide a good user interface for a non-tech-savvy user.

Learn

It was important for siUltimate to have a graphical user interface so that using it would be easier for researchers.

Iteration 3

Building the User Interface

Design

The backend of the software was created using FastAPI, a Python library that allows for quick and easy creation of simple backends. The frontend used Jinja templates, as they allowed for easy display of dynamic content.

Build

Backend

The backend of the software was built with 6 endpoints:

/ (root) - Served the homepage, where the user could input the target mRNA sequence.
/submit - Received the form input, created a new job, started the pipeline, and returned the job ID.
/status - Served the status page. Returned the current status of a job when given its ID.
/output - Served the output page. Returned the output of a completed job when given its ID.
/output/json - Returned the output siRNA list in JSON format.
/output/csv - Returned the output siRNA list in CSV format for data export/download.
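The job lifecycle behind these endpoints can be illustrated without any web framework. The class below is a standard-library sketch of an in-memory job store, not the actual FastAPI code; the class name, status strings, and output columns are assumptions.

```python
import csv
import io
import json
import uuid


class JobStore:
    """In-memory registry mirroring the submit / status / output flow."""

    def __init__(self):
        self._jobs = {}

    def submit(self, sequence: str) -> str:
        """Create a job and return its ID (what /submit would hand back)."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "queued", "sequence": sequence, "output": None}
        return job_id

    def complete(self, job_id: str, sirnas: list) -> None:
        """Called by the pipeline when it finishes."""
        self._jobs[job_id].update(status="done", output=sirnas)

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]

    def _finished_output(self, job_id: str) -> list:
        job = self._jobs[job_id]
        if job["status"] != "done":
            raise RuntimeError("job has not finished yet")
        return job["output"]

    def output_json(self, job_id: str) -> str:
        return json.dumps(self._finished_output(job_id))

    def output_csv(self, job_id: str) -> str:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=["sequence", "score"])
        writer.writeheader()
        writer.writerows(self._finished_output(job_id))
        return buffer.getvalue()


store = JobStore()
job = store.submit("AUGGCC")
status_before = store.status(job)
store.complete(job, [{"sequence": "GCAUGCUAGCUAGCAUAUA", "score": 0.9}])
```

Keeping job state separate from the route handlers means the same store can back the HTML pages and the JSON/CSV exports.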

Frontend

The frontend of the software was built with 3 webpages:

Home: This was where the user could enter an input sequence to submit a job. On submission, it would display the job ID. In addition, buttons to check the status of the submitted job and to submit another job would replace the submit button.
Status: It would show the current status of a job.
Output: It would show the output list of siRNAs, sorted in order of score.

Test

Fig 10. The home page of the user interface

Fig 11. The home page after submitting a job

Fig 12. The status page showing the current status of the job

Fig 13. The status page on completion of the job

The software worked as expected, returning the same results for the bZIP gene as in the previous iteration, with an improved user experience.

Learn

The software worked well and offered a good user experience. However, it lacked dedicated API endpoints through which other software could integrate it directly into their workflows. It was also designed to be built and run locally, so it could not be deployed as an online website in its current state.

S.E.N.S.E.

Iteration 1

S.E.N.S.E. Iteration 1

Design

Our initial designs of the stability model were based on data collected through docking and molecular simulations that used "Docking with Attracting Cavities" in SwissDock.

The initial design involved implementing and comparing three machine learning algorithms (Lasso regression, Random Forest, and XGBoost) to predict SP-dG values from molecular docking.

Build

Hyperparameter tuning was employed to enhance the performance of Random Forest and XGBoost using cross-validation.

Test

The testing phase revealed that despite tuning efforts, the Lasso model was significantly better than both tree-based methods, achieving an exceptional cross-validated RMSE of 0.0025 (±0.0029) and R-squared of 0.9998 (±0.0006). The Random Forest and XGBoost models exhibited poor generalization with negative R-squared values, indicating that they performed worse than a simple mean predictor. The Lasso equation identified 'Nonpolar' (coefficient: 0.7453) as the dominant feature, followed by 'Inter' (0.1773) and 'Polar15' (0.0910).
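To see why a negative R-squared means "worse than a mean predictor", it helps to write the two metrics out. The sketch below uses the standard definitions (R-squared = 1 - SS_res / SS_tot) on made-up numbers, not our docking data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean
poor_model = [4.0, 3.0, 2.0, 1.0]      # systematically wrong

r2_baseline = r_squared(y_true, mean_baseline)
r2_poor = r_squared(y_true, poor_model)
```

The mean predictor scores exactly zero, so any model whose squared error exceeds the total variance of the target lands below zero.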

Learn

SHAP analysis of the initial Lasso model revealed an outlier at index 62 with an unusually high 'Nonpolar' value, which was potentially skewing the model's performance.

Fig 15. SHAP values to indicate the high parameter value for iteration 1

Iteration 2

S.E.N.S.E. Iteration 2

Design

The same dataset as in Iteration 1 was used for this second iteration.

The redesign focused on improving data quality by removing the outlier at index 62 identified in Iteration 1.

Build

We rebuilt the Lasso model on the cleaned dataset.

Test

Testing showed remarkable improvement with near-perfect performance, achieving an RMSE of 0.0013 (±0.0004) and R-squared of 1.0000 (±0.0000). The updated model equation showed more balanced feature contributions: SP-dG_scaled = (0.4640 × Nonpolar) + (0.4075 × Inter) + (0.1521 × Polar15), with 'Nonpolar' remaining the most influential feature but with reduced dominance compared to the first iteration.
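The fitted equation can be applied directly to new feature values. This sketch takes the three coefficients as reported and assumes that the inputs are scaled the same way as the training features and that there is no intercept term, since none is reported; the example inputs are hypothetical.

```python
# Coefficients of the refined Lasso model as reported above.
COEFFICIENTS = {"Nonpolar": 0.4640, "Inter": 0.4075, "Polar15": 0.1521}


def predict_sp_dg_scaled(features: dict) -> float:
    """Evaluate the linear model on already-scaled feature values."""
    return sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


# Hypothetical scaled inputs for illustration.
only_nonpolar = predict_sp_dg_scaled({"Nonpolar": 1.0, "Inter": 0.0, "Polar15": 0.0})
all_ones = predict_sp_dg_scaled({"Nonpolar": 1.0, "Inter": 1.0, "Polar15": 1.0})
```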

Learn

These results suggest better model stability and generalizability after outlier removal. However, the SwissDock data was found to be less than ideal.

Fig 16. SHAP values to indicate the high parameter value for iteration 2

Iteration 3

S.E.N.S.E. Iteration 3

Design

The data collected came from docking results generated using Glide in Maestro and HDOCK. For this iteration, our design aimed to predict docking values for an unknown sequence. To do this, we merged docking data with siRNA sequences and used Lasso regression to predict binding stability.

Build

We hit several roadblocks: identifier columns had mismatched data types, which prevented merges; the code had indentation errors; and the target variable kept disappearing during data transformations.

Test

Testing showed that the 'Ligand RMSD' column was being dropped during merge operations and when scaling features.

Learn

We learned to explicitly track column preservation, format identifiers consistently between datasets, and restructure the preprocessing to handle the target variable separately from the features. After hours of debugging and corrections, the final iteration successfully integrated everything: proper data merging, sequence encoding, and feature combination with the target kept intact, plus a functional Lasso model that can screen new sequences. The model trained on HDOCK data was found to be more accurate than the model trained on Glide data. Hence, we decided to use HDOCK for data collection and validation of the model.
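The two fixes, normalizing identifiers before joining and carrying the target separately from the features, can be shown with plain Python records. This is an illustrative sketch rather than our notebook code: the key names `id` and `sequence` are hypothetical, and 'Ligand RMSD' is assumed to be the target column.

```python
def normalize_id(value) -> str:
    """Map 7, 7.0, '7' and ' 7 ' to the same key; leave other labels as-is."""
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text
    return str(int(number)) if number.is_integer() else text


def merge_for_training(docking_rows, sequence_rows, target="Ligand RMSD"):
    """Join docking results to sequences; return (features, targets) separately."""
    seq_by_id = {normalize_id(r["id"]): r["sequence"] for r in sequence_rows}
    features, targets = [], []
    for row in docking_rows:
        key = normalize_id(row["id"])
        if key not in seq_by_id:
            continue  # no matching sequence for this docking result
        if target not in row:
            raise KeyError(f"target column {target!r} missing for id {key}")
        features.append({"id": key, "sequence": seq_by_id[key]})
        targets.append(float(row[target]))
    return features, targets


# Hypothetical rows with deliberately mismatched identifier types.
docking = [
    {"id": 1.0, "Ligand RMSD": 2.5},
    {"id": "2", "Ligand RMSD": 3.0},
    {"id": 9, "Ligand RMSD": 1.0},
]
sequences = [{"id": "1", "sequence": "AUGC"}, {"id": 2, "sequence": "GCAU"}]
X, y = merge_for_training(docking, sequences)
```

Because the targets never pass through feature scaling or encoding, they cannot be dropped by those steps.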

Iteration 4

S.E.N.S.E. Iteration 4

Design

The goal was to predict docking scores for siRNA-nanoparticle complexes using only sequence information and nanoparticle type. Two datasets (Lipid: 230 sequences, Chitosan: 220 sequences) were combined. The approach involved encoding sequences into 217 numerical features: 215 from positional one-hot encoding of 43-nucleotide padded sequences, 18 from composition metrics (GC/AU content, dinucleotide frequencies, entropy), and 2 for nanoparticle type.
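A sequence encoding of this kind might look like the sketch below. It assumes a five-symbol alphabet (A, C, G, U, and a padding symbol N), which gives 43 × 5 = 215 positional features. The composition block here contains GC content, AU content, Shannon entropy, and the 16 dinucleotide frequencies, and the nanoparticle labels are hypothetical, so the total feature count of this sketch is its own and is not meant to reproduce the figure reported above.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGUN"   # N is an assumed padding symbol
MAX_LEN = 43


def positional_one_hot(seq: str) -> list:
    """One-hot encode a sequence padded (or truncated) to MAX_LEN positions."""
    padded = seq.upper().replace("T", "U").ljust(MAX_LEN, "N")[:MAX_LEN]
    vector = []
    for base in padded:
        vector.extend(1.0 if base == symbol else 0.0 for symbol in ALPHABET)
    return vector


def composition_features(seq: str) -> list:
    """GC content, AU content, Shannon entropy, and 16 dinucleotide frequencies."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    counts = Counter(s)
    gc = (counts["G"] + counts["C"]) / n
    au = (counts["A"] + counts["U"]) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    dinucs = Counter(s[i:i + 2] for i in range(n - 1))
    di_freqs = [dinucs["".join(pair)] / max(n - 1, 1)
                for pair in product("ACGU", repeat=2)]
    return [gc, au, entropy] + di_freqs


def encode(seq: str, nanoparticle: str) -> list:
    """Concatenate positional, composition, and nanoparticle-type features."""
    np_flags = [1.0 if nanoparticle == kind else 0.0 for kind in ("lipid", "chitosan")]
    return positional_one_hot(seq) + composition_features(seq) + np_flags


# Hypothetical short sequence for illustration.
example = encode("AUGC", "chitosan")
```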

Build

The datasets were merged with standardized column names, and nanoparticle type labels were added. Feature engineering created the full 217-feature set, and data was split 80/20 for training and testing with standard scaling applied. All three models were trained: Lasso with cross-validated regularization, Random Forest with 200 trees, and XGBoost with gradient boosting.

Test

XGBoost achieved the best performance, followed by Random Forest with Lasso slightly behind. Lasso's feature selection reduced 217 features to 4 critical ones: nanoparticle type (coefficient +68.42), AU content (+37.09), GC content (+30.94), and CG dinucleotide frequency (+1.52). However, with new sequences, all models returned nearly identical values for different sequences. This indicated that they weren't capturing enough sequence-specific variance to distinguish between inputs.

Learn

The non-linear models outperformed the linear Lasso model. Lasso's feature selection was consistent with biological intuition: nanoparticle chemistry dominates binding behavior, followed by composition metrics that affect RNA structure and flexibility.

The minimal contribution from positional encoding suggests overall composition matters more than nucleotide order. The prediction uniformity problem revealed that having many features doesn't guarantee good discrimination if they don't capture meaningful differences between sequences.

Wet Lab

Chitosan Nanoparticle (CSNP) Production

Iteration 1

Standardizing the Production Protocol

Design

Our goal was to establish an ionic gelation protocol in our lab that produces ideal nanoparticle carriers. We first optimized the experimental setup, concentrations, ratios, and volumes by running many iterations and varying one parameter at a time.

Build

Over each iteration of nanoparticle production (through ionic gelation), we were able to test the following:

Chitosan solubility was tested in vials containing acetic acid and water.
The flow rate of TPP (on a burette) to be added to the stirring chitosan was estimated.
Two chitosan solutions of 0.4% and 0.1% concentrations were crosslinked with a 2.4 mg/mL TPP solution in a 3:1 ratio.
Nanoparticles with two chitosan:TPP ratios were made to verify whether the 3:1 ratio obtained from the literature is ideal.
Chitosan solutions were also prepared in 0.8% and 1.22% acetic acid solutions to gauge the effect on the nanoparticles.

Test

Every nanoparticle sample was subjected to a Particle Size Analyzer (Horiba Scientific Nano Partica SZ-100).

The parameters we focused on were:

Hydrodynamic radius: Effectively, the size of the nanoparticles. Ideally, it would be in the range of 100-200 nm.
Zeta potential: A sufficient value is essential to allow the nanoparticle to penetrate the wall of P. capsici. Its value must be over 30 mV.
Polydispersity index: Describes the breadth of the particle size distribution in a sample. This value should not exceed 0.5.
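The three acceptance criteria above can be checked together for each Particle Size Analyzer reading. A minimal sketch, with hypothetical function and key names; the example zeta potential and polydispersity values are invented for illustration.

```python
def meets_targets(size_nm: float, zeta_mv: float, pdi: float) -> dict:
    """Check one measurement against the target ranges listed above."""
    return {
        "size": 100.0 <= size_nm <= 200.0,   # hydrodynamic size, nm
        "zeta": zeta_mv > 30.0,              # zeta potential, mV
        "pdi": pdi <= 0.5,                   # polydispersity index
    }


# Hypothetical readings for illustration.
good = meets_targets(size_nm=148.0, zeta_mv=35.0, pdi=0.3)
aggregated = meets_targets(size_nm=1808.0, zeta_mv=12.0, pdi=0.7)
```

Reporting each criterion separately shows at a glance which parameter still needs optimization.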

Learn

These initial runs allowed us to fine-tune the protocol and finalize fundamental production parameters. The key learnings were:

Chitosan solution must be prepared in 1% acetic acid, as complete solubility is observed on stirring.
The TPP crosslinker must be added to the chitosan dropwise, with an interval of 7 seconds between consecutive drops, at a TPP:chitosan ratio of 1:3.
The concentration of chitosan solution must be 0.1% (1 mg/mL), as higher concentrations cause aggregation (evident from higher nanoparticle sizes).
Most samples showed fine suspensions or sizes as high as 1808 nm. This indicated that sonication might be necessary for homogenization.
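The concentration and ratio figures in these learnings reduce to simple arithmetic. The sketch below is illustrative only: it assumes that the percentages are weight/volume and that the 3:1 chitosan:TPP ratio is a volume ratio, neither of which is stated explicitly above, and the 30 mL batch volume is a hypothetical example.

```python
def percent_wv_to_mg_per_ml(percent_wv):
    """Convert % w/v (grams per 100 mL) to mg/mL."""
    return percent_wv * 10.0

def tpp_volume_ml(chitosan_volume_ml, chitosan_parts=3, tpp_parts=1):
    """TPP volume for a chitosan volume, ASSUMING the 3:1 ratio is by volume."""
    return chitosan_volume_ml * tpp_parts / chitosan_parts

# 0.1% chitosan, as finalized above
print(percent_wv_to_mg_per_ml(0.1), "mg/mL chitosan")

# Hypothetical 30 mL chitosan batch
print(tpp_volume_ml(30), "mL TPP")
```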

Iteration 2

Optimizing Nanoparticle Size

Design

We realized that sonicating the sample was necessary to reduce particle size and prevent clumping. The goal of this iteration was to bring the nanoparticle size into the ideal range.

Build

Every sample was then sonicated after the addition of TPP at 40-60% amplitude and a 10-second ON/OFF pulse for 8 minutes. All other parameters previously optimized were kept constant. A second sonication of 5 minutes for the initial 0.1% chitosan solution was introduced to reduce the size of the particles.
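For a pulsed programme like this, it helps to know how much of the run is actual sonication. The sketch below assumes that the 8 minutes is total elapsed time and that the pulse is 10 seconds ON followed by 10 seconds OFF; both are our reading of the settings above, and the function name is hypothetical.

```python
def pulse_summary(total_s, on_s, off_s):
    """Return (full ON/OFF cycles, total ON time in seconds) for a pulsed run."""
    cycle = on_s + off_s
    full_cycles = total_s // cycle
    remainder = total_s % cycle
    # Any leftover time starts with an ON phase
    on_time = full_cycles * on_s + min(remainder, on_s)
    return full_cycles, on_time

cycles, on_time = pulse_summary(total_s=8 * 60, on_s=10, off_s=10)
print(cycles, "cycles,", on_time, "s of active sonication")
```

Under these assumptions the sample receives 4 minutes of active sonication across 24 cycles, with the OFF phases limiting heating.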

Test

On running the samples using a Particle Size Analyzer, we observed a significant reduction in size. Most samples were now showing sizes in the range of 100-200 nm (such as 148 nm, 113 nm, 146 nm).

Learn

Although the hydrodynamic radius had been optimized, the zeta potential remained lower than required. On consulting our PI, we established that the pH of the solution should be measured at every step to understand its stability. She recommended that the pH of the initial chitosan solution also be adjusted to 4.0 using 1N NaOH. A centrifugation and decanting step was also added after the nanoparticle sonication to obtain the ideal zeta potential.

Iteration 3

Addressing the Zeta Potential

Design

The goal was now to increase the zeta potential of the particles while retaining their size and low polydispersity. Measuring the pH of all solutions and adjusting when necessary was essential. The centrifugation of the nanoparticle sample was also introduced to optimize the zeta values.

Build

Adding on to the protocol optimized thus far, the pH of chitosan, TPP, and nanoparticle solutions was measured. As the ideal pH was 4.0, adjustments were made using 1N NaOH to bring it to this value. The sonicated nanoparticle solutions were also centrifuged for 5 minutes at 1000 rpm at 27°C. The supernatant was collected thereafter and served as the final nanoparticle sample to be analyzed.
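Because centrifuge speed in rpm depends on the rotor, it can be useful to express this step as relative centrifugal force (RCF). The sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimetres. The rotor radius is not reported above, so the 10 cm value is purely a placeholder assumption.

```python
def rpm_to_rcf(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) from rotor speed and radius.

    Standard relation: RCF = 1.118e-5 * radius_cm * rpm**2
    """
    return 1.118e-5 * rotor_radius_cm * rpm ** 2

# 1000 rpm is from the protocol; the 10 cm radius is a placeholder assumption
rcf = rpm_to_rcf(rpm=1000, rotor_radius_cm=10.0)
print(f"approximately {rcf:.0f} x g")
```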

Test

On carrying out iterations with the new modifications, we observed that the zeta potential of the samples was significantly higher than earlier. Some of the samples showed values close to 100 mV, with the average zeta being well over the required range.

Learn

Through these runs, we were able to produce nanoparticles with sizes in the range of 100-200 nm, a zeta potential of over 30 mV, and a polydispersity index of under 0.5. Although there were certain outlier samples, the majority produced using this optimized process showed a favourable combination for the intended purpose.

Iteration 4

Preparation for siRNA Encapsulation

Design

To prepare for the integration of siRNA into this process, we conducted nanoparticle runs using diethyl pyrocarbonate (DEPC)-treated and RNase-free water throughout. The objective was to observe any effect these modifications would have on the nanoparticles.

Build

All glassware and apparatus used in the process were first treated with DEPC water. The acetic acid, chitosan, and TPP solutions were also prepared using RNase-free water. This ensured that the entire setup was RNase-free and, therefore, safe to encapsulate the siRNA inside the nanoparticle. The protocol followed was the same, otherwise.

Test

These nanoparticle samples maintained a satisfactory combination of size (82 nm and 130 nm), zeta potential (89 mV and 169 mV), and polydispersity (between 0.4 and 0.5).

Learn

These final runs allowed us to prepare for the encapsulation of siRNA while producing the nanoparticles. The result of all the optimizations made thus far allowed us to safely and effectively integrate siRNA into the process.

Nanoformulation Production

Iteration 1

Nanoformulation Production Iteration 1

Design

Having successfully produced nanoparticles with ideal characteristics for siRNA delivery and reproduced these results while maintaining RNase-free conditions, we sought to incorporate the siRNA to build our working nanoformulation.

Build

All glassware and apparatus were soaked in DEPC-treated water for 3 hours and then autoclaved to mitigate any RNase contamination. The pH of the chitosan solution was adjusted to 4.0, as done previously. 15 μL of 100 μM siRNA was combined with the tripolyphosphate crosslinker, which was then added to the stirring chitosan solution.
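The amount of siRNA added in this step follows directly from the volume and concentration, since 1 μL of a 1 μM solution contains 1 pmol. A minimal sketch of the calculation, with a hypothetical function name:

```python
def amount_pmol(volume_ul, concentration_um):
    """Amount of solute in pmol: microlitres x micromolar = picomoles."""
    return volume_ul * concentration_um

# 15 uL of 100 uM siRNA, as used in this build
pmol = amount_pmol(volume_ul=15, concentration_um=100)
print(pmol, "pmol =", pmol / 1000, "nmol")
```

The build above therefore adds 1.5 nmol of siRNA to the crosslinker.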

Test

The following results were obtained from the sample:

Hydrodynamic radius (Z-avg): 6410.3 nm
Zeta potential: -4.0 mV
Polydispersity index: 2.318

Learn

Because these results contrasted starkly with those of the earlier iterations, we inferred that extended DEPC treatment combined with the pH adjustment of the chitosan solution could have raised the final pH, which would affect the stability, zeta potential, and size of the nanoparticles. We were therefore advised to simply rinse the apparatus with RNase-free water and follow the same protocol without the pH adjustment.

Iteration 2

Nanoformulation Production Iteration 2

Design

The second run implemented the changes recommended by our PI, Dr. Ritu Raval, to see whether they would allow the siRNA to be encapsulated and produce a nanoformulation with suitable characteristics.

Build

The apparatus was simply rinsed with RNase-free water and dried, as recommended. All solutions were also made with RNase-free water, as previously done. The pH of the chitosan solution was also not adjusted. Lastly, the nanoformulation sample was sonicated for a longer period of 10 minutes.

Test

The following were the Particle Size Analyzer results obtained for the sample:

Hydrodynamic radius (Z-avg): 401.7 nm
Zeta potential: 35.8 mV
Polydispersity index: 0.509

The pH of the final nanoformulation was observed to be 3.24.

Learn

The changes made in this run substantially improved every parameter relative to the first nanoformulation run. Because the siRNA had been encapsulated, a larger particle size than that of the empty nanoparticles was an expected outcome. Moreover, the zeta potential (35.8 mV) exceeded the 30 mV requirement, and the polydispersity index (0.509) was close to the 0.5 limit, both favourable for an encapsulated sample. We therefore considered the nanoformulation suitable for its purpose. This sample could now be used in other experiments to verify its effectiveness.

Detached Leaf Assay

Iteration 1

Detached Leaf Assay Iteration 1

Design

We aimed to understand the efficacy of our chitosan nanoparticles by testing them directly against Phytophthora capsici's infection on Piper nigrum leaves of the Panniyur-1 variety. We set up negative and positive controls (untreated leaves and leaves sprayed with water and acetic acid) to compare the lesions of each. iHP sessions with Dr. Biju Narayanan helped design this experiment.

Build

Piper nigrum leaves of the same maturity were first sprayed with water, 1% acetic acid, and two different nanoparticle solutions (one sample had a better size and zeta potential). Following this, 5 mm mycelial plugs were placed on the leaves such that the hyphae came in contact with the leaf tissue. Lesion growth on the leaf was measured over 3 days post-infection.
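One common way to summarise lesion measurements of this kind is percent inhibition relative to a control leaf. The sketch below is illustrative only: the function name and the lesion diameters are hypothetical placeholders, not our measurements.

```python
def percent_inhibition(control_lesion_mm, treated_lesion_mm):
    """Reduction in lesion size relative to the control, in percent."""
    if control_lesion_mm <= 0:
        raise ValueError("control lesion size must be positive")
    return (control_lesion_mm - treated_lesion_mm) / control_lesion_mm * 100

# Hypothetical lesion diameters at 3 days post-infection
print(percent_inhibition(control_lesion_mm=20, treated_lesion_mm=5), "% inhibition")
```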

Test

Lesions on the positive controls grew substantially over the 3 days and became very large by the end. The lesion was larger on leaves sprayed with water than on those sprayed with acetic acid. The leaves sprayed with the chitosan nanoparticles, however, displayed the least lesion growth. Moreover, the lesion on the leaf treated with the nanoparticle sample of better size and zeta potential was comparatively smaller.

Learn

As expected, the chitosan nanoparticles significantly reduced lesion growth on the P. nigrum leaves. This can be attributed to their antifungal properties, which would produce an even greater overall effect when combined with the siRNA. Furthermore, nanoparticles with zeta potential and size values closer to the ideal have a greater effect against P. capsici infection, thereby validating our process.

Iteration 2

Detached Leaf Assay Iteration 2

Design

The same experiment was repeated using various concentrations of naked siRNA, applied both prophylactically and therapeutically. The effect of the nanoformulation was also tested on the leaves.

Build

Leaves were sprayed with 10 µL of 1 µM and 100 nM concentrations of siRNA and infected three days later with a mycelial plug (prophylactic treatment). A set of leaves was also treated simultaneously with infection to simulate therapeutic application. Two nanoformulations were also tested to assess their effectiveness.
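Working concentrations such as these are typically prepared by dilution using C1·V1 = C2·V2. The sketch below assumes a 100 µM stock and a 100 µL final volume; both values and the function name are illustrative assumptions, not details of our protocol.

```python
def stock_volume_ul(stock_um, final_um, final_volume_ul):
    """Volume of stock needed for a dilution, from C1*V1 = C2*V2."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock")
    return final_um * final_volume_ul / stock_um

# Assumed 100 uM stock and 100 uL final volume (illustrative only)
for final_um in (1.0, 0.1):  # 1 uM and 100 nM
    v_stock = stock_volume_ul(stock_um=100, final_um=final_um, final_volume_ul=100)
    print(f"{final_um} uM: {v_stock:.2f} uL stock + {100 - v_stock:.2f} uL RNase-free water")
```

Volumes this small are hard to pipette accurately, so in practice a serial dilution (for example, 100 µM to 1 µM to 100 nM) would be a reasonable alternative.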

Test

The leaves sprayed with siRNA in a therapeutic manner developed a lesion over 72 hours, although much smaller than the positive control. However, the leaves sprayed with 1 µM siRNA prophylactically did not develop lesions at all throughout the period. Lesion development on the optimized nanoformulation (zeta potential of 35.8 mV and 401.7 nm size) was absent as well.

Learn

We observed that prophylactic treatment of the leaves curbed infection to a much greater degree, which validates the preventative implementation we have designed. Although application of any concentration of siRNA or nanoformulation reduced infection compared to the positive controls, the effect was more pronounced with prophylactic 1 µM siRNA and the optimized nanoformulation. An effective concentration of siRNA that curbs infection has therefore been estimated. Moreover, the nanoformulation showed near-complete suppression of infection, which indicates that the complex is stable and that the siRNA is being released successfully to carry out gene silencing.

A side-by-side comparison of these treatments can be found on our measurement page.

P. capsici Zoospore Testing

Iteration 1

P. capsici Zoospore Testing Iteration 1

Design

As the siRNA is designed to silence the bZIP gene (which regulates zoospore motility) in P. capsici zoospores, isolating and treating pathogenic zoospores with the siRNA and observing its effect is another way of validating the gene silencing activity of the siRNA. Since the siRNA is encapsulated in chitosan nanoparticles, we first treated zoospores with the nanoparticles to observe the effect.

Build

Mycelial plugs from 72-hour-old P. capsici subcultures were placed in autoclaved distilled water and incubated under illumination at 25°C for 48 hours. The solution was then cold shocked at 4°C for 30 minutes to facilitate the release of motile zoospores from the sporangia. The zoospore solution was aliquoted at 500 μL per tube, and 250 μL of either chitosan nanoparticle solution or 1% acetic acid was added to each tube. The samples were lightly stained with methylene blue dye and visualized at 40x and 100x using a Trinocular Upright Phase Contrast microscope.

Test

Motile zoospores were successfully observed on two occasions. Counting with a Neubauer chamber gave 7.85 × 10⁷ zoospores per mL of water. In the following run, the zoospores were treated with acetic acid (the solvent in which our chitosan nanoparticles are suspended) and with our chitosan nanoparticle solution, and then visualized. Acetic acid produced no change in zoospore motility, whereas zoospores treated with the chitosan nanoparticles showed a change in overall motility.
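A Neubauer chamber count is converted to a concentration using the chamber geometry: each large 1 mm × 1 mm square under the 0.1 mm deep coverslip holds 10⁻⁴ mL. The sketch below shows this standard calculation. The per-square counts and the dilution factor are hypothetical values chosen only to illustrate the order of magnitude; the raw counts behind the figure above are not reported here.

```python
def cells_per_ml(counts_per_large_square, dilution_factor=1):
    """Concentration from Neubauer counts.

    Each large square (1 mm x 1 mm x 0.1 mm) holds 1e-4 mL, so
    cells/mL = mean count per large square x dilution factor x 1e4.
    """
    mean_count = sum(counts_per_large_square) / len(counts_per_large_square)
    return mean_count * dilution_factor * 1e4

# Hypothetical counts from four large squares of a 1:100 diluted sample
concentration = cells_per_ml([78, 80, 77, 79], dilution_factor=100)
print(f"{concentration:.2e} zoospores per mL")
```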

Learn

Having established that the chitosan nanoparticles affect the zoospores, we set out to gauge the effect of each treatment more precisely. In our next iteration, we used a Neubauer chamber to time how long a zoospore took to move from one part of the grid to another.
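Timing a zoospore across the chamber grid gives a rough swimming speed, because the grid spacing is known. The sketch below assumes an improved Neubauer ruling, in which the smallest central squares have 50 μm sides; the function name and the example timing are hypothetical.

```python
SMALL_SQUARE_UM = 50.0  # side of the smallest central square (assumed improved Neubauer ruling)

def speed_um_per_s(squares_crossed, elapsed_s, square_side_um=SMALL_SQUARE_UM):
    """Approximate speed from the number of grid squares crossed in a timed interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return squares_crossed * square_side_um / elapsed_s

# Hypothetical observation: two small squares crossed in one second
print(speed_um_per_s(squares_crossed=2, elapsed_s=1.0), "um/s")
```

This treats the path as a straight line along the grid, so it underestimates the speed of zoospores that swim in curved or helical paths.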

Iteration 2

P. capsici Zoospore Testing Iteration 2

Design

Zoospores were treated with varying concentrations of the siRNA as well as the nanoformulation to observe their effect on zoospore motility.

Build

Zoospores were isolated from 72-hour-old cultures as done earlier. 6 µL of zoospore samples were treated with 4 µL of 7.23 µM and 100 µM siRNA, as well as the nanoformulation. The samples were incubated for 30 minutes and visualized at 100x over multiple fields of view using a Trinocular upright phase contrast microscope.
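Mixing 4 µL of siRNA with 6 µL of zoospore sample dilutes the siRNA, so the concentration the zoospores experience is lower than that of the solution added. The sketch below computes this, assuming the volumes are additive and the zoospore sample contains no siRNA beforehand; the function name is hypothetical.

```python
def final_concentration_um(added_um, added_volume_ul, sample_volume_ul):
    """Concentration after mixing, assuming additive volumes."""
    total_volume_ul = added_volume_ul + sample_volume_ul
    return added_um * added_volume_ul / total_volume_ul

# 4 uL of siRNA added to 6 uL of zoospore sample, as in this build
for added_um in (7.23, 100):
    final_um = final_concentration_um(added_um, added_volume_ul=4, sample_volume_ul=6)
    print(f"{added_um} uM added -> {final_um:.3f} uM final")
```

Under these assumptions, the two treatments correspond to final siRNA concentrations of roughly 2.9 µM and 40 µM.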

Test

Untreated zoospore samples retained their motility. The same was observed with acetic acid and nanoparticle treatments.

However, zoospores treated with the siRNA and nanoformulation showed no observable motility over 20 seconds.

Learn

Visible motility in the untreated, acetic acid, and nanoparticle-treated zoospores indicated that infection had not entirely been curbed. The complete reduction in motility on siRNA and nanoformulation treatment conclusively verifies gene silencing in the pathogenic zoospores. The result also validates the encapsulation and release mechanisms of the siRNA-nanoparticle complex.