Model
Introduction—The Basic Framework for Our Work
As computational techniques and mathematical modelling methods become increasingly indispensable in the biological sciences, the PekingHSC 2025 team intends to explore the full potential of dry-lab experiments in this competition. We intend to use various computational analysis and mathematical modelling approaches to investigate the different ways in which dry-lab experiments can be used in research.
We use the pyramid as a visual representation of our work framework, which corresponds to the four roles that dry lab may assume during the research process:
Conduct a feasibility analysis of the proposal put forward by wet lab researchers, considering limited throughput and predictive capabilities.
Propose potentially feasible experimental approaches using computational methods.
Use modeling methods to compensate for the currently missing components of wet experiments, thereby forming a closed-loop project.
Propose novel design approaches for the current research topic and provide corresponding computational tool designs.
Molecular simulations for rational sequence screening
Overview of Computational High-Throughput Screening Model
Our team aims to design RNA riboswitch elements capable of precisely regulating intracellular S-adenosylmethionine (SAM) concentrations in hepatocytes. The designed riboswitch promotes gene expression in the absence of SAM and represses expression upon SAM binding through ligand-induced conformational rearrangement.
Riboswitch screening presents two major challenges. First, SAM-binding riboswitches exist in several distinct structural classes (SAM-I, SAM-III, SAM-IV, SAM-VI, etc.), each exhibiting unique recognition motifs and regulatory mechanisms. Second, the performance of riboswitches is highly sensitive to the composition and length of the connecting linker sequence between aptamer and expression platform1. Exhaustive experimental validation of all possible variants would be prohibitively time-consuming and resource-intensive. Therefore, integrating computational tools into the screening pipeline is essential.
To address this, we developed a high-throughput computational workflow that combines multiple molecular simulation and analysis tools. This workflow provides an efficient and cost-effective strategy for predicting riboswitch–ligand binding behavior, guiding experimental screening, and narrowing the search space for effective regulatory sequences.
Model Introduction
Currently, only limited computational tools can directly predict how RNA riboswitches recognize small molecules and how their conformations change upon ligand binding. Fully experimental screening of all possible RNA sequences is impractical; thus, our model bridges this gap by integrating molecular dynamics (MD) simulations with other techniques, molecular docking for instance.
Inspired by the 2024 iGEM Heidelberg workflow, we established a multi-stage computational protocol capable of simulating plasmid-encoded riboswitch constructs and evaluating both binding affinity and conformational dynamics of various candidate RNAs in complex with SAM. This enables large-scale, parallelized screening of riboswitch variants prior to experimental testing.
Our modeling pipeline consists of three major modules:
- Structure Preparation: generating accurate RNA and ligand structures and assigning charges.
- Molecular Dynamics Simulations: sampling conformational ensembles in explicit solvent to capture binding stability and dynamic transitions.
- Post-Simulation Analysis: quantifying binding energy, evaluating structural flexibility, hydrogen-bond networks, and visualizing conformational rearrangements. By following this workflow, we obtain predicted binding modes, binding free-energy estimates, and visual insights into RNA structural transitions. Representative examples are presented to illustrate the overall computational strategy.
Background Information
Core Software Ecosystem
Over the past decades, the field of molecular dynamics has matured into a robust ecosystem of rigorously validated computational tools. Among them, GROMACS, originally developed at the University of Groningen, stands out for its exceptional computational efficiency, rich suite of analysis utilities, and active open-source community. In our workflow, GROMACS performs all MD simulations — including energy minimization, equilibration, long-timescale sampling, and trajectory analysis.
To ensure reliability across the entire pipeline, from quantum-level parameter generation to trajectory visualization, we integrate several authoritative computational chemistry tools:
- Gaussian: an industry-standard quantum chemistry package widely used for electronic structure calculations. In our workflow, Gaussian performs geometry optimization and frequency analysis of small-molecule ligands at the DFT level, providing precise molecular geometries and electronic properties as the foundation for charge derivation and parameterization.
- Multiwfn: a versatile wavefunction analysis program used to extract electronic properties from quantum-chemical results. Specifically, we employ Multiwfn to generate RESP-fitted atomic charges from Gaussian outputs, ensuring accurate electrostatic representations and consistency with AMBER-compatible ligand parameterization schemes.
- subtop: a modern and flexible toolkit that automatically generates GROMACS-compatible topology and coordinate files from quantum-optimized structures. Compared with older converters such as ACPYPE, subtop offers enhanced AMBER-based parameterization support, robust atom-typing consistency, and seamless integration with charge-fitting workflows (RESP or AM1-BCC). It constructs topology files based on GAFF parameters, enabling smooth and reproducible incorporation of small molecules into RNA–ligand simulation systems.
- AutoDock / AutoDock Vina: classical molecular docking suites used to predict initial binding poses of SAM and its analogues within candidate riboswitch aptamers. These docking results serve as the starting configurations for molecular dynamics refinement, allowing us to explore the stability and adaptability of the RNA–ligand complexes.
- PyMOL: a comprehensive molecular visualization platform employed for structural inspection, alignment, and figure generation. Throughout the workflow, PyMOL facilitates the visual analysis of binding interactions and conformational changes, producing publication-quality graphics and comparative structure renderings.
Force Field Selection
Force fields define the mathematical potential functions and parameters describing atomic interactions — the physical foundation of MD simulations. Selecting a consistent and well-validated parameter set is crucial for accuracy.
For the RNA component, we employ the AMBER force-field family, specifically the ff99SB. These parameters accurately capture base stacking, hydrogen bonding, and backbone conformational preferences, making them widely adopted for RNA–ligand studies.
For small-molecule ligands, we use the General AMBER Force Field (GAFF), which is fully compatible with the AMBER framework and provides comprehensive coverage of common organic moieties. This compatibility ensures physically consistent interaction energies between RNA and ligand atoms.
The solvent is modeled with the TIP3P explicit water model, the standard companion of AMBER force fields. To mimic physiological ionic conditions, Mg²⁺ and Cl⁻ ions are included using AMBER-recommended parameters optimized for nucleic acid systems2. In particular, Mg²⁺ plays a critical role in stabilizing tertiary RNA structures and mediating SAM binding geometry.
User Manual
A detailed user manual accompanies this model, documenting all scripts, parameter files, and workflow steps required to reproduce or extend our simulations. The manual is designed as a practical guide for both beginners and advanced users seeking to apply molecular simulations to RNA regulatory element design.
Model Workflow
Structure Preparation
The first step of the computational workflow involves the prediction of riboswitch RNA tertiary structures. Although the release of AlphaFold 3 has greatly enhanced the accuracy of protein–ligand structure prediction, its performance on RNA molecules remains less reliable, particularly for complex tertiary folds or ligand-bound conformations. To address these limitations, we employ a hybrid strategy that combines both deep-learning-based prediction and secondary-structure-guided modeling to achieve more accurate RNA conformations. Specifically, two complementary approaches are used:
- trRosettaRNA2 + AlphaFold3 hybrid modeling: We utilize trRosettaRNA2, a deep-learning model specifically trained for RNA tertiary structure prediction, and integrate its output with AlphaFold3 predictions. The combination improves local base-pair geometry and global tertiary packing, especially in regions with stable base pairing such as the aptamer domain.
- Secondary-structure–guided 3D modeling: Inspired by experimental RNA folding approaches, we first predict the secondary structure using algorithms such as mFOLD and BPfold (implemented in our in-house RNA-Factory platform). The predicted dot-bracket representation is then submitted to RNA-Composer, which automatically constructs the corresponding 3D tertiary structure consistent with known RNA folding motifs.
Despite integrating multiple prediction methods, it is important to note that some specific RNA sequences—especially those with long, flexible linkers or noncanonical motifs—may still lack reliable tertiary models. Such cases are flagged for manual curation or excluded from downstream molecular dynamics (MD) simulations.
RNA Sequence of a Specific Riboswitch
According to previous studies, riboswitches typically reside in the 5′ untranslated region (UTR) of mRNA. The 5′ terminus is connected to a reporter gene (such as EGFP), while the 3′ terminus is linked to the downstream coding region. Between the aptamer domain and the expression platform, a linker sequence mediates signal transmission between ligand binding and gene expression regulation (Fig. 3A).
Crystal structures deposited in the Protein Data Bank (PDB) (e.g., SAM-VI riboswitch, Fig. 3B) reveal that the linker region often adopts a base-paired or partially helical conformation, stabilizing the communication between domains. Based on this observation, we prioritize candidate riboswitch structures whose predicted linkers exhibit reasonable base pairing, as this feature typically correlates with correct folding and functional responsiveness.

In practice, we first generate 3D models using AlphaFold3 or trRosettaRNA2, and verify the predicted base-pairing pattern within the linker. Models exhibiting plausible intramolecular interactions are selected for further MD refinement (Fig. 4).

For riboswitch sequences that cannot be reliably modeled by deep learning alone, we apply the secondary-structure–guided protocol. For instance, the SAM-III-2 riboswitch sequence is as follows:
GCUGGCGUCUCAGCAACCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGGCGAAUGGGACGCAUGAGACGGGGACG
It was analyzed using BPfold, which generated the following dot-bracket representation:
....((((((((....(((..((((.(((((....)))))....))))....((....)).)))....))))))))......((....))
The predicted structure demonstrates a well-paired linker region (Fig. 5A), consistent with the expected conformational features. This secondary structure was subsequently used as input for RNA-Composer, yielding the corresponding 3D initial structure (Fig. 5B).

Binding Ligands
In addition to the RNA macromolecule, accurate preparation of small-molecule ligands is essential for reliable molecular dynamics simulations. The goal is to obtain physically meaningful geometries and electrostatic parameters. Our ligand preparation follows a standardized, multi-step quantum chemical protocol:
- Geometry Optimization and Frequency Analysis Performed using Gaussian at the DFT level (e.g., B3LYP/6-31G*), to obtain minimum-energy conformations and verify the absence of imaginary frequencies.
- Charge Derivation The RESP module in Multiwfn is used to compute electrostatic potential–fitted atomic charges based on the Gaussian output, ensuring a realistic charge distribution compatible with later simulations.
- Topology Generation Using subtop, we automatically generate GROMACS-compatible topology and coordinate files based on the GAFF parameter set, incorporating the RESP-derived charges.
Representative examples of the ligand structures used in our workflow include S-adenosylmethionine (SAM) and guanine, shown in their 2D chemical formulas and corresponding 3D geometries in Figures 6 and 7, respectively.


Molecular Dynamics Simulation
Although initial riboswitch conformations can be predicted computationally, these static structures do not necessarily represent stable states under physiological conditions. To obtain more realistic conformations, we first perform equilibration molecular dynamics (MD) simulations of the riboswitch alone in explicit solvent. The equilibrated RNA structures are then docked with the ligand SAM or guanine, followed by extended production simulations to explore the stability, conformational dynamics, and binding mechanisms of the riboswitch–ligand complexes4.
Simulation Parameters:
Accurate MD simulations require appropriate physical models (force fields) to describe atomic interactions within the system. In our workflow:
- Force Fields: The AMBER99SB parameter set is used for the RNA molecules, while the GAFF (General AMBER Force Field) is applied to the small-molecule ligands (SAM and guanine).
- Solvent Model: Explicit water molecules are represented using the TIP3P model, which is standard for AMBER-based simulations.
- Ionic Environment: To mimic the physiological ionic strength, Mg²⁺ ions (20 mM) are included to stabilize RNA tertiary structure, and Cl⁻ ions are added to maintain overall charge neutrality.
- Thermodynamic Conditions: The simulations are conducted at 310.15 K and 1 bar pressure, representing physiological conditions.
- Simulation Length: The initial equilibration simulation is run for 10 ns, followed by an extended 25 ns production simulation for the RNA–ligand complex.
These parameters provide a balanced compromise between computational efficiency and sufficient sampling for conformational relaxation and interaction stability assessment.
Initial Equilibrium:
The equilibration stage prepares the RNA riboswitch in a stable, solvated, and thermodynamically relaxed state prior to ligand binding. The stepwise procedure is as follows:
- Topology Generation: The RNA topology and coordinate files are generated using the pdb2gmx module in GROMACS, based on the selected AMBER99SB force field.
- Solvation: Using editconf and solvate, the RNA molecule is placed in a cubic simulation box, then filled with explicit TIP3P water molecules.
- Ion Addition: Counterions (Mg²⁺, Cl⁻) are added using gmx genion to neutralize system charge and reproduce the target ionic concentration.
- Energy Minimization: The system undergoes steepest-descent energy minimization to remove steric clashes and optimize the geometry, ensuring all atoms are positioned within physically reasonable force-field limits.
- Equilibration (NVT/NPT): Two equilibration phases are performed:
- NVT (constant number, volume, temperature): The system is heated gradually to 310 K using a velocity-rescaling thermostat.
- NPT (constant number, pressure, temperature): The system pressure is stabilized at 1 bar using a Parrinello–Rahman barostat, ensuring density and pressure equilibrium.
- Production Simulation: After equilibrium is achieved, a 10 ns production run is performed to monitor spontaneous relaxation of RNA secondary and tertiary structures.
As shown in Figure 9, the SAM-VI-8-15 riboswitch undergoes subtle but significant structural rearrangement during equilibration, indicating that the predicted static model relaxes into a more energetically stable conformation under physiological conditions.

Molecular Docking
To accurately locate the small-molecule ligand within the RNA binding pocket, molecular docking was performed using AutoDock and AutoDock Vina following the workflow below:
- Input Preparation: RNA and ligand structures were preprocessed with AutoDock Tools, where all polar hydrogens were added and Gasteiger charges were assigned. The processed structures were then converted into PDBQT format to ensure full compatibility with the docking procedure.
- Grid Box Definition: Docking grid parameters were defined around the known or predicted binding site, based on literature data or homologous riboswitch crystal structures.
- Docking Execution: AutoDock Vina performed an exhaustive conformational search to predict the most favorable ligand orientations within the binding pocket.
- Pose Evaluation and Selection: Docking results were ranked according to their binding energy scores, and the top-ranked poses were compared with known experimental structures (e.g., PDB ID: 6LAS) to confirm realistic binding orientations. The pose with the smallest RMSD relative to the reference structure was selected for the subsequent extended MD simulations.
This docking procedure ensures that the ligand begins MD simulations from a structurally and energetically plausible binding configuration, improving the reliability of dynamic analyses.

Extended MD Simulations
The extended simulation stage closely follows the equilibration protocol but is performed on the complete RNA–ligand complex to evaluate binding stability and conformational adaptation.
The workflow proceeds as follows:
- Topology Generation: Generate RNA topology with pdb2gmx and merge it with the ligand topology produced via subtop, ensuring GAFF–AMBER compatibility.
- System Construction: Define the simulation box and solvate the complex with TIP3P water using editconf and solvate.
- Ion Neutralization: Add Mg²⁺ and Cl⁻ ions to achieve electrostatic neutrality and physiological ionic strength.
- Energy Minimization: Minimize the complex to remove steric clashes, particularly at the RNA–ligand interface.
- Equilibration: Conduct NVT and NPT equilibration to stabilize temperature and pressure while maintaining the structural integrity of the complex.
- Production Simulation: Perform a 25 ns production run under constant temperature and pressure conditions, collecting trajectory data for subsequent structural and energetic analysis.
The post-simulation structures of the SAM-VI-8-15 riboswitch (Fig. 11) show minor but functionally relevant rearrangements, particularly in the regions highlighted by red boxes. These subtle local conformational shifts may modulate ligand-binding affinity and specificity, reflecting the dynamic adaptability inherent to riboswitch function.


Post-Simulation Analysis
After obtaining molecular dynamics trajectories for both the ligand-free and SAM-bound riboswitch systems, a series of post-simulation analyses were conducted to assess structural stability, conformational dynamics, and binding energetics. Because three-dimensional representations are often insufficient to intuitively capture base-pairing rearrangements, we further transformed the simulated 3D structures into two-dimensional secondary structure diagrams. This allowed direct visualization of hydrogen-bonding patterns between key nucleotides, facilitating communication with the experimental team. In addition, binding free energies between the riboswitch and SAM were computed as quantitative indicators of binding affinity, enabling efficient screening of multiple candidate riboswitches. Finally, principal component analysis (PCA) and free energy landscape (FEL) mapping were performed to explore global conformational changes and energy distributions during ligand binding.
RMSD Calculation
Root-mean-square deviation (RMSD) analysis was employed to evaluate the overall structural stability and equilibration of each simulated system. Backbone atom RMSDs were calculated over the trajectory to monitor structural deviations relative to the initial minimized structure. Equilibrium was considered achieved when RMSD fluctuations stabilized around a consistent mean value over time. Only equilibrated trajectory segments were used for subsequent energy and conformational analyses, ensuring that binding free energy calculations such as MM/PB(GB)SA were based on structurally stable and thermodynamically representative ensembles.
2D Structure Transformation
To visualize dynamic changes in base-pairing interactions, the simulated 3D RNA structures were converted into 2D secondary structure representations. This transformation was performed using x3dna-dssr, an open-source tool that extracts secondary structure annotations from 3D RNA models, followed by Forna, an online visualization server for dot-bracket representations. For example, the SAM VI-8-17 riboswitch trajectory was processed using the following command:
x3dna-dssr -i=VI-8-17.pdb
The resulting .dbn file contains the predicted dot-bracket notation, which was uploaded to Forna to generate an interactive 2D schematic (Figure 15). Comparison of pre- and post-binding diagrams clearly revealed the formation and disruption of key base pairs, providing direct visual evidence of ligand-induced conformational rearrangements within the aptamer domain.

Binding Energy Calculation and Energy Decomposition
To quantitatively estimate ligand–riboswitch binding affinities across multiple systems, the MM/PB(GB)SA methods were employed, offering a favorable balance between computational efficiency and accuracy7. All calculations were performed using the gmx_MMPBSA script developed by Jicun Li (available at gmxtools/gmx_mmpbsa/gmx_mmpbsa.bsh at master · Jerkwin/gmxtools · GitHub), with default parameters optimized for GROMACS trajectories. Execution of the script required only:
./gmx_mmpbsa.bsh
For the VI-8-17 system, the resulting _pid~MMPBSA.dat file reported a binding free energy of:
dG= -176.603( -210.576) kJ/mol = -42.209( -50.329) kcal/mol
Ki= 1.809E-24(3.436E-30) uM = 1.809E-21(3.436E-27) nM
Although absolute energy values may exhibit considerable uncertainty, an inherent limitation of continuum solvation models, the relative free energy differences among candidate riboswitches remain highly informative for distinguishing strong and weak binding systems.
The energy decomposition analysis further reveals the atomic details of the binding interaction6. The residue contribution data are extracted from the _pid~res.dat file:
MM-PBSA(with DH )
…
RNA 26G 0.977( 0.118)
RNA 27U 0.843( 0.104)
RNA 28G 0.656( 0.067)
RNA 29C 0.253( -0.010)
RNA 30C 0.065( -0.064)
RNA 31U -0.007( -0.151)
RNA 32C -0.125( -0.306)
RNA 33G -1.146( -1.103)
RNA 34C -2.758( -3.308)
RNA 35A -24.646( -24.547)
RNA 36U -23.141( -21.007)
…
Per-residue energy decomposition provided further insight into the molecular basis of SAM recognition. As the example above shows, residues A35 and U36 contributed disproportionately to the total binding free energy (−24.6 and −23.1 kcal·mol⁻¹, respectively), suggesting that these bases form key stabilizing interactions with the SAM ligand. Such residue-level insights help elucidate the molecular recognition features governing riboswitch specificity.
PCA & FEL Analysis
To characterize the dominant conformational motions underlying ligand binding, principal component analysis (PCA) was performed on the backbone atomic coordinates. The covariance matrix of atomic fluctuations was diagonalized, and the first few principal components (PCs) representing the most significant collective motions were extracted. For the SAM VI-8-15 riboswitch, PC1 and PC2 accounted for 43.9% and 14.9% of the total variance, respectively, with a cumulative contribution of 58.8%, indicating that these two modes effectively describe the major conformational transitions of the system (Figure 16).
Based on PCA projections, a free energy landscape (FEL) was constructed by mapping the conformational probability density along the PC1–PC2 coordinates and converting it into free energy values via the Boltzmann relation. As shown in Figure 17, distinct energy minima correspond to stable conformational basins, while energy barriers represent transition states separating metastable conformations. By analyzing the connectivity among these basins, the conformational transition pathways induced by SAM binding can be visualized, offering dynamic insight into the regulatory mechanisms of the riboswitch.
Results
Based on the computational workflow described above, we systematically performed molecular dynamics (MD) simulations on multiple RNA sequences provided by the wet-lab team. The resulting structural and energetic data were used to guide the design and optimization of riboswitch constructs. Among all candidates, some riboswitches, such as SAM VI-8-15, exhibited favorable performances. We further conducted detailed mechanistic studies on SAM VI-8-15 riboswitch to provide theoretical insights for future riboswitch design and optimization.
Screening of Riboswitches
Screening of Linker Lengths
At the early stage of riboswitch design, the wet-lab team observed that directly replacing the guanine-binding domain in a known riboswitch scaffold with a SAM-binding domain failed to produce the expected switching function. We hypothesized that this was due to improper structural coupling mediated by the linker sequence. To address this issue, we systematically screened various riboswitch types and linker-length combinations. A total of 11 SAM riboswitch variants and 2 literature-validated guanine riboswitches were provided by the wet-lab team. All constructs were evaluated following the established workflow. As shown in Figure 18, the top four candidates ranked by binding energy were SAM VI-4, SAM VI-6, SAM VI-8, and SAM III-2 (Note:Bars are missing for sequences that either did not yield reliable structures or whose binding energies are near zero.). These computational results were provided to the wet-lab team as key references for experimental screening.
Fine-Tuning of Linker Sequences
Experimental validation confirmed that SAM VI-6 and SAM VI-8 displayed excellent switching performance. To further optimize these candidates, we explored alternative linker base arrangements. In parallel, because the SAM III series also showed promising preliminary results (optimal activity with a 2-nt linker but a sharp decline at 6 nt), we performed a refined screening of linker lengths ranging from 1 to 4 nucleotides and tested multiple base-ordering combinations. Approximately 30 variants were designed and provided by the wet-lab team. We evaluated all candidates through the same computational pipeline. As summarized in Figure 19, the ten riboswitches with the most favorable binding free energies were VI-6-3, VI-6-4, VI-8-7, VI-8-9, VI-8-15, VI-8-16, III-2-2, III-3-1, III-4-2, and III-4-3 (Note:Bars are missing for sequences that either did not yield reliable structures or whose binding energies are near zero.). These results offered clear computational guidance for subsequent experimental optimization.
Mechanism Studies
Mechanism Studies:
Experimental validation ultimately identified SAM VI-8, SAM VI-8-15, and SAM VI-8-16 as the most effective constructs, each exhibiting robust on/off switching behavior in hepatocyte assays. Although mechanistic elucidation was not the primary focus of this study, we selected SAM VI-8-15 as a representative construct for detailed mechanistic investigation5.
We first analyzed the secondary structure of SAM VI-8-15 (Figure 20). A hydrogen bond was observed between nucleotides 38 and 43, while no other base pairs were disrupted or newly formed. This observation suggests that SAM likely interacts with the region surrounding these two nucleotides. Because this site is located relatively far from the linker region, such local interactions may induce long-range conformational rearrangements within the riboswitch, ultimately leading to its functional switching behavior.
We then analyzed the conformational transitions of SAM VI-8-15 before and after ligand binding. Principal component analysis (PCA) and free energy landscape (FEL) mapping (Figure 17) revealed several distinct energy basins corresponding to intermediate conformations. Comparison of the initial, intermediate, and final states demonstrated a pronounced folding rearrangement within the aptamer domain, accompanied by significant dynamic fluctuations in the 3'-terminal region during the simulation (Figure 21B).
To identify residues contributing most to ligand stabilization, we performed MMPBSA free energy decomposition analysis (Figure 22). The results indicated that A35, A36, U42, C43, and C44 contributed most significantly to the overall binding free energy.
Structural visualization in PyMOL further revealed the detailed interaction network (Figure 23):
- The adenine moiety of SAM forms one hydrogen bond with U44 and another with the 4'-hydroxyl group of C43;
- The 2'-hydroxyl group of SAM’s ribose interacts with the N1 atom of A35;
- The terminal carboxyl and amino groups of SAM form a tri-hydrogen-bond network with the phosphate group of U42.
These cooperative interactions jointly stabilize the SAM–RNA binding interface and are likely critical for the switching mechanism of the SAM VI-8-15 riboswitch.

Conclusion
By integrating existing computational tools and modeling approaches, we have successfully established an efficient and reliable workflow for high-throughput screening of riboswitches. This workflow enables rapid determination of RNA structural features, accurate estimation of binding affinities with SAM, and detailed structural insights, including free energy landscapes, conformational dynamics, and binding patterns. Importantly, our computational predictions showed strong agreement with experimental results, effectively guiding wet-lab experiments and accelerating the overall research process.
Despite these successes, we recognize several limitations of the current model. First, the absolute binding free energy values often deviate significantly from experimental measurements, indicating that further improvements in computational accuracy are needed. Second, although the workflow has been carefully optimized, the computational cost and time required remain high due to the large size and complexity of riboswitch systems. These limitations reflect inherent challenges in current molecular modeling methods.
Another particular challenge lies in obtaining accurate initial RNA structures. While our multi-tool integration combined with manual validation helps address this issue, the predictive ability remains limited for highly complex or unconventional RNA sequences. Some RNAs may not adopt a clearly defined reasonable conformation, adding extra difficulty to structure prediction and downstream simulations.
At the same time, these challenges present important opportunities for future development. Our team is actively working on next-generation machine learning models aimed at improving the accuracy of RNA structure and interaction predictions. With ongoing advancements in AlphaFold3 and the emergence of RNA-specific prediction tools, we anticipate that the computational workflow will continue to evolve. Ultimately, this approach will provide a faster, more efficient, and user-friendly platform for high-throughput riboswitch screening, offering stronger computational support for synthetic biology and drug design applications.
Deep Learning for RNA-Target Interaction Prediction
To better interpret the binding patterns between RNA and its targets, and to guide the subsequent experimental design, we introduced deep learning approaches in our modeling work. Specifically, we employed open-source predictive models to study RNA–protein interactions and RNA–small molecule interactions. These computational tools provide a cost-efficient and scalable strategy to prioritize candidate sequences before wet-lab validation.
Currently, there is a lack of large-scale datasets that directly support RNA function prediction, and our experimental throughput does not allow us to train such a model from scratch. Therefore, we focused on predicting RNA–target binding affinity, as binding is the necessary prerequisite for any biological function of RNA (e.g., riboswitch activation/inactivation). Binding predictions thus serve as a useful proxy for RNA functional potential.
1. RNA–Protein Interaction Prediction
We developed DeepRPI, a deep learning framework for predicting RNA–protein interactions.
- Input representation: RNA sequences were encoded using RNA-BERT27, which captures contextual and structural information from RNA sequences, while protein sequences were encoded using ESM-228, a powerful protein language model trained at the evolutionary scale.
- Interaction modeling: The embeddings were passed into a cross-attention module to capture potential binding sites and interaction patterns between RNA and proteins.
- Output prediction: The cross-attention output was fed into a multilayer perceptron (MLP) classifier to produce a binary prediction: binding or non-binding.
Evaluations on public databases29 demonstrated competitive predictive performance.


In our project, DeepRPI mainly served as an initial scoring system for candidate RNA sequences, helping us narrow down design space before conducting experiments. We have also integrated the DeepRPI model into our RNA design software, enabling more effective identification of RNA–protein interactions within our design pipeline.
2. RNA-Small Molecule Interaction Prediction
In addition to processing protein targets, we decided to apply SMRTnet30, a newly developed deep learning framework designed to predict RNA-small molecule interactions (SRIs) without requiring RNA tertiary structure information, which makes it highly suitable for drug discovery and riboswitch design.
SMRTnet integrates multiple neural components into a multimodal framework:
Two large language models (LLMs) are used to encode RNA sequences and small molecules separately, capturing contextual and chemical features. Convolutional neural networks (CNNs) extract local patterns from RNA secondary structures. Graph attention networks (GATs) process molecular graphs of small molecules, learning atom-level dependencies. Finally, an attention-based multimodal fusion module integrates RNA and small molecule representations, enabling precise prediction of binding affinities.
This hybrid architecture allows SMRTnet to achieve state-of-the-art performance across multiple benchmarks, significantly outperforming existing SRI prediction tools.
During our iHP activities, a PhD student recommended SMRTnet to us as a promising tool for RNA–small molecule prediction. Inspired by this suggestion, we employed the model to virtually screen our designed RNA sequences. The model's predictions aligned well with experimental outcomes. Specifically, MAT1a-16 RNA was consistently ranked as the top candidate and subsequently demonstrated the best performance in wet-lab validation, confirming the utility of SMRTnet as a guiding tool for RNA design.
3. Summary
In this work, the RNA–protein interaction predictor (DeepRPI) and the RNA–small molecule interaction predictor (SMRTnet) together form the computational backbone of our RNA design pipeline. Each tool addresses a different aspect of RNA functionality and provides complementary insights for sequence optimization. Ultimately, we placed more emphasis on RNA–small molecule predictions for riboswitch design, but both approaches highlight the potential of deep learning as a valuable tool in RNA engineering.
Modeling of Riboswitch Based Drug Delivery
The ultimate objective of this project is to achieve controllable drug delivery in humans via an RNA riboswitch (hereafter “riboswitch”) encapsulated in lipid-based nanoparticles (LNPs), so as to modulate the concentration of S-adenosylmethionine (SAM) in the liver of hepatitis patients from a pathological level (≈ 40 µM) back to the normal physiological level (≈ 60 µM). Because animal experiments are not permitted, we develop an ordinary differential equation (ODE) model to simulate the human pharmacokinetics (PK) and pharmacodynamics (PD) of this strategy and thereby test its feasibility in silico.
The model comprises three levels:
- A physiologically based pharmacokinetic (PBPK) model that simulates the distribution of LNP-riboswitch from intravenous injection to accumulation in peri-hepatocyte regions.
- A cellular model that represents LNP transport from the vasculature into hepatocytes, intracellular release of the riboswitch and subsequent SAM expression.
- Multi-dosage simulations to evaluate whether repeated dosing can keep intrahepatic SAM concentration stably at the normal level over long periods.
Part 1. Physiologically Based Pharmacokinetic (PBPK) Model
Model Description:
After an intravenous injection, LNP-riboswitch particles circulate and distribute among major organs; the model represents the following compartments: lung (LU), heart (HT), kidney (KI), spleen (SP), small intestine (SI), liver (LI), arterial blood (AB), venous blood (VB) and the rest of body (ROB)31. Each tissue compartment is described by a tissue concentration C_i (units: µmol·L⁻¹, or µM) and exchanges mass with the arterial or venous blood according to perfusion and partitioning.
Theoretical Formulation and Model Parameters
Initial Conditions:
At t = 0, the body contains no LNP-riboswitch, and a single intravenous dose of 100 μmol is administered.
Model Equations:
1. Lung
Explanation:
Drug concentration in the lung changes due to the balance between venous blood inflow and lung tissue redistribution. Blood delivers drug into the lung, while tissue releases drug back according to its partition coefficient, then enters the arterial system.
Parameters:
:Lung LNP-riboswitch concentration (µM) :Venous blood LNP-riboswitch concentration (µM) :Total cardiac output, taken as 5.1 L·min⁻¹ :Lung volume, taken as 0.5 L :Lung tissue-to-blood partition coefficient, assumed to be 0.5
2. Heart
Explanation:
Heart drug levels are governed by arterial supply and myocardial uptake. Arterial blood delivers drug, tissue absorbs it per the partition coefficient, and excess blood returns to systemic circulation.
Parameters:
:Heart LNP-riboswitch concentration (µM) :Arterial blood LNP-riboswitch concentration (µM) :Heart blood flow, taken as 0.25 L·min⁻¹ :Heart volume, taken as 0.33 L :Heart tissue-to-blood partition coefficient, assumed to be 0.5
3. Kidney
Explanation:
Kidney drug concentration changes result from arterial input and tissue redistribution. Arterial blood delivers drug, kidney tissue absorbs it according to its partition coefficient, and remaining drug returns via renal veins.
Parameters:
:Kidney LNP-riboswitch concentration (µM) :Arterial blood LNP-riboswitch concentration (µM) :Kidney blood flow, taken as 1.1 L·min⁻¹ :Kidney volume, taken as 0.31 L :Kidney tissue-to-blood partition coefficient, assumed to be 0.8
4. Spleen
Explanation:
Spleen drug levels depend on arterial inflow and tissue uptake. Blood delivers drug, tissue absorbs it via its partition coefficient, and residual blood exits toward the portal vein or systemic circulation.
Parameters:
:Spleen LNP-riboswitch concentration (µM) :Spleen blood flow, taken as 0.2 L·min⁻¹ :Spleen volume, taken as 0.4 L :Spleen tissue-to-blood partition coefficient, assumed to be 5.0
5. Small Intestine
Explanation:
Drug concentration in the small intestine is determined by mesenteric arterial delivery and tissue uptake. Blood supplies drug, intestinal tissue absorbs it per its partition coefficient, then blood flows to the liver via the portal vein.
Parameters:
:Small intestine LNP-riboswitch concentration (µM) :Arterial blood LNP-riboswitch concentration (µM) :Small intestine blood flow, taken as 0.85 L·min⁻¹ :Small intestine volume, taken as 0.64 L :Small intestine tissue-to-blood partition coefficient, assumed to be 0.5
6. Liver
Explanation:
Liver drug concentration changes are controlled by arterial and portal inflow, tissue uptake, and hepatic clearance. Arterial and portal blood deliver drug; the liver absorbs it based on the partition coefficient, metabolizes a fraction, and the remaining drug returns via hepatic veins.
Parameters:
:Liver LNP-riboswitch concentration (µM) :Liver arterial blood flow, assumed to be 0.35 L·min⁻¹ under pathological conditions :Liver volume, assumed to be 1.8 L under pathological conditions :Liver tissue-to-blood partition coefficient, assumed to be 15 :Fraction of portal blood bypassing the liver, assumed to be 0.3 :Hepatic clearance, assumed to be 0.6 L·min⁻¹ under pathological conditions
7. Venous Blood
Explanation:
Venous blood drug levels reflect the net input from all tissues minus pulmonary extraction. It collects returning drug from heart, kidney, liver, spleen, intestine, and other tissues before entering the lungs.
Parameters:
:Rest of body LNP-riboswitch concentration (µM) :Venous blood volume, taken as 1.8 L :Blood flow to the rest of the body, taken as 2.35 L·min⁻¹
8. Arterial Blood
Explanation:
Arterial drug concentration is determined by pulmonary output and distribution to systemic tissues. Blood from the lungs delivers drug into arteries, which then supply all organs.
Parameters:
:Arterial blood volume, taken as 1.1 L
9. Rest of Body
Explanation:
Drug concentration in the remaining tissues depends on arterial inflow and tissue redistribution. This compartment represents all organs not explicitly modeled, contributing significantly to overall distribution kinetics.
Parameters:
:Volume of the rest of the body, taken as 53.12 L :Tissue-to-blood partition coefficient for the rest of the body, assumed to be 0.3
Simulation Results:
The above system of ordinary differential equations was numerically solved using MATLAB to simulate the dynamic distribution of the LNP-riboswitch in the human body. The simulation results are as follows:
After intravenous injection, the LNP-riboswitch initially distributes rapidly into the pulmonary circulation and arterial system. Subsequently, its concentration in the lungs and arterial blood declines quickly, indicating that the drug is being widely distributed to tissues throughout the body. Among all organs, the liver shows a pronounced drug accumulation effect, with its concentration reaching a steady state of approximately 20 µM within about 30 minutes, which is significantly higher than in other tissues. These results confirm that the LNP-riboswitch exhibits effective liver-targeted distribution, providing a key pharmacokinetic foundation for subsequent intracellular drug release and functional expression.
PBPK Model Simulation Code
clear;
clc;
close all;
% time span
tspan = [0 80];
% Initial Condition
C0 = zeros(9,1);
Dose = 100; %μmol
V_VB = 1.8; %venous blood(L)
C0(1) = Dose / V_VB;
%solve ODE
[t, C] = ode15s(@odefun_human_hepatitis_final, tspan, C0);
title('PBPK Simulation for INP Injection','FontSize', 14, 'FontWeight', 'bold')
xlabel('Time(min)')
ylabel('Concentration(μM)')
plot(t,C(:,1),'DisplayName','Venous Blood','LineWidth',2);hold on;
plot(t,C(:,2),'DisplayName','Lung','Linewidth',2);
plot(t, C(:,3), 'DisplayName', 'Arterial Blood', 'LineWidth', 2);
plot(t, C(:,4), 'DisplayName', 'Heart', 'LineWidth', 2);
plot(t, C(:,5), 'DisplayName', 'Kidney', 'LineWidth', 2);
plot(t, C(:,6), 'DisplayName', 'Liver', 'LineWidth', 2);
plot(t, C(:,7), 'DisplayName', 'Spleen', 'LineWidth', 2);
plot(t, C(:,8), 'DisplayName', 'Small Intestine', 'LineWidth', 2);
plot(t, C(:,9), 'DisplayName', 'Rest of Body', 'LineWidth', 2);
xlabel('Time(s)');
ylabel('Concentration(μM)');
legend('show')
grid on
% ODE Function
function dCdt = odefun_human_hepatitis_final(t, C)
Fs = 0.3;
CL_HE = 0.6;
% Volumn (L)
V_VB = 1.8; V_LU = 0.5; V_AB = 1.1;
V_HT = 0.33; V_KI = 0.31; V_LI = 1.8;
V_SP = 0.4; V_SI = 0.64; V_ROB = 53.12;
% Blood Flow (L/min)
Q_HT = 0.25; Q_KI = 1.1; Q_LI = 0.35;
Q_SP = 0.2; Q_SI = 0.85; Q_ROB = 2.35;
Q_CO = 5.1;
% Organ-to-blood distribution coefficient (Kp)
Kp_LU = 0.5; Kp_HT = 0.5; Kp_KI = 0.8;
Kp_LI = 15; Kp_SP = 5; Kp_SI = 0.5;
Kp_ROB = 0.3;
CVB = C(1); CLU = C(2); CAB = C(3); CHT = C(4); CKI = C(5);
CLI = C(6); CSP = C(7); CSI = C(8); CROB = C(9);
Amount_out_HT = Q_HT * (CHT / Kp_HT);
Amount_out_KI = Q_KI * (CKI / Kp_KI);
Amount_out_ROB = Q_ROB * (CROB / Kp_ROB);
Amount_from_portal_organs = Q_SP * (CSP / Kp_SP) + Q_SI * (CSI / Kp_SI);
Amount_shunted_to_vein = Fs * Amount_from_portal_organs;
Amount_to_liver_via_portal = (1 - Fs) * Amount_from_portal_organs;
Blood_flow_out_of_liver = Q_LI + (1-Fs)*(Q_SP + Q_SI);
Amount_out_of_liver = Blood_flow_out_of_liver * (CLI / Kp_LI);
dCVB = (1/V_VB) * ( Amount_out_HT + Amount_out_KI + Amount_out_ROB + ...
Amount_out_of_liver + Amount_shunted_to_vein - Q_CO * CVB );
dCLU = (Q_CO / V_LU) * (CVB - CLU / Kp_LU);
dCAB = (Q_CO / V_AB) * (CLU / Kp_LU - CAB);
dCHT = (Q_HT / V_HT) * (CAB - CHT / Kp_HT);
dCKI = (Q_KI / V_KI) * (CAB - CKI / Kp_KI);
dCSP = (Q_SP / V_SP) * (CAB - CSP / Kp_SP);
dCSI = (Q_SI / V_SI) * (CAB - CSI / Kp_SI);
dCROB = (Q_ROB / V_ROB) * (CAB - CROB / Kp_ROB);
Amount_to_liver_via_artery = Q_LI * CAB;
Total_amount_into_liver = Amount_to_liver_via_artery + Amount_to_liver_via_portal;
Amount_cleared_in_liver = CL_HE * (CLI / Kp_LI);
dCLI = (1/V_LI) * (Total_amount_into_liver - Amount_out_of_liver - Amount_cleared_in_liver);
dCdt = [dCVB; dCLU; dCAB; dCHT; dCKI; dCLI; dCSP; dCSI; dCROB];
end
Part 2. Cellular Model
Model Description:
After completing the PBPK simulation of the LNP-riboswitch distribution throughout the body, this study further established a model for its transmembrane transport and functional expression within hepatocytes. This process involves multiple consecutive biological steps37,38.
The intracellular delivery and functional realization of the LNP-riboswitch begins with its binding to apolipoprotein E (ApoE) in the blood circulation to form a complex. This complex then crosses the vascular endothelium and specifically binds to receptors on the surface of hepatocytes. Subsequently, the complex is internalized via endocytosis, forming early endosomes, during which ApoE dissociates. Once inside the intracellular trafficking pathway, part of the LNP is sorted into the exocytosis pathway and eventually expelled from the cell, while the remaining portion is transported to late endosomes as the endosomes mature. In the late endosome stage, a fraction of the LNP successfully escapes into the cytoplasm, releasing the riboswitch, which then initiates translation of the target gene. Finally, the generated S-adenosylmethionine (SAM) exerts negative feedback inhibition on the riboswitch activity, thereby achieving precise regulation of its own concentration.
Theoretical Formulation and Model Parameters
Initial Conditions:
The initial conditions are similar to those of the PBPK model. It is assumed that no LNP-riboswitch is present in the body prior to administration. At t = 0, a single intravenous bolus dose is given instantaneously, with a dose of 100 µmol.
Model Equations:
1. LNP Binding to ApoE
Explanation:
This equation describes the dynamic binding of LNP to ApoE protein in the liver microcirculation. The process follows Michaelis-Menten kinetics, while also considering the natural dissociation of the complex and its internalization by hepatocytes. Together, these factors determine the instantaneous concentration of the LNP-ApoE complex in the blood.
Parameters:
:Concentration of the LNP-ApoE complex (µM) :LNP concentration in blood near the liver (µM) :ApoE concentration in blood near the liver, assumed to be 0.8 µM :Association rate constant between ApoE and LNP, assumed to be 20 h⁻¹ :Dissociation rate constant of the ApoE-LNP complex, assumed to be 0.02 h⁻¹ :Endocytosis rate of LNP-ApoE complex, assumed to be 0.4 h⁻¹ :ApoE binding saturation constant, assumed to be 0.05 µM
2. LNP Endocytosis and Endosomal Transport
Explanation:
These equations describe the intracellular transport of LNP after endocytosis. LNP in early endosomes (EE) can either be transported to late endosomes (LE) or be exocytosed. LNP in late endosomes faces three possible fates: transport to lysosomes for degradation, exocytosis, or successful escape into the cytoplasm. Only the fraction that escapes into the cytoplasm can release functional riboswitch.
Parameters:
:LNP concentration in early endosomes (µM) :LNP concentration in late endosomes (µM) :LNP concentration in lysosomes (µM) :Rate of LNP escape from LE to cytoplasm, assumed to be 0.3 h⁻¹ :Degradation rate of LNP in lysosomes, assumed to be 2.0 h⁻¹ :Exocytosis rate from EE, assumed to be 0.05 h⁻¹ :Exocytosis rate from LE, assumed to be 0.3 h⁻¹ :Transport rate from EE to LE, assumed to be 2.0 h⁻¹ :Transport rate from LE to lysosome, assumed to be 0.05 h⁻¹
3. Exocytosis Pathway
Explanation:
This equation quantifies the dynamic changes of LNP expelled from the cell via exocytosis. The process includes contributions from both early and late endosome exocytosis, as well as systemic clearance of the exocytosed LNP.
Parameters:
:Concentration of LNP expelled via exocytosis (µM) :Clearance rate of exocytosed LNP, assumed to be 0.5 h⁻¹
4. Riboswitch Release and Target Expression
Explanation:
These equations describe the dynamics of the riboswitch in the cytoplasm and its regulation of SAM concentration. The riboswitch concentration is determined by release from LNP, natural degradation, and SAM-mediated inactivation. SAM concentration is controlled by basal production, riboswitch-driven synthesis, and metabolic consumption. Riboswitch expression activity is negatively regulated by SAM via a Hill-type feedback mechanism.
Parameters:
:Concentration of riboswitch in cytoplasm (µM) :SAM concentration (µM) :Rate of riboswitch release from LNP, assumed to be 2.2 h⁻¹ :Rate of riboswitch inactivation by SAM, assumed to be 0.2 µM⁻¹·h⁻¹ :Basal SAM production rate, calculated as 0.088 µM·h⁻¹ :Functional consumption rate of SAM, assumed to be 0.002 µM·h⁻¹ :SAM feedback half-saturation constant, 60 µM :Riboswitch natural degradation rate, assumed to be 0.1 h⁻¹ :SAM natural degradation rate, assumed to be 0.004 h⁻¹ :Riboswitch-mediated SAM synthesis rate, assumed to be 2 µM·h⁻¹ :Hill coefficient, assumed to be 4
Simulation Results:
By integrating the above cellular model into the PBPK framework and performing numerical simulation using MATLAB, the following results were obtained:
After a single dose, the intracellular concentration of free riboswitch reaches its peak (~1.4 µM) at approximately 8 hours, then gradually declines due to degradation and SAM-mediated feedback inhibition. At the same time, intracellular SAM concentration peaks at around 20 hours after administration, approaching the target therapeutic level of 60 µM. However, it subsequently shows a slow but continuous decline, eventually returning to the pathological level (~40 µM). These results indicate that a single dose can temporarily increase SAM concentration in hepatocytes, but cannot maintain long-term stable therapeutic effects. Therefore, a multiple-dose strategy is required to achieve sustained regulation of SAM concentration.
Cellular Model Simulation Code
clear;
clc;
close all;
%PBPK Model
pbpk_tspan = [0 500*60];
pbpk_C0 = zeros(9,1);
pbpk_Dose = 100; %(μmol)
pbpk_V_VB = 1.8;
pbpk_C0(1) = pbpk_Dose / pbpk_V_VB;
[t_pbpk_min, C_pbpk] = ode15s(@odefun_human_hepatitis_final, pbpk_tspan, pbpk_C0);
t_pbpk_hr = t_pbpk_min / 60;
LNP_in_liver_profile = C_pbpk(:, 6);
LNP_driver_func = griddedInterpolant(t_pbpk_hr, LNP_in_liver_profile, 'pchip');
% Parameters
params = struct();
params.k_on = 20; params.k_off = 0.02; params.KM = 0.05; params.ApoE = 0.8;
params.k_int = 0.4; params.k_EE_LE = 2.0; params.k_exo_EE = 0.05;
params.k_LE_Ly = 0.05; params.k_escape = 0.3; params.k_exo_LE = 0.3;
params.k_lysis = 2.0;
params.k_clear_exo = 0.5;
params.k_rel = 2.2;
params.gamma_R = 0.1;
params.k_inact = 0.2;
params.alpha = 2;
params.gamma_SAM = 0.004;
params.k_util = 0.002;
params.K_SAM = 60;
params.n = 4;
params.SAM_setpoint = 40; % concentration of SAM in hepatitis patients
params.k_prod_SAM = (params.gamma_SAM + params.k_util) * params.SAM_setpoint; % (h^-1 * uM) = uM/h
% Initial Condition
Y0_cell = [0; 0; 0; 0; 0; params.SAM_setpoint; 0];
tspan_cell = [0, 500];
%Solve ODE
[t_cell, Y_cell] = ode15s(@(t,Y) lnp_cellular_ode_v8(t, Y, params, LNP_driver_func), tspan_cell, Y0_cell);
%Plot
figure('Name', 'Model with SAM Homeostasis', 'Position', [100, 75, 1000, 700]);
labels = {'LNP in Liver', 'LNP-ApoE Complex','LNP in EE','LNP in LE','LNP in Lysosome', 'Riboswitch','SAM', 'Exocytosed LNP'};
subplot(4,2,1);
plot(t_pbpk_hr*60, LNP_in_liver_profile, 'LineWidth', 2);
title(labels{1}); xlabel('Time (min)'); ylabel('Conc. (μM)'); xlim([0,120]);ylim([0,25]); grid on;
subplot(4,2,2);
plot(t_cell, Y_cell(:,1), 'LineWidth', 2);
title(labels{2}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,1)),max(Y_cell(:,1))*1.1]);grid on;
subplot(4,2,3);
plot(t_cell, Y_cell(:,2), 'LineWidth', 2);
title(labels{3}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,2)),max(Y_cell(:,2))*1.1]);grid on;
subplot(4,2,4);
plot(t_cell, Y_cell(:,3), 'LineWidth', 2);
title(labels{4}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,3)),max(Y_cell(:,3))*1.1]);grid on;
subplot(4,2,5);
plot(t_cell, Y_cell(:,4), 'LineWidth', 2);
title(labels{5}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,4)),max(Y_cell(:,4))*1.1]);grid on;
subplot(4,2,6);
plot(t_cell, Y_cell(:,7), 'LineWidth', 2);
title(labels{8}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,7)),max(Y_cell(:,7))*1.1]);grid on;
subplot(4,2,7);
plot(t_cell, Y_cell(:,5), 'LineWidth', 2);
title(labels{6}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,5)),max(Y_cell(:,5))*1.1]);grid on;
subplot(4,2,8);
plot(t_cell, Y_cell(:,6), 'LineWidth', 2);
title(labels{7}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,500]);ylim([min(Y_cell(:,6))-5,max(Y_cell(:,6))*1.1]);grid on;
hold on; yline(params.SAM_setpoint, 'r--', 'LineWidth', 1.5); legend('SAM Conc.', 'Baseline (40 μM)', 'Location', 'northeast'); hold off;
sgtitle('LNP-SAM Cellular Dynamics ', 'FontSize', 14, 'FontWeight', 'bold');
% PBPK Model
function dCdt = odefun_human_hepatitis_final(~, C)
Fs=0.3; CL_HE=0.6; V_VB=1.8; V_LU=0.5; V_AB=1.1; V_HT=0.33; V_KI=0.31; V_LI=1.8; V_SP=0.4; V_SI=0.64; V_ROB=53.12;
Q_HT=0.25; Q_KI=1.1; Q_LI=0.35; Q_SP=0.2; Q_SI=0.85; Q_ROB=2.35; Q_CO=5.1; Kp_LU=0.5; Kp_HT=0.5; Kp_KI=0.8; Kp_LI=15;
Kp_SP=5; Kp_SI=0.5; Kp_ROB=0.3; CVB=C(1); CLU=C(2); CAB=C(3); CHT=C(4); CKI=C(5); CLI=C(6); CSP=C(7); CSI=C(8); CROB=C(9);
Amount_out_HT=Q_HT*(CHT/Kp_HT); Amount_out_KI=Q_KI*(CKI/Kp_KI); Amount_out_ROB=Q_ROB*(CROB/Kp_ROB);
Amount_from_portal_organs=Q_SP*(CSP/Kp_SP)+Q_SI*(CSI/Kp_SI); Amount_shunted_to_vein=Fs*Amount_from_portal_organs;
Amount_to_liver_via_portal=(1-Fs)*Amount_from_portal_organs; Blood_flow_out_of_liver=Q_LI+(1-Fs)*(Q_SP+Q_SI);
Amount_out_of_liver=Blood_flow_out_of_liver*(CLI/Kp_LI);
dCVB=(1/V_VB)*(Amount_out_HT+Amount_out_KI+Amount_out_ROB+Amount_out_of_liver+Amount_shunted_to_vein-Q_CO*CVB);
dCLU=(Q_CO/V_LU)*(CVB-CLU/Kp_LU); dCAB=(Q_CO/V_AB)*(CLU/Kp_LU-CAB); dCHT=(Q_HT/V_HT)*(CAB-CHT/Kp_HT);
dCKI=(Q_KI/V_KI)*(CAB-CKI/Kp_KI); dCSP=(Q_SP/V_SP)*(CAB-CSP/Kp_SP); dCSI=(Q_SI/V_SI)*(CAB-CSI/Kp_SI);
dCROB=(Q_ROB/V_ROB)*(CAB-CROB/Kp_ROB); Amount_to_liver_via_artery=Q_LI*CAB;
Total_amount_into_liver=Amount_to_liver_via_artery+Amount_to_liver_via_portal;
Amount_cleared_in_liver=CL_HE*(CLI/Kp_LI);
dCLI=(1/V_LI)*(Total_amount_into_liver-Amount_out_of_liver-Amount_cleared_in_liver);
dCdt=[dCVB;dCLU;dCAB;dCHT;dCKI;dCLI;dCSP;dCSI;dCROB];
end
% Cell Model
function dYdt = lnp_cellular_ode_v8(t, Y, p, lnp_driver)
L_blood_liver = lnp_driver(t);
L_ApoE = Y(1); L_EE = Y(2); L_LE = Y(3); L_Ly = Y(4);
R = Y(5); SAM = Y(6); L_exo = Y(7);
dL_ApoE = (p.k_on * L_blood_liver * p.ApoE) / (p.KM + L_blood_liver) - p.k_off * L_ApoE - p.k_int * L_ApoE;
dL_EE = p.k_int * L_ApoE - (p.k_EE_LE + p.k_exo_EE) * L_EE;
dL_LE = p.k_EE_LE * L_EE - (p.k_LE_Ly + p.k_escape + p.k_exo_LE) * L_LE;
dL_Ly = p.k_LE_Ly * L_LE - p.k_lysis * L_Ly;
dR = p.k_rel * p.k_escape * L_LE - p.gamma_R * R - p.k_inact * R * SAM;
dL_exo = (p.k_exo_EE * L_EE + p.k_exo_LE * L_LE) - p.k_clear_exo * L_exo;
hill_inhibition = 1 / (1 + (SAM / p.K_SAM)^p.n);
dSAM = p.k_prod_SAM + p.alpha * R * hill_inhibition - (p.gamma_SAM + p.k_util) * SAM;
dYdt = [dL_ApoE; dL_EE; dL_LE; dL_Ly; dR; dSAM; dL_exo];
end
Part 3. Multiple Dosage
Model Description:
Based on the previously established PBPK and cellular models, a single administration of LNP-riboswitch is insufficient to maintain long-term stable SAM concentrations in hepatocytes. To achieve sustained therapeutic effects, a multiple-dosage model was developed to evaluate the dynamic response and stability of SAM concentrations under a periodic dosing treatment.
Initial Conditions:
The initial conditions are consistent with the PBPK model, assuming no LNP-riboswitch is present in the body prior to administration. The first dose is delivered intravenously as an instantaneous injection at t = 0. Subsequent doses are also 100 μmol each, administered at 3.5-day intervals (84 hours). The simulation covers multiple dosing cycles within this schedule.
Simulation Results:
By integrating the PBPK and cellular models and performing numerical simulations in MATLAB for the multiple-dosing regimen, the following results were obtained:
During the first two dosing cycles, the peak concentration of SAM in hepatocytes gradually increased. From the third dose onward, SAM peak concentrations reached a dynamic equilibrium, and the pharmacokinetic fluctuations within each cycle became consistent. Quantitative analysis indicates that this multiple-dosing scheme maintains the average SAM concentration in hepatocytes at approximately 60.3 μM, precisely meeting the predetermined therapeutic threshold.
Multiple Dosing Simulation Code
clear;
clc;
close all;
%Dosage
dosing_interval = 84; % Dosing Interval
total_time = 1000;
pbpk_Dose = 100; % Dosage Administration(μmol)
target_SAM = 60; % Target SAM Concentration(μM)
%PBPK Model
pbpk_V_VB = 1.8;
pbpk_tspan_min = [0, total_time*60];
dosing_times_min = (0:dosing_interval:total_time)*60;
%Initial Condition
C0 = zeros(9,1);
t_pbpk_min_all = [];
C_pbpk_all = [];
for i = 1:length(dosing_times_min)
current_time = dosing_times_min(i);
C0(1) = C0(1) + pbpk_Dose / pbpk_V_VB;
if i < length(dosing_times_min)
t_segment = [current_time, dosing_times_min(i+1)];
else
t_segment = [current_time, pbpk_tspan_min(2)];
end
[t_seg, C_seg] = ode15s(@odefun_human_hepatitis_final, t_segment, C0);
if i > 1
t_seg = t_seg(2:end);
C_seg = C_seg(2:end, :);
end
t_pbpk_min_all = [t_pbpk_min_all; t_seg];
C_pbpk_all = [C_pbpk_all; C_seg];
C0 = C_seg(end, :)';
end
[t_pbpk_min_all, unique_indices] = unique(t_pbpk_min_all);
C_pbpk_all = C_pbpk_all(unique_indices, :);
t_pbpk_hr = t_pbpk_min_all / 60;
LNP_in_liver_profile = C_pbpk_all(:, 6);
LNP_driver_func = griddedInterpolant(t_pbpk_hr, LNP_in_liver_profile, 'pchip');
% Parameters
params = struct();
params.k_on = 20; params.k_off = 0.02; params.KM = 0.05; params.ApoE = 0.8;
params.k_int = 0.4; params.k_EE_LE = 2.0; params.k_exo_EE = 0.05;
params.k_LE_Ly = 0.05; params.k_escape = 0.3; params.k_exo_LE = 0.3;
params.k_lysis = 2.0;
params.k_clear_exo = 0.5;
params.k_rel = 2.2;
params.gamma_R = 0.1;
params.k_inact = 0.2;
params.alpha = 2;
params.gamma_SAM = 0.004;
params.k_util = 0.002;
params.K_SAM = 60;
params.n = 4;
params.SAM_setpoint = 40;
params.k_prod_SAM = (params.gamma_SAM + params.k_util) * params.SAM_setpoint;
% Initial Condition
Y0_cell = [0; 0; 0; 0; 0; params.SAM_setpoint; 0];
% Time Span
tspan_cell = [0, total_time];
% Solve ODE
[t_cell, Y_cell] = ode15s(@(t,Y) lnp_cellular_ode_v8(t, Y, params, LNP_driver_func), tspan_cell, Y0_cell);
%Plot
figure('Name', 'Multiple Dosing Simulation', 'Position', [100, 75, 1000, 700]);
subplot(4,2,1);
y_values = pbpk_Dose*ones(size(dosing_times_min));
stem(dosing_times_min/60, y_values, 'LineWidth', 1.5, 'Marker', 'none');
title('Dosing');
xlabel('Time (h)');
ylabel('Dose(μmol)');
xlim([0, total_time]);
ylim([0,110]);
grid on;
subplot(4,2,2);
plot(t_pbpk_hr, LNP_in_liver_profile, 'LineWidth', 2);
title('LNP in Liver');
xlabel('Time (h)');
ylabel('Conc. (μM)');
xlim([0, total_time]);
ylim([0,max(C_pbpk_all(:, 6))*1.1]);
grid on;
labels = {'LNP-ApoE Complex','LNP in EE','LNP in LE','LNP in Lysosome', 'Riboswitch','SAM', 'LNP Exocytosed'};
for i = 1:7
subplot(4,2,i+2);
plot(t_cell, Y_cell(:,i), 'LineWidth', 1.5);
title(labels{i});
xlabel('Time (h)');
ylabel('Conc. (μM)');
xlim([0, total_time]);
if max(abs(Y_cell(:,i))) > 1e-9
ylim_min = min(Y_cell(:,i));
ylim_max = max(Y_cell(:,i));
if ylim_min == ylim_max
ylim_max = ylim_min + 1;
end
ylim_range = ylim_max - ylim_min;
ylim([ylim_min-0.05*ylim_range, ylim_max+0.05*ylim_range]);
else
ylim([-0.1, 1]);
end
% Mark Initial Concentration of SAM and Target Concentration
if i == 6
hold on;
yline(target_SAM, 'g--', 'LineWidth', 2, 'DisplayName', 'Target (60μM)');
yline(params.SAM_setpoint, 'r--', 'LineWidth', 1.5, 'DisplayName', 'Baseline (40μM)');
legend('SAM Conc.', 'Target(60 μM)','Initial Conc.');
hold off;
% Average Concentration
last_100h_idx = t_cell > (total_time - 100);
if any(last_100h_idx)
mean_SAM = mean(Y_cell(last_100h_idx, 6));
text(0.2*total_time, 0.70*max(ylim), ...
sprintf('Avg SAM Conc.: %.1fμM\n', mean_SAM), ...
'FontSize', 8);
end
end
grid on;
end
sgtitle('Multiple Dosing Simulation', 'FontSize', 14, 'FontWeight', 'bold');
% PBPK Model
function dCdt = odefun_human_hepatitis_final(~, C)
Fs=0.3; CL_HE=0.6; V_VB=1.8; V_LU=0.5; V_AB=1.1; V_HT=0.33; V_KI=0.31; V_LI=1.8; V_SP=0.4; V_SI=0.64; V_ROB=53.12;
Q_HT=0.25; Q_KI=1.1; Q_LI=0.35; Q_SP=0.2; Q_SI=0.85; Q_ROB=2.35; Q_CO=5.1; Kp_LU=0.5; Kp_HT=0.5; Kp_KI=0.8; Kp_LI=15;
Kp_SP=5; Kp_SI=0.5; Kp_ROB=0.3; CVB=C(1); CLU=C(2); CAB=C(3); CHT=C(4); CKI=C(5); CLI=C(6); CSP=C(7); CSI=C(8); CROB=C(9);
Amount_out_HT=Q_HT*(CHT/Kp_HT); Amount_out_KI=Q_KI*(CKI/Kp_KI); Amount_out_ROB=Q_ROB*(CROB/Kp_ROB);
Amount_from_portal_organs=Q_SP*(CSP/Kp_SP)+Q_SI*(CSI/Kp_SI); Amount_shunted_to_vein=Fs*Amount_from_portal_organs;
Amount_to_liver_via_portal=(1-Fs)*Amount_from_portal_organs; Blood_flow_out_of_liver=Q_LI+(1-Fs)*(Q_SP+Q_SI);
Amount_out_of_liver=Blood_flow_out_of_liver*(CLI/Kp_LI);
dCVB=(1/V_VB)*(Amount_out_HT+Amount_out_KI+Amount_out_ROB+Amount_out_of_liver+Amount_shunted_to_vein-Q_CO*CVB);
dCLU=(Q_CO/V_LU)*(CVB-CLU/Kp_LU); dCAB=(Q_CO/V_AB)*(CLU/Kp_LU-CAB); dCHT=(Q_HT/V_HT)*(CAB-CHT/Kp_HT);
dCKI=(Q_KI/V_KI)*(CAB-CKI/Kp_KI); dCSP=(Q_SP/V_SP)*(CAB-CSP/Kp_SP); dCSI=(Q_SI/V_SI)*(CAB-CSI/Kp_SI);
dCROB=(Q_ROB/V_ROB)*(CAB-CROB/Kp_ROB); Amount_to_liver_via_artery=Q_LI*CAB;
Total_amount_into_liver=Amount_to_liver_via_artery+Amount_to_liver_via_portal;
Amount_cleared_in_liver=CL_HE*(CLI/Kp_LI);
dCLI=(1/V_LI)*(Total_amount_into_liver-Amount_out_of_liver-Amount_cleared_in_liver);
dCdt=[dCVB;dCLU;dCAB;dCHT;dCKI;dCLI;dCSP;dCSI;dCROB];
end
% Cell Model
function dYdt = lnp_cellular_ode_v8(t, Y, p, lnp_driver)
L_blood_liver = lnp_driver(t);
L_ApoE = Y(1); L_EE = Y(2); L_LE = Y(3); L_Ly = Y(4);
R = Y(5); SAM = Y(6); L_exo = Y(7);
dL_ApoE = (p.k_on * L_blood_liver * p.ApoE) / (p.KM + L_blood_liver) - p.k_off * L_ApoE - p.k_int * L_ApoE;
dL_EE = p.k_int * L_ApoE - (p.k_EE_LE + p.k_exo_EE) * L_EE;
dL_LE = p.k_EE_LE * L_EE - (p.k_LE_Ly + p.k_escape + p.k_exo_LE) * L_LE;
dL_Ly = p.k_LE_Ly * L_LE - p.k_lysis * L_Ly;
dR = p.k_rel * p.k_escape * L_LE - p.gamma_R * R - p.k_inact * R * SAM;
dL_exo = (p.k_exo_EE * L_EE + p.k_exo_LE * L_LE) - p.k_clear_exo * L_exo;
hill_inhibition = 1 / (1 + (SAM / p.K_SAM)^p.n);
dSAM = p.k_prod_SAM + p.alpha * R * hill_inhibition - (p.gamma_SAM + p.k_util) * SAM;
dYdt = [dL_ApoE; dL_EE; dL_LE; dL_Ly; dR; dSAM; dL_exo];
end
Conclusion
Using the established ODE-based model, numerical simulations demonstrate that the LNP-delivered riboswitch system, when combined with a well-designed multiple-dosing strategy, can effectively maintain SAM concentrations in human hepatocytes at normal physiological levels (around 60 μM). This provides a theoretical and simulation-based foundation for achieving long-term stable treatment of hepatitis.
Next Step—Rational Design of RNA
Background and Motivation
Inspired by the recent widespread advances in protein design, we aim to establish a novel functional nucleic acid design paradigm for the de novo design of aptamers or riboswitches.
The protein design approach that is currently popular and well-validated involves dividing the process into two main stages: scaffold design and sequence design. These stages employ distinct models.
Given the functional similarity between nucleic acid aptamers and protein binders, our aim is to develop software tools for nucleic acid aptamer design based on this workflow. This will make a valuable contribution to the field of novel, rational nucleic acid design. The recent breakthroughs achieved by the Baker lab in the field of RNA design demonstrate the considerable feasibility and promising prospects of deep learning-based de novo RNA design.42
Direction and Design
We began by developing a sequence design model for nucleic acids, leading to the creation of RNAMPNN—a deep learning-based sequence design tool.
RNAMPNN, as a sequence design model, represents the adaptation of ProteinMPNN43—a commonly used sequence design model in the field of protein design—to RNA design.
Although RNAMPNN shares similar application scenarios and tasks with ProteinMPNN, the RNA systems it targets exhibit distinct differences from protein systems, presenting certain challenges for our task, such as:
Data scarcity: The main challenge in designing models for RNA systems, as opposed to protein systems, is the limited amount of available data. Currently, publicly available RNA structural data constitutes less than one percent of protein structural data.
Structural instability: Compared to proteins, RNA structures exhibit high flexibility, with some RNAs potentially lacking thermodynamically stable conformations. This poses significant challenges to the robustness of sequence-design models.
Main chain complexity: RNA backbone structures often require a greater number of atoms to be defined than proteins do. This increases both the amount of topological atomic information that the model must interpret and the computational efficiency required.
RNAMPNN broadly follows the fundamental architecture of ProteinMPNN, employing a graph neural network based on message passing as its core model. It incorporates novel mechanisms tailored to the unique properties of RNA, such as simplifying the definition of the RNA backbone for existing sequence design models44 by reducing it to seven atoms, thereby alleviating the computational burden. A distinctive pre-post feature fusion mechanism enables the model to extract additional residue-sequential information beyond purely Euclidean spatial data.
RNAMPNN primarily consists of four components:
Feature Extraction Module: Used to extract nucleotide-level geometric information from raw atomic coordinates, such as bond lengths, bond angles, and dihedral angles between atoms.
MPNN module: Used for message passing of geometric information at the nucleotide level and feature assignment to nucleotide nodes.
Pre/Post-Fusion Module: Used to fuse node features before and after the MPNN module based on the self-attention mechanism, enabling the model to capture information from nucleotide sequence structures.
Output Head Module: Used to convert extracted node information into discrete nucleotide types.
Results and Optimization
The RNAMPNN model was trained on over 2,000 publicly available RNA structural data points collected by our team and tested on a dataset from the same source that we segmented independently.
During model testing, we observed that RNAMPNN performs better with RNA structures containing a greater number of nucleotides, whereas its performance is suboptimal with fewer nucleotides. This is due to RNAMPNN's primary reliance on message passing to extract information from the atomic coordinate space of the RNA backbone. In systems with fewer nucleotides, the message passing mechanism becomes less effective due to the limited number of nodes.
Additionally, we found that RNAMPNN is not very robust. Fine-tuning a pre-trained RNAMPNN model on isolated short sequence datasets results in a significant decline in the model's overall performance.
We implemented a series of improvements to the original RNAMPNN architecture and training method, some of which proved effective and were incorporated into the latest release, for example:
- We initially attempted to blend atomic geometric features using a standalone atomic-scale MPNN before applying the nucleotide-scale MPNN. However, this approach ultimately proved ineffective and significantly increased computational time and memory consumption, leading us to abandon this design.
- We then tried using an XGBoost45 model instead of the traditional MLP as the output head. Specifically, we first performed globally differentiable pre-training with an MLP as the output head. Next, we used the main body of the RNAMPNN as a representation model to assign features to nucleotide nodes. Finally, we used a separately trained XGBoost model to predict node types based on these features. This enhancement was effective because the additional computational cost was concentrated during the training phase, delivering significant performance gains for the model. This design was ultimately integrated into the model's release version.


- We attempted to introduce Gaussian noise during training to enhance model stability, but this approach did not yield the expected results. We believe this may stem from Gaussian noise being an inappropriate choice for the actual physical RNA system. A further avenue to explore is employing molecular dynamics techniques to incorporate multi-conformation data of RNA into the dataset.
Conclusions and Plans
In this section, we attempted to transfer existing protein design workflows to the field of RNA design, successfully developing the RNAMPNN sequence design model in the process. We hope that this model will provide valuable insights for developing subsequent models.
Due to the time constraints of the iGEM project timeline, we were unable to complete the required RNA scaffold condition generation model — a deep learning model for designing RNA scaffolds based on targets. Currently, there is no publicly available model for performing scaffold generation tasks for small-molecule nucleic acid aptamers. This makes it difficult to validate the model experimentally.
To make it easier for other researchers to use and validate our model, we have integrated it into our user-friendly RNA comprehensive analysis platform, RNA-Factory. Users only need to provide a PDB file via the GUI interface to obtain model outputs.
As the RNAMPNN model is not perfect, we are planning to implement targeted improvements.
Unlike ProteinMPNN, RNAMPNN43 cannot account for multi-stranded RNA, nor can it consider ligand atoms explicitly, as LigandMPNN46 can. These limitations could be overcome by adapting models from protein design research.
The scarcity of data remains a significant challenge. We plan to augment our dataset by incorporating data generated through a combination of nucleic acid structure prediction models and molecular dynamics simulations.
Cost calculation is also a noteworthy concern. We have found that RNAMPNN currently uses a lot of GPU memory and takes a long time to perform inference, which is a disadvantage for designing large-scale RNA sequences. This issue could be resolved by simplifying the model architecture.
At the time of writing this Wiki, the RNAMPNN model is still being optimised and updated, and the dataset for its accompanying skeleton generation model is being collected and constructed.We sincerely hope that, in the future, deep learning models will play an increasingly significant role in the rational and de novo design of RNA. We also anticipate that the patterns we have identified will be the subject of broader research and validation.