Skip to content

Model

Introduction—The Basic Framework for Our Work

As computational techniques and mathematical modelling methods become increasingly indispensable in the biological sciences, the PekingHSC 2025 team intends to explore the full potential of dry-lab experiments in this competition. We intend to use various computational analysis and mathematical modelling approaches to investigate the different ways in which dry-lab experiments can be used in research.

We use the pyramid as a visual representation of our work framework, which corresponds to the four roles that dry lab may assume during the research process:

Molecular Dynamics
DeepRPI, SMRTnet
ODE
RNAMPNN

Conduct a feasibility analysis of the proposal put forward by wet lab researchers, considering limited throughput and predictive capabilities.

Propose potentially feasible experimental approaches using computational methods.

Use modeling methods to compensate for the currently missing components of wet experiments, thereby forming a closed-loop project.

Propose novel design approaches for the current research topic and provide corresponding computational tool designs.

Molecular simulations for rational sequence screening

Overview of Computational High-Throughput Screening Model

Our team aims to design RNA riboswitch elements capable of precisely regulating intracellular S-adenosylmethionine (SAM) concentrations in hepatocytes. The designed riboswitch promotes gene expression in the absence of SAM and represses expression upon SAM binding through ligand-induced conformational rearrangement.

Riboswitch screening presents two major challenges. First, SAM-binding riboswitches exist in several distinct structural classes (SAM-I, SAM-III, SAM-IV, SAM-VI, etc.), each exhibiting unique recognition motifs and regulatory mechanisms. Second, the performance of riboswitches is highly sensitive to the composition and length of the connecting linker sequence between aptamer and expression platform1. Exhaustive experimental validation of all possible variants would be prohibitively time-consuming and resource-intensive. Therefore, integrating computational tools into the screening pipeline is essential.

To address this, we developed a high-throughput computational workflow that combines multiple molecular simulation and analysis tools. This workflow provides an efficient and cost-effective strategy for predicting riboswitch–ligand binding behavior, guiding experimental screening, and narrowing the search space for effective regulatory sequences.

Model Introduction

Currently, only limited computational tools can directly predict how RNA riboswitches recognize small molecules and how their conformations change upon ligand binding. Fully experimental screening of all possible RNA sequences is impractical; thus, our model bridges this gap by integrating molecular dynamics (MD) simulations with other techniques, molecular docking for instance.

Inspired by the 2024 iGEM Heidelberg workflow, we established a multi-stage computational protocol capable of simulating plasmid-encoded riboswitch constructs and evaluating both binding affinity and conformational dynamics of various candidate RNAs in complex with SAM. This enables large-scale, parallelized screening of riboswitch variants prior to experimental testing.

Our modeling pipeline consists of three major modules:

  1. Structure Preparation: generating accurate RNA and ligand structures and assigning charges.
  2. Molecular Dynamics Simulations: sampling conformational ensembles in explicit solvent to capture binding stability and dynamic transitions.
  3. Post-Simulation Analysis: quantifying binding energy, evaluating structural flexibility, hydrogen-bond networks, and visualizing conformational rearrangements. By following this workflow, we obtain predicted binding modes, binding free-energy estimates, and visual insights into RNA structural transitions. Representative examples are presented to illustrate the overall computational strategy.

Figure 1. Computational Workflow of High-throughput Screening of Riboswitches

Background Information

Core Software Ecosystem

Over the past decades, the field of molecular dynamics has matured into a robust ecosystem of rigorously validated computational tools. Among them, GROMACS, originally developed at the University of Groningen, stands out for its exceptional computational efficiency, rich suite of analysis utilities, and active open-source community. In our workflow, GROMACS performs all MD simulations — including energy minimization, equilibration, long-timescale sampling, and trajectory analysis.

To ensure reliability across the entire pipeline, from quantum-level parameter generation to trajectory visualization, we integrate several authoritative computational chemistry tools:

  • Gaussian: an industry-standard quantum chemistry package widely used for electronic structure calculations. In our workflow, Gaussian performs geometry optimization and frequency analysis of small-molecule ligands at the DFT level, providing precise molecular geometries and electronic properties as the foundation for charge derivation and parameterization.
  • Multiwfn: a versatile wavefunction analysis program used to extract electronic properties from quantum-chemical results. Specifically, we employ Multiwfn to generate RESP-fitted atomic charges from Gaussian outputs, ensuring accurate electrostatic representations and consistency with AMBER-compatible ligand parameterization schemes.
  • subtop: a modern and flexible toolkit that automatically generates GROMACS-compatible topology and coordinate files from quantum-optimized structures. Compared with older converters such as ACPYPE, subtop offers enhanced AMBER-based parameterization support, robust atom-typing consistency, and seamless integration with charge-fitting workflows (RESP or AM1-BCC). It constructs topology files based on GAFF parameters, enabling smooth and reproducible incorporation of small molecules into RNA–ligand simulation systems.
  • AutoDock / AutoDock Vina: classical molecular docking suites used to predict initial binding poses of SAM and its analogues within candidate riboswitch aptamers. These docking results serve as the starting configurations for molecular dynamics refinement, allowing us to explore the stability and adaptability of the RNA–ligand complexes.
  • PyMOL: a comprehensive molecular visualization platform employed for structural inspection, alignment, and figure generation. Throughout the workflow, PyMOL facilitates the visual analysis of binding interactions and conformational changes, producing publication-quality graphics and comparative structure renderings.

Force Field Selection

Force fields define the mathematical potential functions and parameters describing atomic interactions — the physical foundation of MD simulations. Selecting a consistent and well-validated parameter set is crucial for accuracy.

For the RNA component, we employ the AMBER force-field family, specifically the ff99SB. These parameters accurately capture base stacking, hydrogen bonding, and backbone conformational preferences, making them widely adopted for RNA–ligand studies.

For small-molecule ligands, we use the General AMBER Force Field (GAFF), which is fully compatible with the AMBER framework and provides comprehensive coverage of common organic moieties. This compatibility ensures physically consistent interaction energies between RNA and ligand atoms.

The solvent is modeled with the TIP3P explicit water model, the standard companion of AMBER force fields. To mimic physiological ionic conditions, Mg²⁺ and Cl⁻ ions are included using AMBER-recommended parameters optimized for nucleic acid systems2. In particular, Mg²⁺ plays a critical role in stabilizing tertiary RNA structures and mediating SAM binding geometry.

User Manual

A detailed user manual accompanies this model, documenting all scripts, parameter files, and workflow steps required to reproduce or extend our simulations. The manual is designed as a practical guide for both beginners and advanced users seeking to apply molecular simulations to RNA regulatory element design.

View User Manual of Molecular Dynamics Simulation

Model Workflow

Structure Preparation

Figure 2. Structure Preparation

The first step of the computational workflow involves the prediction of riboswitch RNA tertiary structures. Although the release of AlphaFold 3 has greatly enhanced the accuracy of protein–ligand structure prediction, its performance on RNA molecules remains less reliable, particularly for complex tertiary folds or ligand-bound conformations. To address these limitations, we employ a hybrid strategy that combines both deep-learning-based prediction and secondary-structure-guided modeling to achieve more accurate RNA conformations. Specifically, two complementary approaches are used:

  1. trRosettaRNA2 + AlphaFold3 hybrid modeling: We utilize trRosettaRNA2, a deep-learning model specifically trained for RNA tertiary structure prediction, and integrate its output with AlphaFold3 predictions. The combination improves local base-pair geometry and global tertiary packing, especially in regions with stable base pairing such as the aptamer domain.
  2. Secondary-structure–guided 3D modeling: Inspired by experimental RNA folding approaches, we first predict the secondary structure using algorithms such as mFOLD and BPfold (implemented in our in-house RNA-Factory platform). The predicted dot-bracket representation is then submitted to RNA-Composer, which automatically constructs the corresponding 3D tertiary structure consistent with known RNA folding motifs.

Despite integrating multiple prediction methods, it is important to note that some specific RNA sequences—especially those with long, flexible linkers or noncanonical motifs—may still lack reliable tertiary models. Such cases are flagged for manual curation or excluded from downstream molecular dynamics (MD) simulations.

RNA Sequence of a Specific Riboswitch

According to previous studies, riboswitches typically reside in the 5′ untranslated region (UTR) of mRNA. The 5′ terminus is connected to a reporter gene (such as EGFP), while the 3′ terminus is linked to the downstream coding region. Between the aptamer domain and the expression platform, a linker sequence mediates signal transmission between ligand binding and gene expression regulation (Fig. 3A).

Crystal structures deposited in the Protein Data Bank (PDB) (e.g., SAM-VI riboswitch, Fig. 3B) reveal that the linker region often adopts a base-paired or partially helical conformation, stabilizing the communication between domains. Based on this observation, we prioritize candidate riboswitch structures whose predicted linkers exhibit reasonable base pairing, as this feature typically correlates with correct folding and functional responsiveness.

Image 1
Figure 3. Structure of the SAM VI riboswitch. (A) Schematic illustration; (B) Crystal structure (PDB ID: 6LAS).

In practice, we first generate 3D models using AlphaFold3 or trRosettaRNA2, and verify the predicted base-pairing pattern within the linker. Models exhibiting plausible intramolecular interactions are selected for further MD refinement (Fig. 4).

Image 1
Figure 4. Predicted structure of SAM VI-8-17 by trRosettaRNA2. (A) 3D model; (B) Corresponding 2D structure showing correctly paired linker region.

For riboswitch sequences that cannot be reliably modeled by deep learning alone, we apply the secondary-structure–guided protocol. For instance, the SAM-III-2 riboswitch sequence is as follows:

python
GCUGGCGUCUCAGCAACCCGAAAGGAUGGCGGAAACGCCAGAUGCCUUGUAACCGAAAGGGGGGCGAAUGGGACGCAUGAGACGGGGACG

It was analyzed using BPfold, which generated the following dot-bracket representation:

python
....((((((((....(((..((((.(((((....)))))....))))....((....)).)))....))))))))......((....))

The predicted structure demonstrates a well-paired linker region (Fig. 5A), consistent with the expected conformational features. This secondary structure was subsequently used as input for RNA-Composer, yielding the corresponding 3D initial structure (Fig. 5B).

Image 1
Figure 5. Structure of SAM-III-2. (A) 2D structure predicted by BPfold; (B) 3D model generated by RNA-Composer.

Binding Ligands

In addition to the RNA macromolecule, accurate preparation of small-molecule ligands is essential for reliable molecular dynamics simulations. The goal is to obtain physically meaningful geometries and electrostatic parameters. Our ligand preparation follows a standardized, multi-step quantum chemical protocol:

  1. Geometry Optimization and Frequency Analysis Performed using Gaussian at the DFT level (e.g., B3LYP/6-31G*), to obtain minimum-energy conformations and verify the absence of imaginary frequencies.
  2. Charge Derivation The RESP module in Multiwfn is used to compute electrostatic potential–fitted atomic charges based on the Gaussian output, ensuring a realistic charge distribution compatible with later simulations.
  3. Topology Generation Using subtop, we automatically generate GROMACS-compatible topology and coordinate files based on the GAFF parameter set, incorporating the RESP-derived charges.

Representative examples of the ligand structures used in our workflow include S-adenosylmethionine (SAM) and guanine, shown in their 2D chemical formulas and corresponding 3D geometries in Figures 6 and 7, respectively.

Image 1
Figure 6. Chemical structure of S-adenosyl methionine (SAM). (A) 2D structure; (B) 3D conformation.
Image 1
Figure 7. Chemical structure of guanine. (A) 2D structure; (B) 3D conformation.
Molecular Dynamics Simulation

Figure 8. Molecular Dynamics Simulation

Although initial riboswitch conformations can be predicted computationally, these static structures do not necessarily represent stable states under physiological conditions. To obtain more realistic conformations, we first perform equilibration molecular dynamics (MD) simulations of the riboswitch alone in explicit solvent. The equilibrated RNA structures are then docked with the ligand SAM or guanine, followed by extended production simulations to explore the stability, conformational dynamics, and binding mechanisms of the riboswitch–ligand complexes4.

Simulation Parameters:

Accurate MD simulations require appropriate physical models (force fields) to describe atomic interactions within the system. In our workflow:

  • Force Fields: The AMBER99SB parameter set is used for the RNA molecules, while the GAFF (General AMBER Force Field) is applied to the small-molecule ligands (SAM and guanine).
  • Solvent Model: Explicit water molecules are represented using the TIP3P model, which is standard for AMBER-based simulations.
  • Ionic Environment: To mimic the physiological ionic strength, Mg²⁺ ions (20 mM) are included to stabilize RNA tertiary structure, and Cl⁻ ions are added to maintain overall charge neutrality.
  • Thermodynamic Conditions: The simulations are conducted at 310.15 K and 1 bar pressure, representing physiological conditions.
  • Simulation Length: The initial equilibration simulation is run for 10 ns, followed by an extended 25 ns production simulation for the RNA–ligand complex.

These parameters provide a balanced compromise between computational efficiency and sufficient sampling for conformational relaxation and interaction stability assessment.

Initial Equilibrium:

The equilibration stage prepares the RNA riboswitch in a stable, solvated, and thermodynamically relaxed state prior to ligand binding. The stepwise procedure is as follows:

  1. Topology Generation: The RNA topology and coordinate files are generated using the pdb2gmx module in GROMACS, based on the selected AMBER99SB force field.
  2. Solvation: Using editconf and solvate, the RNA molecule is placed in a cubic simulation box, then filled with explicit TIP3P water molecules.
  3. Ion Addition: Counterions (Mg²⁺, Cl⁻) are added using gmx genion to neutralize system charge and reproduce the target ionic concentration.
  4. Energy Minimization: The system undergoes steepest-descent energy minimization to remove steric clashes and optimize the geometry, ensuring all atoms are positioned within physically reasonable force-field limits.
  5. Equilibration (NVT/NPT): Two equilibration phases are performed:
    • NVT (constant number, volume, temperature): The system is heated gradually to 310 K using a velocity-rescaling thermostat.
    • NPT (constant number, pressure, temperature): The system pressure is stabilized at 1 bar using a Parrinello–Rahman barostat, ensuring density and pressure equilibrium.
  6. Production Simulation: After equilibrium is achieved, a 10 ns production run is performed to monitor spontaneous relaxation of RNA secondary and tertiary structures.

As shown in Figure 9, the SAM-VI-8-15 riboswitch undergoes subtle but significant structural rearrangement during equilibration, indicating that the predicted static model relaxes into a more energetically stable conformation under physiological conditions.

Image 1
Figure 9. Structural changes of SAM VI-8-15 before (in magenta) and after Initial Equilibrium (in cyan)

Molecular Docking

To accurately locate the small-molecule ligand within the RNA binding pocket, molecular docking was performed using AutoDock and AutoDock Vina following the workflow below:

  1. Input Preparation: RNA and ligand structures were preprocessed with AutoDock Tools, where all polar hydrogens were added and Gasteiger charges were assigned. The processed structures were then converted into PDBQT format to ensure full compatibility with the docking procedure.
  2. Grid Box Definition: Docking grid parameters were defined around the known or predicted binding site, based on literature data or homologous riboswitch crystal structures.
  3. Docking Execution: AutoDock Vina performed an exhaustive conformational search to predict the most favorable ligand orientations within the binding pocket.
  4. Pose Evaluation and Selection: Docking results were ranked according to their binding energy scores, and the top-ranked poses were compared with known experimental structures (e.g., PDB ID: 6LAS) to confirm realistic binding orientations. The pose with the smallest RMSD relative to the reference structure was selected for the subsequent extended MD simulations.

This docking procedure ensures that the ligand begins MD simulations from a structurally and energetically plausible binding configuration, improving the reliability of dynamic analyses.

Image 1
Figure 10. Three potential SAM-binding sites in SAM VI-8-17. Aptamer shown in magenta, linker in grey, remaining nucleotides in cyan

Extended MD Simulations

The extended simulation stage closely follows the equilibration protocol but is performed on the complete RNA–ligand complex to evaluate binding stability and conformational adaptation.

The workflow proceeds as follows:

  1. Topology Generation: Generate RNA topology with pdb2gmx and merge it with the ligand topology produced via subtop, ensuring GAFF–AMBER compatibility.
  2. System Construction: Define the simulation box and solvate the complex with TIP3P water using editconf and solvate.
  3. Ion Neutralization: Add Mg²⁺ and Cl⁻ ions to achieve electrostatic neutrality and physiological ionic strength.
  4. Energy Minimization: Minimize the complex to remove steric clashes, particularly at the RNA–ligand interface.
  5. Equilibration: Conduct NVT and NPT equilibration to stabilize temperature and pressure while maintaining the structural integrity of the complex.
  6. Production Simulation: Perform a 25 ns production run under constant temperature and pressure conditions, collecting trajectory data for subsequent structural and energetic analysis.

The post-simulation structures of the SAM-VI-8-15 riboswitch (Fig. 11) show minor but functionally relevant rearrangements, particularly in the regions highlighted by red boxes. These subtle local conformational shifts may modulate ligand-binding affinity and specificity, reflecting the dynamic adaptability inherent to riboswitch function.

Image 1
Figure 11. Structural changes of SAM VI-8-15 before (in cyan) and after (in orange) Extended MD Simulation
Image 1
Figure 12. MD trajectory of SAM VI-8-15 (gray) and SAM (magenta), showing structural changes.
Post-Simulation Analysis

Figure 13. Post-Simulation Analysis

After obtaining molecular dynamics trajectories for both the ligand-free and SAM-bound riboswitch systems, a series of post-simulation analyses were conducted to assess structural stability, conformational dynamics, and binding energetics. Because three-dimensional representations are often insufficient to intuitively capture base-pairing rearrangements, we further transformed the simulated 3D structures into two-dimensional secondary structure diagrams. This allowed direct visualization of hydrogen-bonding patterns between key nucleotides, facilitating communication with the experimental team. In addition, binding free energies between the riboswitch and SAM were computed as quantitative indicators of binding affinity, enabling efficient screening of multiple candidate riboswitches. Finally, principal component analysis (PCA) and free energy landscape (FEL) mapping were performed to explore global conformational changes and energy distributions during ligand binding.

RMSD Calculation

Root-mean-square deviation (RMSD) analysis was employed to evaluate the overall structural stability and equilibration of each simulated system. Backbone atom RMSDs were calculated over the trajectory to monitor structural deviations relative to the initial minimized structure. Equilibrium was considered achieved when RMSD fluctuations stabilized around a consistent mean value over time. Only equilibrated trajectory segments were used for subsequent energy and conformational analyses, ensuring that binding free energy calculations such as MM/PB(GB)SA were based on structurally stable and thermodynamically representative ensembles.

Figure 14. RMSD of SAM VI-8-18 during Extended MD Simulation

2D Structure Transformation

To visualize dynamic changes in base-pairing interactions, the simulated 3D RNA structures were converted into 2D secondary structure representations. This transformation was performed using x3dna-dssr, an open-source tool that extracts secondary structure annotations from 3D RNA models, followed by Forna, an online visualization server for dot-bracket representations. For example, the SAM VI-8-17 riboswitch trajectory was processed using the following command:

python
x3dna-dssr -i=VI-8-17.pdb

The resulting .dbn file contains the predicted dot-bracket notation, which was uploaded to Forna to generate an interactive 2D schematic (Figure 15). Comparison of pre- and post-binding diagrams clearly revealed the formation and disruption of key base pairs, providing direct visual evidence of ligand-induced conformational rearrangements within the aptamer domain.

Image 1
Figure 15. Structural changes of SAM VI-8-17 before (A) and after (B) binding to SAM
Binding Energy Calculation and Energy Decomposition

To quantitatively estimate ligand–riboswitch binding affinities across multiple systems, the MM/PB(GB)SA methods were employed, offering a favorable balance between computational efficiency and accuracy7. All calculations were performed using the gmx_MMPBSA script developed by Jicun Li (available at gmxtools/gmx_mmpbsa/gmx_mmpbsa.bsh at master · Jerkwin/gmxtools · GitHub), with default parameters optimized for GROMACS trajectories. Execution of the script required only:

bash
./gmx_mmpbsa.bsh

For the VI-8-17 system, the resulting _pid~MMPBSA.dat file reported a binding free energy of:

bash
 dG=         -176.603( -210.576) kJ/mol =   -42.209(  -50.329) kcal/mol
 Ki=        1.809E-24(3.436E-30)  uM    = 1.809E-21(3.436E-27)    nM

Although absolute energy values may exhibit considerable uncertainty, an inherent limitation of continuum solvation models, the relative free energy differences among candidate riboswitches remain highly informative for distinguishing strong and weak binding systems.

The energy decomposition analysis further reveals the atomic details of the binding interaction6. The residue contribution data are extracted from the _pid~res.dat file:

bash
                MM-PBSA(with DH  )

RNA 26G          0.977(    0.118)
RNA 27U          0.843(    0.104)
RNA 28G          0.656(    0.067)
RNA 29C          0.253(   -0.010)
RNA 30C          0.065(   -0.064)
RNA 31U         -0.007(   -0.151)
RNA 32C         -0.125(   -0.306)
RNA 33G         -1.146(   -1.103)
RNA 34C         -2.758(   -3.308)
RNA 35A        -24.646(  -24.547)
RNA 36U        -23.141(  -21.007)

Per-residue energy decomposition provided further insight into the molecular basis of SAM recognition. As the example above shows, residues A35 and U36 contributed disproportionately to the total binding free energy (−24.6 and −23.1 kcal·mol⁻¹, respectively), suggesting that these bases form key stabilizing interactions with the SAM ligand. Such residue-level insights help elucidate the molecular recognition features governing riboswitch specificity.

PCA & FEL Analysis

To characterize the dominant conformational motions underlying ligand binding, principal component analysis (PCA) was performed on the backbone atomic coordinates. The covariance matrix of atomic fluctuations was diagonalized, and the first few principal components (PCs) representing the most significant collective motions were extracted. For the SAM VI-8-15 riboswitch, PC1 and PC2 accounted for 43.9% and 14.9% of the total variance, respectively, with a cumulative contribution of 58.8%, indicating that these two modes effectively describe the major conformational transitions of the system (Figure 16).

Figure 16. PCA scree plot of SAM VI-8-15

Based on PCA projections, a free energy landscape (FEL) was constructed by mapping the conformational probability density along the PC1–PC2 coordinates and converting it into free energy values via the Boltzmann relation. As shown in Figure 17, distinct energy minima correspond to stable conformational basins, while energy barriers represent transition states separating metastable conformations. By analyzing the connectivity among these basins, the conformational transition pathways induced by SAM binding can be visualized, offering dynamic insight into the regulatory mechanisms of the riboswitch.

Figure 17. PCA and FEL analysis of SAM VI-8-15

Results

Based on the computational workflow described above, we systematically performed molecular dynamics (MD) simulations on multiple RNA sequences provided by the wet-lab team. The resulting structural and energetic data were used to guide the design and optimization of riboswitch constructs. Among all candidates, some riboswitches, such as SAM VI-8-15, exhibited favorable performances. We further conducted detailed mechanistic studies on SAM VI-8-15 riboswitch to provide theoretical insights for future riboswitch design and optimization.

Screening of Riboswitches

Screening of Linker Lengths

At the early stage of riboswitch design, the wet-lab team observed that directly replacing the guanine-binding domain in a known riboswitch scaffold with a SAM-binding domain failed to produce the expected switching function. We hypothesized that this was due to improper structural coupling mediated by the linker sequence. To address this issue, we systematically screened various riboswitch types and linker-length combinations. A total of 11 SAM riboswitch variants and 2 literature-validated guanine riboswitches were provided by the wet-lab team. All constructs were evaluated following the established workflow. As shown in Figure 18, the top four candidates ranked by binding energy were SAM VI-4, SAM VI-6, SAM VI-8, and SAM III-2 (Note:Bars are missing for sequences that either did not yield reliable structures or whose binding energies are near zero.). These computational results were provided to the wet-lab team as key references for experimental screening.

Figure 18. Screening of linker lengths

Fine-Tuning of Linker Sequences

Experimental validation confirmed that SAM VI-6 and SAM VI-8 displayed excellent switching performance. To further optimize these candidates, we explored alternative linker base arrangements. In parallel, because the SAM III series also showed promising preliminary results (optimal activity with a 2-nt linker but a sharp decline at 6 nt), we performed a refined screening of linker lengths ranging from 1 to 4 nucleotides and tested multiple base-ordering combinations. Approximately 30 variants were designed and provided by the wet-lab team. We evaluated all candidates through the same computational pipeline. As summarized in Figure 19, the ten riboswitches with the most favorable binding free energies were VI-6-3, VI-6-4, VI-8-7, VI-8-9, VI-8-15, VI-8-16, III-2-2, III-3-1, III-4-2, and III-4-3 (Note:Bars are missing for sequences that either did not yield reliable structures or whose binding energies are near zero.). These results offered clear computational guidance for subsequent experimental optimization.

Figure 19. Screening of riboswitch variants

Mechanism Studies

Mechanism Studies:

Experimental validation ultimately identified SAM VI-8, SAM VI-8-15, and SAM VI-8-16 as the most effective constructs, each exhibiting robust on/off switching behavior in hepatocyte assays. Although mechanistic elucidation was not the primary focus of this study, we selected SAM VI-8-15 as a representative construct for detailed mechanistic investigation5.

We first analyzed the secondary structure of SAM VI-8-15 (Figure 20). A hydrogen bond was observed between nucleotides 38 and 43, while no other base pairs were disrupted or newly formed. This observation suggests that SAM likely interacts with the region surrounding these two nucleotides. Because this site is located relatively far from the linker region, such local interactions may induce long-range conformational rearrangements within the riboswitch, ultimately leading to its functional switching behavior.

Figure 20. SAM VI-8-15 2D structure before (A) and after (B) SAM binding. Aptamer in red, linker in white

We then analyzed the conformational transitions of SAM VI-8-15 before and after ligand binding. Principal component analysis (PCA) and free energy landscape (FEL) mapping (Figure 17) revealed several distinct energy basins corresponding to intermediate conformations. Comparison of the initial, intermediate, and final states demonstrated a pronounced folding rearrangement within the aptamer domain, accompanied by significant dynamic fluctuations in the 3'-terminal region during the simulation (Figure 21B).

Figure 21. Structural changes of SAM VI-8-15 after binding SAM. Aptamer in magenta, linker in gray, remaining nucleotides in cyan, and the altered structure in orange.

To identify residues contributing most to ligand stabilization, we performed MMPBSA free energy decomposition analysis (Figure 22). The results indicated that A35, A36, U42, C43, and C44 contributed most significantly to the overall binding free energy.

Figure 22. Residue-wise binding energy contribution of SAM VI-8-15

Structural visualization in PyMOL further revealed the detailed interaction network (Figure 23):

  • The adenine moiety of SAM forms one hydrogen bond with U44 and another with the 4'-hydroxyl group of C43;
  • The 2'-hydroxyl group of SAM’s ribose interacts with the N1 atom of A35;
  • The terminal carboxyl and amino groups of SAM form a tri-hydrogen-bond network with the phosphate group of U42.

These cooperative interactions jointly stabilize the SAM–RNA binding interface and are likely critical for the switching mechanism of the SAM VI-8-15 riboswitch.

Image 1
Figure 23. Binding mode of SAM with the VI-8-15 riboswitch

Conclusion

By integrating existing computational tools and modeling approaches, we have successfully established an efficient and reliable workflow for high-throughput screening of riboswitches. This workflow enables rapid determination of RNA structural features, accurate estimation of binding affinities with SAM, and detailed structural insights, including free energy landscapes, conformational dynamics, and binding patterns. Importantly, our computational predictions showed strong agreement with experimental results, effectively guiding wet-lab experiments and accelerating the overall research process.

Despite these successes, we recognize several limitations of the current model. First, the absolute binding free energy values often deviate significantly from experimental measurements, indicating that further improvements in computational accuracy are needed. Second, although the workflow has been carefully optimized, the computational cost and time required remain high due to the large size and complexity of riboswitch systems. These limitations reflect inherent challenges in current molecular modeling methods.

Another particular challenge lies in obtaining accurate initial RNA structures. While our multi-tool integration combined with manual validation helps address this issue, the predictive ability remains limited for highly complex or unconventional RNA sequences. Some RNAs may not adopt a clearly defined reasonable conformation, adding extra difficulty to structure prediction and downstream simulations.

At the same time, these challenges present important opportunities for future development. Our team is actively working on next-generation machine learning models aimed at improving the accuracy of RNA structure and interaction predictions. With ongoing advancements in AlphaFold3 and the emergence of RNA-specific prediction tools, we anticipate that the computational workflow will continue to evolve. Ultimately, this approach will provide a faster, more efficient, and user-friendly platform for high-throughput riboswitch screening, offering stronger computational support for synthetic biology and drug design applications.

Deep Learning for RNA-Target Interaction Prediction

To better interpret the binding patterns between RNA and its targets, and to guide the subsequent experimental design, we introduced deep learning approaches in our modeling work. Specifically, we employed open-source predictive models to study RNA–protein interactions and RNA–small molecule interactions. These computational tools provide a cost-efficient and scalable strategy to prioritize candidate sequences before wet-lab validation.

Currently, there is a lack of large-scale datasets that directly support RNA function prediction, and our experimental throughput does not allow us to train such a model from scratch. Therefore, we focused on predicting RNA–target binding affinity, as binding is the necessary prerequisite for any biological function of RNA (e.g., riboswitch activation/inactivation). Binding predictions thus serve as a useful proxy for RNA functional potential.

1. RNA–Protein Interaction Prediction

We developed DeepRPI, a deep learning framework for predicting RNA–protein interactions.

Figure 24. The architect of DeepRPI

  • Input representation: RNA sequences were encoded using RNA-BERT27, which captures contextual and structural information from RNA sequences, while protein sequences were encoded using ESM-228, a powerful protein language model trained at the evolutionary scale.
  • Interaction modeling: The embeddings were passed into a cross-attention module to capture potential binding sites and interaction patterns between RNA and proteins.
  • Output prediction: The cross-attention output was fed into a multilayer perceptron (MLP) classifier to produce a binary prediction: binding or non-binding.
View User Manual of DeepRPI

Evaluations on public databases29 demonstrated competitive predictive performance.

Image 1
Figure 25. Training and Validation Loss Curve of DeepRPI
Image 2
Figure 26. DeepRPI ROC Curve Demonstrating High Classification Performance

In our project, DeepRPI mainly served as an initial scoring system for candidate RNA sequences, helping us narrow down design space before conducting experiments. We have also integrated the DeepRPI model into our RNA design software, enabling more effective identification of RNA–protein interactions within our design pipeline.

2. RNA-Small Molecule Interaction Prediction

In addition to processing protein targets, we decided to apply SMRTnet30, a newly developed deep learning framework designed to predict RNA-small molecule interactions (SRIs) without requiring RNA tertiary structure information, which makes it highly suitable for drug discovery and riboswitch design.

SMRTnet integrates multiple neural components into a multimodal framework:

Figure 27. SMRTNet Framework for Small Molecule–RNA Interaction Prediction

Two large language models (LLMs) are used to encode RNA sequences and small molecules separately, capturing contextual and chemical features. Convolutional neural networks (CNNs) extract local patterns from RNA secondary structures. Graph attention networks (GATs) process molecular graphs of small molecules, learning atom-level dependencies. Finally, an attention-based multimodal fusion module integrates RNA and small molecule representations, enabling precise prediction of binding affinities.

This hybrid architecture allows SMRTnet to achieve state-of-the-art performance across multiple benchmarks, significantly outperforming existing SRI prediction tools.

During our iHP activities, a PhD student recommended SMRTnet to us as a promising tool for RNA–small molecule prediction. Inspired by this suggestion, we employed the model to virtually screen our designed RNA sequences. The model's predictions aligned well with experimental outcomes. Specifically, MAT1a-16 RNA was consistently ranked as the top candidate and subsequently demonstrated the best performance in wet-lab validation, confirming the utility of SMRTnet as a guiding tool for RNA design.

3. Summary

In this work, the RNA–protein interaction predictor (DeepRPI) and the RNA–small molecule interaction predictor (SMRTnet) together form the computational backbone of our RNA design pipeline. Each tool addresses a different aspect of RNA functionality and provides complementary insights for sequence optimization. Ultimately, we placed more emphasis on RNA–small molecule predictions for riboswitch design, but both approaches highlight the potential of deep learning as a valuable tool in RNA engineering.

Modeling of Riboswitch Based Drug Delivery

The ultimate objective of this project is to achieve controllable drug delivery in humans via an RNA riboswitch (hereafter “riboswitch”) encapsulated in lipid-based nanoparticles (LNPs), so as to modulate the concentration of S-adenosylmethionine (SAM) in the liver of hepatitis patients from a pathological level (≈ 40 µM) back to the normal physiological level (≈ 60 µM). Because animal experiments are not permitted, we develop an ordinary differential equation (ODE) model to simulate the human pharmacokinetics (PK) and pharmacodynamics (PD) of this strategy and thereby test its feasibility in silico.

The model comprises three levels:

  1. A physiologically based pharmacokinetic (PBPK) model that simulates the distribution of LNP-riboswitch from intravenous injection to accumulation in peri-hepatocyte regions.
  2. A cellular model that represents LNP transport from the vasculature into hepatocytes, intracellular release of the riboswitch and subsequent SAM expression.
  3. Multi-dosage simulations to evaluate whether repeated dosing can keep intrahepatic SAM concentration stably at the normal level over long periods.

Part 1. Physiologically Based Pharmacokinetic (PBPK) Model

Model Description:

After an intravenous injection, LNP-riboswitch particles circulate and distribute among major organs; the model represents the following compartments: lung (LU), heart (HT), kidney (KI), spleen (SP), small intestine (SI), liver (LI), arterial blood (AB), venous blood (VB) and the rest of body (ROB)31. Each tissue compartment is described by a tissue concentration C_i (units: µmol·L⁻¹, or µM) and exchanges mass with the arterial or venous blood according to perfusion and partitioning.

Figure 28. PBPK schematic diagram

Theoretical Formulation and Model Parameters

Initial Conditions:

At t = 0, the body contains no LNP-riboswitch, and a single intravenous dose of 100 μmol is administered.

Model Equations:

1. Lung
dCLUdt=QCOVLU(CVBCLUKp,LU)
Explanation:

Drug concentration in the lung changes due to the balance between venous blood inflow and lung tissue redistribution. Blood delivers drug into the lung, while tissue releases drug back according to its partition coefficient, then enters the arterial system.

Parameters:
  • CLU:Lung LNP-riboswitch concentration (µM)
  • CVB:Venous blood LNP-riboswitch concentration (µM)
  • QCO:Total cardiac output, taken as 5.1 L·min⁻¹
  • VLU:Lung volume, taken as 0.5 L
  • Kp,LU:Lung tissue-to-blood partition coefficient, assumed to be 0.5
2. Heart
dCHTdt=QHTVHT(CABCHTKp,HT)
Explanation:

Heart drug levels are governed by arterial supply and myocardial uptake. Arterial blood delivers drug, tissue absorbs it per the partition coefficient, and excess blood returns to systemic circulation.

Parameters:
  • CHT:Heart LNP-riboswitch concentration (µM)
  • CAB:Arterial blood LNP-riboswitch concentration (µM)
  • QHT:Heart blood flow, taken as 0.25 L·min⁻¹
  • VHT:Heart volume, taken as 0.33 L
  • Kp,HT:Heart tissue-to-blood partition coefficient, assumed to be 0.5
3. Kidney
dCKIdt=QKIVKI(CABCKIKp,KI)
Explanation:

Kidney drug concentration changes result from arterial input and tissue redistribution. Arterial blood delivers drug, kidney tissue absorbs it according to its partition coefficient, and remaining drug returns via renal veins.

Parameters:
  • CKI:Kidney LNP-riboswitch concentration (µM)
  • CAB:Arterial blood LNP-riboswitch concentration (µM)
  • QKI:Kidney blood flow, taken as 1.1 L·min⁻¹
  • VKI:Kidney volume, taken as 0.31 L
  • Kp,KI:Kidney tissue-to-blood partition coefficient, assumed to be 0.8
4. Spleen
dCSPdt=QSPVSP(CABCSPKp,SP)
Explanation:

Spleen drug levels depend on arterial inflow and tissue uptake. Blood delivers drug, tissue absorbs it via its partition coefficient, and residual blood exits toward the portal vein or systemic circulation.

Parameters:
  • CSP:Spleen LNP-riboswitch concentration (µM)
  • QSP:Spleen blood flow, taken as 0.2 L·min⁻¹
  • VSP:Spleen volume, taken as 0.4 L
  • Kp,SP:Spleen tissue-to-blood partition coefficient, assumed to be 5.0
5. Small Intestine
dCSIdt=QSIVSI(CABCSIKp,SI)
Explanation:

Drug concentration in the small intestine is determined by mesenteric arterial delivery and tissue uptake. Blood supplies drug, intestinal tissue absorbs it per its partition coefficient, then blood flows to the liver via the portal vein.

Parameters:
  • CSI:Small intestine LNP-riboswitch concentration (µM)
  • CAB:Arterial blood LNP-riboswitch concentration (µM)
  • QSI:Small intestine blood flow, taken as 0.85 L·min⁻¹
  • VSI:Small intestine volume, taken as 0.64 L
  • Kp,SI:Small intestine tissue-to-blood partition coefficient, assumed to be 0.5
6. Liver
dCLIdt=1VLI[QLICAB+(1Fs)(QSPCSPKp,SP+QSICSIKp,SI)(QLI+(1Fs)(QSP+QSI)+CLHE)CLIKp,LI]
Explanation:

Liver drug concentration changes are controlled by arterial and portal inflow, tissue uptake, and hepatic clearance. Arterial and portal blood deliver drug; the liver absorbs it based on the partition coefficient, metabolizes a fraction, and the remaining drug returns via hepatic veins.

Parameters:
  • CLI:Liver LNP-riboswitch concentration (µM)
  • QLI:Liver arterial blood flow, assumed to be 0.35 L·min⁻¹ under pathological conditions
  • VLI:Liver volume, assumed to be 1.8 L under pathological conditions
  • Kp,LI:Liver tissue-to-blood partition coefficient, assumed to be 15
  • Fs:Fraction of portal blood bypassing the liver, assumed to be 0.3
  • CLHE:Hepatic clearance, assumed to be 0.6 L·min⁻¹ under pathological conditions
7. Venous Blood
dCVBdt=1VVB[(QHTCHTKp,HT+QKICKIKp,KI+QROBCROBKp,ROB)+Fs(QSPCSPKp,SP+QSICSIKp,SI) +(QLI+(1Fs)(QSP+QSI))CLIKp,LIQCOCVB]
Explanation:

Venous blood drug levels reflect the net input from all tissues minus pulmonary extraction. It collects returning drug from heart, kidney, liver, spleen, intestine, and other tissues before entering the lungs.

Parameters:
  • CROB:Rest of body LNP-riboswitch concentration (µM)
  • VVB:Venous blood volume, taken as 1.8 L
  • QROB:Blood flow to the rest of the body, taken as 2.35 L·min⁻¹
8. Arterial Blood
dCABdt=QCOVAB(CLUKp,LUCAB)
Explanation:

Arterial drug concentration is determined by pulmonary output and distribution to systemic tissues. Blood from the lungs delivers drug into arteries, which then supply all organs.

Parameters:
  • VAB:Arterial blood volume, taken as 1.1 L
9. Rest of Body
dCROBdt=QROBVROB(CABCROBKp,ROB)
Explanation:

Drug concentration in the remaining tissues depends on arterial inflow and tissue redistribution. This compartment represents all organs not explicitly modeled, contributing significantly to overall distribution kinetics.

Parameters:
  • VROB:Volume of the rest of the body, taken as 53.12 L
  • Kp,ROB:Tissue-to-blood partition coefficient for the rest of the body, assumed to be 0.3

Simulation Results:

The above system of ordinary differential equations was numerically solved using MATLAB to simulate the dynamic distribution of the LNP-riboswitch in the human body. The simulation results are as follows:

Figure 29. PBPK Model Simulation Results

After intravenous injection, the LNP-riboswitch initially distributes rapidly into the pulmonary circulation and arterial system. Subsequently, its concentration in the lungs and arterial blood declines quickly, indicating that the drug is being widely distributed to tissues throughout the body. Among all organs, the liver shows a pronounced drug accumulation effect, with its concentration reaching a steady state of approximately 20 µM within about 30 minutes, which is significantly higher than in other tissues. These results confirm that the LNP-riboswitch exhibits effective liver-targeted distribution, providing a key pharmacokinetic foundation for subsequent intracellular drug release and functional expression.

PBPK Model Simulation Code
MATLAB
clear;
clc;
close all;

% time span
tspan = [0 80]; 

% Initial Condition 
C0 = zeros(9,1); 
Dose = 100; %μmol
V_VB = 1.8; %venous blood(L)
C0(1) = Dose / V_VB;  

%solve ODE
[t, C] = ode15s(@odefun_human_hepatitis_final, tspan, C0);

title('PBPK Simulation for INP Injection','FontSize', 14, 'FontWeight', 'bold')
xlabel('Time(min)')
ylabel('Concentration(μM)')
plot(t,C(:,1),'DisplayName','Venous Blood','LineWidth',2);hold on;
plot(t,C(:,2),'DisplayName','Lung','Linewidth',2);
plot(t, C(:,3), 'DisplayName', 'Arterial Blood', 'LineWidth', 2);
plot(t, C(:,4), 'DisplayName', 'Heart', 'LineWidth', 2);
plot(t, C(:,5), 'DisplayName', 'Kidney', 'LineWidth', 2);
plot(t, C(:,6), 'DisplayName', 'Liver', 'LineWidth', 2);
plot(t, C(:,7), 'DisplayName', 'Spleen', 'LineWidth', 2);
plot(t, C(:,8), 'DisplayName', 'Small Intestine', 'LineWidth', 2);
plot(t, C(:,9), 'DisplayName', 'Rest of Body', 'LineWidth', 2);
xlabel('Time(s)');
ylabel('Concentration(μM)');
legend('show')
grid on

% ODE Function
function dCdt = odefun_human_hepatitis_final(t, C)
    Fs = 0.3; 
    CL_HE = 0.6;
    
    % Volumn (L)
    V_VB = 1.8;    V_LU = 0.5;    V_AB = 1.1;
    V_HT = 0.33;   V_KI = 0.31;   V_LI = 1.8;
    V_SP = 0.4;    V_SI = 0.64;   V_ROB = 53.12;
    
    % Blood Flow (L/min)
    Q_HT = 0.25;   Q_KI = 1.1;    Q_LI = 0.35;
    Q_SP = 0.2;    Q_SI = 0.85;   Q_ROB = 2.35;
    Q_CO = 5.1; 

    % Organ-to-blood distribution coefficient (Kp)
    Kp_LU = 0.5;   Kp_HT = 0.5;   Kp_KI = 0.8;
    Kp_LI = 15;    Kp_SP = 5;     Kp_SI = 0.5;
    Kp_ROB = 0.3;
    
    CVB = C(1); CLU = C(2); CAB = C(3); CHT = C(4); CKI = C(5);
    CLI = C(6); CSP = C(7); CSI = C(8); CROB = C(9);

    Amount_out_HT  = Q_HT * (CHT / Kp_HT);
    Amount_out_KI  = Q_KI * (CKI / Kp_KI);
    Amount_out_ROB = Q_ROB * (CROB / Kp_ROB);
    
    Amount_from_portal_organs = Q_SP * (CSP / Kp_SP) + Q_SI * (CSI / Kp_SI);
    Amount_shunted_to_vein = Fs * Amount_from_portal_organs;
    Amount_to_liver_via_portal = (1 - Fs) * Amount_from_portal_organs;
    
    Blood_flow_out_of_liver = Q_LI + (1-Fs)*(Q_SP + Q_SI);
    Amount_out_of_liver = Blood_flow_out_of_liver * (CLI / Kp_LI);

    dCVB = (1/V_VB) * ( Amount_out_HT + Amount_out_KI + Amount_out_ROB + ...
                         Amount_out_of_liver + Amount_shunted_to_vein - Q_CO * CVB );

    dCLU = (Q_CO / V_LU) * (CVB - CLU / Kp_LU);
    dCAB = (Q_CO / V_AB) * (CLU / Kp_LU - CAB);
    
    dCHT = (Q_HT / V_HT) * (CAB - CHT / Kp_HT);
    dCKI = (Q_KI / V_KI) * (CAB - CKI / Kp_KI);
    dCSP = (Q_SP / V_SP) * (CAB - CSP / Kp_SP);
    dCSI = (Q_SI / V_SI) * (CAB - CSI / Kp_SI);
    dCROB = (Q_ROB / V_ROB) * (CAB - CROB / Kp_ROB);
    
    Amount_to_liver_via_artery = Q_LI * CAB;
    Total_amount_into_liver = Amount_to_liver_via_artery + Amount_to_liver_via_portal;
    Amount_cleared_in_liver = CL_HE * (CLI / Kp_LI);
    dCLI = (1/V_LI) * (Total_amount_into_liver - Amount_out_of_liver - Amount_cleared_in_liver);

    dCdt = [dCVB; dCLU; dCAB; dCHT; dCKI; dCLI; dCSP; dCSI; dCROB];
end

Part 2. Cellular Model

Model Description:

After completing the PBPK simulation of the LNP-riboswitch distribution throughout the body, this study further established a model for its transmembrane transport and functional expression within hepatocytes. This process involves multiple consecutive biological steps37,38.

Figure 30. Cellular Model of LNP Transport and Riboswitch Release

The intracellular delivery and functional realization of the LNP-riboswitch begins with its binding to apolipoprotein E (ApoE) in the blood circulation to form a complex. This complex then crosses the vascular endothelium and specifically binds to receptors on the surface of hepatocytes. Subsequently, the complex is internalized via endocytosis, forming early endosomes, during which ApoE dissociates. Once inside the intracellular trafficking pathway, part of the LNP is sorted into the exocytosis pathway and eventually expelled from the cell, while the remaining portion is transported to late endosomes as the endosomes mature. In the late endosome stage, a fraction of the LNP successfully escapes into the cytoplasm, releasing the riboswitch, which then initiates translation of the target gene. Finally, the generated S-adenosylmethionine (SAM) exerts negative feedback inhibition on the riboswitch activity, thereby achieving precise regulation of its own concentration.

Theoretical Formulation and Model Parameters

Initial Conditions:

The initial conditions are similar to those of the PBPK model. It is assumed that no LNP-riboswitch is present in the body prior to administration. At t = 0, a single intravenous bolus dose is given instantaneously, with a dose of 100 µmol.

Model Equations:

1. LNP Binding to ApoE
dLApoEdt=konLliver[ApoE]KM+Lliver(koff+kint)LApoE
Explanation:

This equation describes the dynamic binding of LNP to ApoE protein in the liver microcirculation. The process follows Michaelis-Menten kinetics, while also considering the natural dissociation of the complex and its internalization by hepatocytes. Together, these factors determine the instantaneous concentration of the LNP-ApoE complex in the blood.

Parameters:
  • LApoE:Concentration of the LNP-ApoE complex (µM)
  • Lliver:LNP concentration in blood near the liver (µM)
  • [ApoE]:ApoE concentration in blood near the liver, assumed to be 0.8 µM
  • kon:Association rate constant between ApoE and LNP, assumed to be 20 h⁻¹
  • koff:Dissociation rate constant of the ApoE-LNP complex, assumed to be 0.02 h⁻¹
  • kint:Endocytosis rate of LNP-ApoE complex, assumed to be 0.4 h⁻¹
  • KM:ApoE binding saturation constant, assumed to be 0.05 µM
2. LNP Endocytosis and Endosomal Transport
dLEEdt=kintLApoE(kEEL E+kexo,EE)LEEdLLEdt=kEEL ELEE(kLEL y+kescape+kexo,LE)LLEdLLydt=kLEL yLLEklysisLLy
Explanation:

These equations describe the intracellular transport of LNP after endocytosis. LNP in early endosomes (EE) can either be transported to late endosomes (LE) or be exocytosed. LNP in late endosomes faces three possible fates: transport to lysosomes for degradation, exocytosis, or successful escape into the cytoplasm. Only the fraction that escapes into the cytoplasm can release functional riboswitch.

Parameters:
  • LEE:LNP concentration in early endosomes (µM)
  • LLE:LNP concentration in late endosomes (µM)
  • LLy:LNP concentration in lysosomes (µM)
  • kescape:Rate of LNP escape from LE to cytoplasm, assumed to be 0.3 h⁻¹
  • klysis:Degradation rate of LNP in lysosomes, assumed to be 2.0 h⁻¹
  • kexo,EE:Exocytosis rate from EE, assumed to be 0.05 h⁻¹
  • kexo,LE:Exocytosis rate from LE, assumed to be 0.3 h⁻¹
  • kEEL E:Transport rate from EE to LE, assumed to be 2.0 h⁻¹
  • kLEL y:Transport rate from LE to lysosome, assumed to be 0.05 h⁻¹
3. Exocytosis Pathway
dLexodt=(kexo,EELEE+kexo,LELLE)kclear,exoLexo
Explanation:

This equation quantifies the dynamic changes of LNP expelled from the cell via exocytosis. The process includes contributions from both early and late endosome exocytosis, as well as systemic clearance of the exocytosed LNP.

Parameters:
  • Lexo:Concentration of LNP expelled via exocytosis (µM)
  • kclear,exo:Clearance rate of exocytosed LNP, assumed to be 0.5 h⁻¹
4. Riboswitch Release and Target Expression
dRdt=krel kescape LLEγR Rkinact R SAMdSAMdt=kprod,SAM+αR11+(SAMKSAM)n(γSAM+kutil)SAM
Explanation:

These equations describe the dynamics of the riboswitch in the cytoplasm and its regulation of SAM concentration. The riboswitch concentration is determined by release from LNP, natural degradation, and SAM-mediated inactivation. SAM concentration is controlled by basal production, riboswitch-driven synthesis, and metabolic consumption. Riboswitch expression activity is negatively regulated by SAM via a Hill-type feedback mechanism.

Parameters:
  • R:Concentration of riboswitch in cytoplasm (µM)
  • SAM:SAM concentration (µM)
  • krel:Rate of riboswitch release from LNP, assumed to be 2.2 h⁻¹
  • kinact:Rate of riboswitch inactivation by SAM, assumed to be 0.2 µM⁻¹·h⁻¹
  • kprod,SAM:Basal SAM production rate, calculated as 0.088 µM·h⁻¹
  • kutil:Functional consumption rate of SAM, assumed to be 0.002 µM·h⁻¹
  • KSAM:SAM feedback half-saturation constant, 60 µM
  • γR:Riboswitch natural degradation rate, assumed to be 0.1 h⁻¹
  • γSAM:SAM natural degradation rate, assumed to be 0.004 h⁻¹
  • α:Riboswitch-mediated SAM synthesis rate, assumed to be 2 µM·h⁻¹
  • n:Hill coefficient, assumed to be 4

Simulation Results:

By integrating the above cellular model into the PBPK framework and performing numerical simulation using MATLAB, the following results were obtained:

Figure 31. Results for Cellular Model Simulation

After a single dose, the intracellular concentration of free riboswitch reaches its peak (~1.4 µM) at approximately 8 hours, then gradually declines due to degradation and SAM-mediated feedback inhibition. At the same time, intracellular SAM concentration peaks at around 20 hours after administration, approaching the target therapeutic level of 60 µM. However, it subsequently shows a slow but continuous decline, eventually returning to the pathological level (~40 µM). These results indicate that a single dose can temporarily increase SAM concentration in hepatocytes, but cannot maintain long-term stable therapeutic effects. Therefore, a multiple-dose strategy is required to achieve sustained regulation of SAM concentration.

Cellular Model Simulation Code
matlab
clear;
clc;
close all;

%PBPK Model
pbpk_tspan = [0 500*60]; 
pbpk_C0 = zeros(9,1); 
pbpk_Dose = 100;  %(μmol)
pbpk_V_VB = 1.8; 
pbpk_C0(1) = pbpk_Dose / pbpk_V_VB;
[t_pbpk_min, C_pbpk] = ode15s(@odefun_human_hepatitis_final, pbpk_tspan, pbpk_C0);
t_pbpk_hr = t_pbpk_min / 60;
LNP_in_liver_profile = C_pbpk(:, 6);
LNP_driver_func = griddedInterpolant(t_pbpk_hr, LNP_in_liver_profile, 'pchip');

% Parameters
params = struct();
params.k_on = 20; params.k_off = 0.02; params.KM = 0.05; params.ApoE = 0.8;
params.k_int = 0.4; params.k_EE_LE = 2.0; params.k_exo_EE = 0.05;
params.k_LE_Ly = 0.05; params.k_escape = 0.3; params.k_exo_LE = 0.3;
params.k_lysis = 2.0;
params.k_clear_exo = 0.5;
params.k_rel = 2.2; 
params.gamma_R = 0.1; 
params.k_inact = 0.2; 
params.alpha = 2;
params.gamma_SAM = 0.004; 
params.k_util = 0.002; 
params.K_SAM = 60; 
params.n = 4;
params.SAM_setpoint = 40; % concentration of SAM in hepatitis patients
params.k_prod_SAM = (params.gamma_SAM + params.k_util) * params.SAM_setpoint; % (h^-1 * uM) = uM/h

% Initial Condition
Y0_cell = [0; 0; 0; 0; 0; params.SAM_setpoint; 0]; 
tspan_cell = [0, 500]; 

%Solve ODE
[t_cell, Y_cell] = ode15s(@(t,Y) lnp_cellular_ode_v8(t, Y, params, LNP_driver_func), tspan_cell, Y0_cell);


%Plot
figure('Name', 'Model with SAM Homeostasis', 'Position', [100, 75, 1000, 700]);
labels = {'LNP in Liver', 'LNP-ApoE Complex','LNP in EE','LNP in LE','LNP in Lysosome', 'Riboswitch','SAM', 'Exocytosed LNP'};

subplot(4,2,1);
plot(t_pbpk_hr*60, LNP_in_liver_profile, 'LineWidth', 2);
title(labels{1}); xlabel('Time (min)'); ylabel('Conc. (μM)'); xlim([0,120]);ylim([0,25]); grid on;

subplot(4,2,2);
plot(t_cell, Y_cell(:,1), 'LineWidth', 2);
title(labels{2}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,1)),max(Y_cell(:,1))*1.1]);grid on;

subplot(4,2,3);
plot(t_cell, Y_cell(:,2), 'LineWidth', 2);
title(labels{3}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,2)),max(Y_cell(:,2))*1.1]);grid on;

subplot(4,2,4);
plot(t_cell, Y_cell(:,3), 'LineWidth', 2);
title(labels{4}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,3)),max(Y_cell(:,3))*1.1]);grid on;

subplot(4,2,5);
plot(t_cell, Y_cell(:,4), 'LineWidth', 2);
title(labels{5}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,4)),max(Y_cell(:,4))*1.1]);grid on;

subplot(4,2,6);
plot(t_cell, Y_cell(:,7), 'LineWidth', 2);
title(labels{8}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,7)),max(Y_cell(:,7))*1.1]);grid on;

subplot(4,2,7);
plot(t_cell, Y_cell(:,5), 'LineWidth', 2);
title(labels{6}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,20]);ylim([min(Y_cell(:,5)),max(Y_cell(:,5))*1.1]);grid on;

subplot(4,2,8);
plot(t_cell, Y_cell(:,6), 'LineWidth', 2);
title(labels{7}); xlabel('Time (h)'); ylabel('Conc. (μM)'); xlim([0,500]);ylim([min(Y_cell(:,6))-5,max(Y_cell(:,6))*1.1]);grid on;
hold on; yline(params.SAM_setpoint, 'r--', 'LineWidth', 1.5); legend('SAM Conc.', 'Baseline (40 μM)', 'Location', 'northeast'); hold off;

sgtitle('LNP-SAM Cellular Dynamics ', 'FontSize', 14, 'FontWeight', 'bold');

% PBPK Model
function dCdt = odefun_human_hepatitis_final(~, C)
    Fs=0.3; CL_HE=0.6; V_VB=1.8; V_LU=0.5; V_AB=1.1; V_HT=0.33; V_KI=0.31; V_LI=1.8; V_SP=0.4; V_SI=0.64; V_ROB=53.12;
    Q_HT=0.25; Q_KI=1.1; Q_LI=0.35; Q_SP=0.2; Q_SI=0.85; Q_ROB=2.35; Q_CO=5.1; Kp_LU=0.5; Kp_HT=0.5; Kp_KI=0.8; Kp_LI=15;
    Kp_SP=5; Kp_SI=0.5; Kp_ROB=0.3; CVB=C(1); CLU=C(2); CAB=C(3); CHT=C(4); CKI=C(5); CLI=C(6); CSP=C(7); CSI=C(8); CROB=C(9);
    Amount_out_HT=Q_HT*(CHT/Kp_HT); Amount_out_KI=Q_KI*(CKI/Kp_KI); Amount_out_ROB=Q_ROB*(CROB/Kp_ROB);
    Amount_from_portal_organs=Q_SP*(CSP/Kp_SP)+Q_SI*(CSI/Kp_SI); Amount_shunted_to_vein=Fs*Amount_from_portal_organs;
    Amount_to_liver_via_portal=(1-Fs)*Amount_from_portal_organs; Blood_flow_out_of_liver=Q_LI+(1-Fs)*(Q_SP+Q_SI);
    Amount_out_of_liver=Blood_flow_out_of_liver*(CLI/Kp_LI);
    dCVB=(1/V_VB)*(Amount_out_HT+Amount_out_KI+Amount_out_ROB+Amount_out_of_liver+Amount_shunted_to_vein-Q_CO*CVB);
    dCLU=(Q_CO/V_LU)*(CVB-CLU/Kp_LU); dCAB=(Q_CO/V_AB)*(CLU/Kp_LU-CAB); dCHT=(Q_HT/V_HT)*(CAB-CHT/Kp_HT);
    dCKI=(Q_KI/V_KI)*(CAB-CKI/Kp_KI); dCSP=(Q_SP/V_SP)*(CAB-CSP/Kp_SP); dCSI=(Q_SI/V_SI)*(CAB-CSI/Kp_SI);
    dCROB=(Q_ROB/V_ROB)*(CAB-CROB/Kp_ROB); Amount_to_liver_via_artery=Q_LI*CAB;
    Total_amount_into_liver=Amount_to_liver_via_artery+Amount_to_liver_via_portal;
    Amount_cleared_in_liver=CL_HE*(CLI/Kp_LI);
    dCLI=(1/V_LI)*(Total_amount_into_liver-Amount_out_of_liver-Amount_cleared_in_liver);
    dCdt=[dCVB;dCLU;dCAB;dCHT;dCKI;dCLI;dCSP;dCSI;dCROB];
end

% Cell Model
function dYdt = lnp_cellular_ode_v8(t, Y, p, lnp_driver)
    L_blood_liver = lnp_driver(t);
    L_ApoE = Y(1); L_EE = Y(2); L_LE = Y(3); L_Ly = Y(4);
    R = Y(5); SAM = Y(6); L_exo = Y(7);

    dL_ApoE = (p.k_on * L_blood_liver * p.ApoE) / (p.KM + L_blood_liver) - p.k_off * L_ApoE - p.k_int * L_ApoE;
    dL_EE = p.k_int * L_ApoE - (p.k_EE_LE + p.k_exo_EE) * L_EE;
    dL_LE = p.k_EE_LE * L_EE - (p.k_LE_Ly + p.k_escape + p.k_exo_LE) * L_LE;
    dL_Ly = p.k_LE_Ly * L_LE - p.k_lysis * L_Ly;
    dR = p.k_rel * p.k_escape * L_LE - p.gamma_R * R - p.k_inact * R * SAM;
    dL_exo = (p.k_exo_EE * L_EE + p.k_exo_LE * L_LE) - p.k_clear_exo * L_exo;
    
    hill_inhibition = 1 / (1 + (SAM / p.K_SAM)^p.n);
    dSAM = p.k_prod_SAM + p.alpha * R * hill_inhibition - (p.gamma_SAM + p.k_util) * SAM;

    dYdt = [dL_ApoE; dL_EE; dL_LE; dL_Ly; dR; dSAM; dL_exo];
end

Part 3. Multiple Dosage

Model Description:

Based on the previously established PBPK and cellular models, a single administration of LNP-riboswitch is insufficient to maintain long-term stable SAM concentrations in hepatocytes. To achieve sustained therapeutic effects, a multiple-dosage model was developed to evaluate the dynamic response and stability of SAM concentrations under a periodic dosing treatment.

Initial Conditions:

The initial conditions are consistent with the PBPK model, assuming no LNP-riboswitch is present in the body prior to administration. The first dose is delivered intravenously as an instantaneous injection at t = 0. Subsequent doses are also 100 μmol each, administered at 3.5-day intervals (84 hours). The simulation covers multiple dosing cycles within this schedule.

Simulation Results:

By integrating the PBPK and cellular models and performing numerical simulations in MATLAB for the multiple-dosing regimen, the following results were obtained:

Figure 32. Results for Multi-dosage Simulation

During the first two dosing cycles, the peak concentration of SAM in hepatocytes gradually increased. From the third dose onward, SAM peak concentrations reached a dynamic equilibrium, and the pharmacokinetic fluctuations within each cycle became consistent. Quantitative analysis indicates that this multiple-dosing scheme maintains the average SAM concentration in hepatocytes at approximately 60.3 μM, precisely meeting the predetermined therapeutic threshold.

Multiple Dosing Simulation Code
matlab
clear;
clc;
close all;

%Dosage
dosing_interval = 84;      % Dosing Interval
total_time = 1000;
pbpk_Dose = 100;           % Dosage Administration(μmol)
target_SAM = 60;           % Target SAM Concentration(μM)

%PBPK Model
pbpk_V_VB = 1.8; 
pbpk_tspan_min = [0, total_time*60]; 
dosing_times_min = (0:dosing_interval:total_time)*60; 

%Initial Condition
C0 = zeros(9,1);
t_pbpk_min_all = [];
C_pbpk_all = [];

for i = 1:length(dosing_times_min)
    current_time = dosing_times_min(i);
    
    C0(1) = C0(1) + pbpk_Dose / pbpk_V_VB;

    if i < length(dosing_times_min)
        t_segment = [current_time, dosing_times_min(i+1)];
    else
        t_segment = [current_time, pbpk_tspan_min(2)];
    end
    
    [t_seg, C_seg] = ode15s(@odefun_human_hepatitis_final, t_segment, C0);
 
    if i > 1
        t_seg = t_seg(2:end);
        C_seg = C_seg(2:end, :);
    end
    
    t_pbpk_min_all = [t_pbpk_min_all; t_seg];
    C_pbpk_all = [C_pbpk_all; C_seg];

    C0 = C_seg(end, :)';
end

[t_pbpk_min_all, unique_indices] = unique(t_pbpk_min_all);
C_pbpk_all = C_pbpk_all(unique_indices, :);

t_pbpk_hr = t_pbpk_min_all / 60; 
LNP_in_liver_profile = C_pbpk_all(:, 6);
LNP_driver_func = griddedInterpolant(t_pbpk_hr, LNP_in_liver_profile, 'pchip');

% Parameters
params = struct();
params.k_on = 20; params.k_off = 0.02; params.KM = 0.05; params.ApoE = 0.8;
params.k_int = 0.4; params.k_EE_LE = 2.0; params.k_exo_EE = 0.05;
params.k_LE_Ly = 0.05; params.k_escape = 0.3; params.k_exo_LE = 0.3;
params.k_lysis = 2.0;
params.k_clear_exo = 0.5;
params.k_rel = 2.2; 
params.gamma_R = 0.1; 
params.k_inact = 0.2; 
params.alpha = 2;
params.gamma_SAM = 0.004; 
params.k_util = 0.002; 
params.K_SAM = 60; 
params.n = 4;
params.SAM_setpoint = 40; 
params.k_prod_SAM = (params.gamma_SAM + params.k_util) * params.SAM_setpoint;

% Initial Condition
Y0_cell = [0; 0; 0; 0; 0; params.SAM_setpoint; 0]; 

% Time Span
tspan_cell = [0, total_time]; 

% Solve ODE
[t_cell, Y_cell] = ode15s(@(t,Y) lnp_cellular_ode_v8(t, Y, params, LNP_driver_func), tspan_cell, Y0_cell);

%Plot
figure('Name', 'Multiple Dosing Simulation', 'Position', [100, 75, 1000, 700]);
subplot(4,2,1);
y_values = pbpk_Dose*ones(size(dosing_times_min)); 
stem(dosing_times_min/60, y_values, 'LineWidth', 1.5, 'Marker', 'none');
title('Dosing'); 
xlabel('Time (h)'); 
ylabel('Dose(μmol)'); 
xlim([0, total_time]);
ylim([0,110]);
grid on;

subplot(4,2,2);
plot(t_pbpk_hr, LNP_in_liver_profile, 'LineWidth', 2);
title('LNP in Liver'); 
xlabel('Time (h)'); 
ylabel('Conc. (μM)'); 
xlim([0, total_time]);
ylim([0,max(C_pbpk_all(:, 6))*1.1]);
grid on;

labels = {'LNP-ApoE Complex','LNP in EE','LNP in LE','LNP in Lysosome', 'Riboswitch','SAM', 'LNP Exocytosed'};
for i = 1:7
    subplot(4,2,i+2);
    plot(t_cell, Y_cell(:,i), 'LineWidth', 1.5);
    title(labels{i}); 
    xlabel('Time (h)'); 
    ylabel('Conc. (μM)'); 
    xlim([0, total_time]);

    if max(abs(Y_cell(:,i))) > 1e-9
        ylim_min = min(Y_cell(:,i));
        ylim_max = max(Y_cell(:,i));
        if ylim_min == ylim_max
            ylim_max = ylim_min + 1;
        end
        ylim_range = ylim_max - ylim_min;
        ylim([ylim_min-0.05*ylim_range, ylim_max+0.05*ylim_range]);
    else
        ylim([-0.1, 1]);
    end
    
    % Mark Initial Concentration of SAM and Target Concentration
    if i == 6 
        hold on; 
        yline(target_SAM, 'g--', 'LineWidth', 2, 'DisplayName', 'Target (60μM)'); 
        yline(params.SAM_setpoint, 'r--', 'LineWidth', 1.5, 'DisplayName', 'Baseline (40μM)'); 
        legend('SAM Conc.', 'Target(60 μM)','Initial Conc.');
        hold off;
        
        % Average Concentration
        last_100h_idx = t_cell > (total_time - 100);
        if any(last_100h_idx)
            mean_SAM = mean(Y_cell(last_100h_idx, 6));
            text(0.2*total_time, 0.70*max(ylim), ...
                sprintf('Avg SAM Conc.: %.1fμM\n', mean_SAM), ...
                'FontSize', 8);
        end
    end
    grid on;
end

sgtitle('Multiple Dosing Simulation', 'FontSize', 14, 'FontWeight', 'bold');

% PBPK Model
function dCdt = odefun_human_hepatitis_final(~, C)
    Fs=0.3; CL_HE=0.6; V_VB=1.8; V_LU=0.5; V_AB=1.1; V_HT=0.33; V_KI=0.31; V_LI=1.8; V_SP=0.4; V_SI=0.64; V_ROB=53.12;
    Q_HT=0.25; Q_KI=1.1; Q_LI=0.35; Q_SP=0.2; Q_SI=0.85; Q_ROB=2.35; Q_CO=5.1; Kp_LU=0.5; Kp_HT=0.5; Kp_KI=0.8; Kp_LI=15;
    Kp_SP=5; Kp_SI=0.5; Kp_ROB=0.3; CVB=C(1); CLU=C(2); CAB=C(3); CHT=C(4); CKI=C(5); CLI=C(6); CSP=C(7); CSI=C(8); CROB=C(9);
    Amount_out_HT=Q_HT*(CHT/Kp_HT); Amount_out_KI=Q_KI*(CKI/Kp_KI); Amount_out_ROB=Q_ROB*(CROB/Kp_ROB);
    Amount_from_portal_organs=Q_SP*(CSP/Kp_SP)+Q_SI*(CSI/Kp_SI); Amount_shunted_to_vein=Fs*Amount_from_portal_organs;
    Amount_to_liver_via_portal=(1-Fs)*Amount_from_portal_organs; Blood_flow_out_of_liver=Q_LI+(1-Fs)*(Q_SP+Q_SI);
    Amount_out_of_liver=Blood_flow_out_of_liver*(CLI/Kp_LI);
    dCVB=(1/V_VB)*(Amount_out_HT+Amount_out_KI+Amount_out_ROB+Amount_out_of_liver+Amount_shunted_to_vein-Q_CO*CVB);
    dCLU=(Q_CO/V_LU)*(CVB-CLU/Kp_LU); dCAB=(Q_CO/V_AB)*(CLU/Kp_LU-CAB); dCHT=(Q_HT/V_HT)*(CAB-CHT/Kp_HT);
    dCKI=(Q_KI/V_KI)*(CAB-CKI/Kp_KI); dCSP=(Q_SP/V_SP)*(CAB-CSP/Kp_SP); dCSI=(Q_SI/V_SI)*(CAB-CSI/Kp_SI);
    dCROB=(Q_ROB/V_ROB)*(CAB-CROB/Kp_ROB); Amount_to_liver_via_artery=Q_LI*CAB;
    Total_amount_into_liver=Amount_to_liver_via_artery+Amount_to_liver_via_portal;
    Amount_cleared_in_liver=CL_HE*(CLI/Kp_LI);
    dCLI=(1/V_LI)*(Total_amount_into_liver-Amount_out_of_liver-Amount_cleared_in_liver);
    dCdt=[dCVB;dCLU;dCAB;dCHT;dCKI;dCLI;dCSP;dCSI;dCROB];
end

% Cell Model
function dYdt = lnp_cellular_ode_v8(t, Y, p, lnp_driver)
    L_blood_liver = lnp_driver(t);
    L_ApoE = Y(1); L_EE = Y(2); L_LE = Y(3); L_Ly = Y(4);
    R = Y(5); SAM = Y(6); L_exo = Y(7);

    dL_ApoE = (p.k_on * L_blood_liver * p.ApoE) / (p.KM + L_blood_liver) - p.k_off * L_ApoE - p.k_int * L_ApoE;
    dL_EE = p.k_int * L_ApoE - (p.k_EE_LE + p.k_exo_EE) * L_EE;
    dL_LE = p.k_EE_LE * L_EE - (p.k_LE_Ly + p.k_escape + p.k_exo_LE) * L_LE;
    dL_Ly = p.k_LE_Ly * L_LE - p.k_lysis * L_Ly;
    dR = p.k_rel * p.k_escape * L_LE - p.gamma_R * R - p.k_inact * R * SAM;
    dL_exo = (p.k_exo_EE * L_EE + p.k_exo_LE * L_LE) - p.k_clear_exo * L_exo;
    
    hill_inhibition = 1 / (1 + (SAM / p.K_SAM)^p.n);
    dSAM = p.k_prod_SAM + p.alpha * R * hill_inhibition - (p.gamma_SAM + p.k_util) * SAM;

    dYdt = [dL_ApoE; dL_EE; dL_LE; dL_Ly; dR; dSAM; dL_exo];
end

Conclusion

Using the established ODE-based model, numerical simulations demonstrate that the LNP-delivered riboswitch system, when combined with a well-designed multiple-dosing strategy, can effectively maintain SAM concentrations in human hepatocytes at normal physiological levels (around 60 μM). This provides a theoretical and simulation-based foundation for achieving long-term stable treatment of hepatitis.

Next Step—Rational Design of RNA

Background and Motivation

Inspired by the recent widespread advances in protein design, we aim to establish a novel functional nucleic acid design paradigm for the de novo design of aptamers or riboswitches.

The protein design approach that is currently popular and well-validated involves dividing the process into two main stages: scaffold design and sequence design. These stages employ distinct models.

Figure 33. Mainstream Protein Design Workflow

Given the functional similarity between nucleic acid aptamers and protein binders, our aim is to develop software tools for nucleic acid aptamer design based on this workflow. This will make a valuable contribution to the field of novel, rational nucleic acid design. The recent breakthroughs achieved by the Baker lab in the field of RNA design demonstrate the considerable feasibility and promising prospects of deep learning-based de novo RNA design.42

Figure 34. Recent work from the Baker Lab has revealed the tremendous potential of deep learning-based functional nucleic acid design.

Direction and Design

We began by developing a sequence design model for nucleic acids, leading to the creation of RNAMPNN—a deep learning-based sequence design tool.

RNAMPNN, as a sequence design model, represents the adaptation of ProteinMPNN43—a commonly used sequence design model in the field of protein design—to RNA design.

Figure 35. Model Architecture of ProteinMPNN

Although RNAMPNN shares similar application scenarios and tasks with ProteinMPNN, the RNA systems it targets exhibit distinct differences from protein systems, presenting certain challenges for our task, such as:

  1. Data scarcity: The main challenge in designing models for RNA systems, as opposed to protein systems, is the limited amount of available data. Currently, publicly available RNA structural data constitutes less than one percent of protein structural data.

  2. Structural instability: Compared to proteins, RNA structures exhibit high flexibility, with some RNAs potentially lacking thermodynamically stable conformations. This poses significant challenges to the robustness of sequence-design models.

  3. Main chain complexity: RNA backbone structures often require a greater number of atoms to be defined than proteins do. This increases both the amount of topological atomic information that the model must interpret and the computational efficiency required.

RNAMPNN broadly follows the fundamental architecture of ProteinMPNN, employing a graph neural network based on message passing as its core model. It incorporates novel mechanisms tailored to the unique properties of RNA, such as simplifying the definition of the RNA backbone for existing sequence design models44 by reducing it to seven atoms, thereby alleviating the computational burden. A distinctive pre-post feature fusion mechanism enables the model to extract additional residue-sequential information beyond purely Euclidean spatial data.

Figure 36. Model Architecture of RNAMPNNN

RNAMPNN primarily consists of four components:

  1. Feature Extraction Module: Used to extract nucleotide-level geometric information from raw atomic coordinates, such as bond lengths, bond angles, and dihedral angles between atoms.

  2. MPNN module: Used for message passing of geometric information at the nucleotide level and feature assignment to nucleotide nodes.

  3. Pre/Post-Fusion Module: Used to fuse node features before and after the MPNN module based on the self-attention mechanism, enabling the model to capture information from nucleotide sequence structures.

  4. Output Head Module: Used to convert extracted node information into discrete nucleotide types.

Results and Optimization

The RNAMPNN model was trained on over 2,000 publicly available RNA structural data points collected by our team and tested on a dataset from the same source that we segmented independently.

Figure 37. Training Curve of RNAMPNN

During model testing, we observed that RNAMPNN performs better with RNA structures containing a greater number of nucleotides, whereas its performance is suboptimal with fewer nucleotides. This is due to RNAMPNN's primary reliance on message passing to extract information from the atomic coordinate space of the RNA backbone. In systems with fewer nucleotides, the message passing mechanism becomes less effective due to the limited number of nodes.

Figure 38. Performance of RNAMPNN on Data with Different Sequence Lengths

Additionally, we found that RNAMPNN is not very robust. Fine-tuning a pre-trained RNAMPNN model on isolated short sequence datasets results in a significant decline in the model's overall performance.

We implemented a series of improvements to the original RNAMPNN architecture and training method, some of which proved effective and were incorporated into the latest release, for example:

  1. We initially attempted to blend atomic geometric features using a standalone atomic-scale MPNN before applying the nucleotide-scale MPNN. However, this approach ultimately proved ineffective and significantly increased computational time and memory consumption, leading us to abandon this design.

Figure 39. Atom-Scale MPNN

  1. We then tried using an XGBoost45 model instead of the traditional MLP as the output head. Specifically, we first performed globally differentiable pre-training with an MLP as the output head. Next, we used the main body of the RNAMPNN as a representation model to assign features to nucleotide nodes. Finally, we used a separately trained XGBoost model to predict node types based on these features. This enhancement was effective because the additional computational cost was concentrated during the training phase, delivering significant performance gains for the model. This design was ultimately integrated into the model's release version.
Image 1
Figure 40.Training of RNAMPNN with XGBoost Output Head
Image 2
Figure 41. XGBoost output head can improve model performance.
  1. We attempted to introduce Gaussian noise during training to enhance model stability, but this approach did not yield the expected results. We believe this may stem from Gaussian noise being an inappropriate choice for the actual physical RNA system. A further avenue to explore is employing molecular dynamics techniques to incorporate multi-conformation data of RNA into the dataset.

Figure 42. Gaussian Noise Data Augmentation

Conclusions and Plans

In this section, we attempted to transfer existing protein design workflows to the field of RNA design, successfully developing the RNAMPNN sequence design model in the process. We hope that this model will provide valuable insights for developing subsequent models.

Due to the time constraints of the iGEM project timeline, we were unable to complete the required RNA scaffold condition generation model — a deep learning model for designing RNA scaffolds based on targets. Currently, there is no publicly available model for performing scaffold generation tasks for small-molecule nucleic acid aptamers. This makes it difficult to validate the model experimentally.

To make it easier for other researchers to use and validate our model, we have integrated it into our user-friendly RNA comprehensive analysis platform, RNA-Factory. Users only need to provide a PDB file via the GUI interface to obtain model outputs.

As the RNAMPNN model is not perfect, we are planning to implement targeted improvements.

  1. Unlike ProteinMPNN, RNAMPNN43 cannot account for multi-stranded RNA, nor can it consider ligand atoms explicitly, as LigandMPNN46 can. These limitations could be overcome by adapting models from protein design research.

  2. The scarcity of data remains a significant challenge. We plan to augment our dataset by incorporating data generated through a combination of nucleic acid structure prediction models and molecular dynamics simulations.

  3. Cost calculation is also a noteworthy concern. We have found that RNAMPNN currently uses a lot of GPU memory and takes a long time to perform inference, which is a disadvantage for designing large-scale RNA sequences. This issue could be resolved by simplifying the model architecture.

At the time of writing this Wiki, the RNAMPNN model is still being optimised and updated, and the dataset for its accompanying skeleton generation model is being collected and constructed.We sincerely hope that, in the future, deep learning models will play an increasingly significant role in the rational and de novo design of RNA. We also anticipate that the patterns we have identified will be the subject of broader research and validation.

References

[1]
Xiao W, Liu G, Chen T, Zhang Y, Lu C. Bifidobacterium bifidum SAM-VI Riboswitch Conformation Change Requires Peripheral Helix Formation. Biomolecules. 2024; 14(7):742.
[2]
Hu, G., Zhou, HX. Magnesium ions mediate ligand binding and conformational transition of the SAM/SAH riboswitch. Commun Biol 6, 791 (2023).
[3]
Wang R, Schlick T. How large is the universe of RNA-like motifs? A clustering analysis of RNA graph motifs using topological descriptors. PLoS Comput Biol. 2025;21(7):e1013230. Published 2025 Jul 15.
[4]
Sanbonmatsu KY. Dynamics of riboswitches: Molecular simulations. Biochim Biophys Acta. 2014;1839(10):1046-1050.
[5]
Chen J, Zeng Q, Wang W, Sun H, Hu G. Decoding the Identification Mechanism of an SAM-III Riboswitch on Ligands through Multiple Independent Gaussian-Accelerated Molecular Dynamics Simulations. J Chem Inf Model. 2022;62(23):6118-6132.
[6]
Hu G, Zhou HX. Binding free energy decomposition and multiple unbinding paths of buried ligands in a PreQ1 riboswitch. PLoS Comput Biol. 2021;17(11):e1009603. Published 2021 Nov 12.
[7]
Wang C, Greene D, Xiao L, Qi R, Luo R. Recent Developments and Applications of the MMPBSA Method. Front Mol Biosci. 2018;4:87. Published 2018 Jan 10.
[8]
Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
[9]
Wang, W., Feng, C., Han, R. et al. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nat Commun 14, 7266 (2023).
[10]
Du, Z., Su, H., Wang, W. et al. The trRosetta server for fast and accurate protein structure prediction. Nat Protoc 16, 5634–5651 (2021).
[11]
Su H, Wang W, Du Z, et al. Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates. Adv Sci (Weinh). 2021;8(24):e2102592.
[12]
J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov, & D. Baker, Improved protein structure prediction using predicted interresidue orientations, Proc. Natl. Acad. Sci. U.S.A. 117 (3) 1496-1503, (2020)
[13]
Zhu, H., Tang, F., Quan, Q. et al. Deep generalizable prediction of RNA secondary structure via base pair motif energy. Nat Commun 16, 5856 (2025).
[14]
[14]. Sarzynska J, Popenda M, Antczak M, Szachniuk M. RNA tertiary structure prediction using RNAComposer in CASP15. Proteins. 2023;91(12):1790-1799.
[15]
Popenda M, Szachniuk M, Antczak M, et al. Automated 3D structure composition for large RNAs. Nucleic Acids Res. 2012;40(14):e112.
[16]
Jurrus E, Engel D, Star K, et al. Improvements to the APBS biomolecular solvation software suite. Protein Sci. 2018;27(1):112-128.
[17]
Lu, Tian & Chen, Feiwu. (2012). Multiwfn: A multifunctional wavefunction analyzer. Journal of computational chemistry. 33. 580-92.
[18]
Lu T. A comprehensive electron wavefunction analysis toolbox for chemists, Multiwfn. J Chem Phys. 2024;161(8):082503.
[19]
Tian Lu, Sobtop, Version 1.0
[20]
GROMACS development team. GROMACS 2024.5. Zenodo.
[21]
Goodsell DS, Olson AJ. Automated docking of substrates to proteins by simulated annealing. Proteins. 1990;8(3):195-202.
[22]
Eberhardt J, Santos-Martins D, Tillack AF, Forli S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J Chem Inf Model. 2021;61(8):3891-3898.
[23]
Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455-461.
[24]
gmx_MMPBSA developers. gmx_MMPBSA. Zenodo.
[25]
Charles Hahn. DuIvy. GitHub repository
[26]
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406-3415.
[27]
Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022.
[28]
Akiyama M, Sakakibara Y. Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. NAR Genom Bioinform. 2022 Feb 22;4(1):lqac012.
[29]
Yuan J, Wu W, Xie C, Zhao G, Zhao Y, Chen R. NPInter v2.0: an updated database of ncRNA interactions. Nucleic Acids Res. 2014 Jan;42(Database issue):D104-8.
[30]
Fei, Y. et al. Predicting small molecule and RNA target interactions using deep neural networks, 2025.
[31]
Kuepfer L, Niederalt C, Wendl T, et al., 2016. Applied concepts in PBPK modeling: How to build a PBPK/PD model. CPT Pharmacometrics Syst Pharmacol, 5(10), 516–531.
[32]
Davies B, Morris T, 1993. Physiological parameters in laboratory animals and humans. Pharm Res, 10, 1093–1095.
[33]
ICRP, 2002. Basic anatomical and physiological data for use in radiological protection: Reference values. ICRP Publication 89, Ann. ICRP, 32(3–4).
[34]
Sven Björkman, 2002. Prediction of the volume of distribution of a drug: Which tissue–plasma partition coefficients are needed? J Pharm Pharmacol, 54(9), 1237–1245.
[35]
Wang X, Wang WX, 2023. Cellular journey of nanomaterials: Theories, trafficking, and kinetics. Aggregate.
[36]
Paunovska K, Loughrey D, Dahlman JE, 2022. Drug delivery systems for RNA therapeutics. Nat Rev Genet, 23(5), 265–280.
[37]
Zhang X, Goel V, Robbie GJ, 2020. Pharmacokinetics of Patisiran, the first approved RNA interference therapy in patients with hereditary transthyretin-mediated amyloidosis. J Clin Pharmacol, 60(5), 573–585.
[38]
Chatterjee S, Kon E, Sharma P, Peer D, 2024. Endosomal escape: A bottleneck for LNP-mediated therapeutics. Proc Natl Acad Sci USA, 121(11), e2307800120.
[39]
Wang X, Li H, Chen C, Liang Z, 2025. Understanding of endo/lysosomal escape of nanomaterials in biomedical application. Smart Mol, e20240017.
[40]
Sun A, Gasser C, Li F, et al., 2019. SAM-VI riboswitch structure and signature for ligand discrimination. Nat Commun, 10, 5728.
[41]
Xue Y, Li J, Chen D, et al., 2023. Observation of structural switch in nascent SAM-VI riboswitch during transcription at single-nucleotide and single-molecule resolution. Nat Commun, 14, 2320.
[42]
Favor, A., et al., De novo design of RNA and nucleoprotein complexes. bioRxiv, 2025: p. 2025.10.01.679929.
[43]
Dauparas, J., et al., Robust deep learning–based protein sequence design using ProteinMPNN. Science, 2022. 378(6615): p. 49-56.
[44]
Huang, H., et al., RiboDiffusion: tertiary structure-based RNA inverse folding with generative diffusion models. Bioinformatics, 2024. 40: p. i347 - i356.
[45]
Chen, T. and C. Guestrin, XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[46]
Dauparas, J., et al., Atomic context-conditioned protein sequence design using LigandMPNN. Nature Methods, 2025. 22(4): p. 717-723.