Molecular Modeling | NAU-CHINA

Figure | Molecular Modeling Framework Diagram

Figure 1 | Molecular Modeling Framework Diagram

Molecular modeling is a key tool for understanding the relationship between the structure and function of biological macromolecules. We focused on two core issues:

The conformational changes of the FourU temperature switch were analyzed by using RNA secondary structure prediction and thermodynamic energy calculation.
The structural stability and functionality of the BslA-dCBM fusion protein were evaluated by AlphaFold structure prediction, Kabsch structure comparison and GROMACS.

Qualitative Analysis

The FourU thermometer is an RNA switch with a conserved UUUU sequence that controls SD sequence accessibility via temperature-dependent conformational changes, enabling translation regulation:

At low temperatures: UUUU forms stable base pairs with the SD sequence, occluding the ribosome binding site and inhibiting translation.
At high temperatures: The structure dissociates, exposing the SD sequence and enabling translation initiation.

Literature indicates that the base pairs of the FourU sequence break as the temperature rises. We refer to the literature and successively take the temperature nodes of 310K, 430K and 520K for verification.

RNA Structure is an RNA secondary structure prediction software developed by Mathews Laboratory. Based on Turner thermodynamic parameters, it predicts the most stable RNA secondary structure through the free energy minimization algorithm.

Using RNA Structure, we simulated the structure at each of the three temperatures, yielding the following secondary structure predictions:

Figure | Predicted Secondary Structures from RNA Structure

Figure 2 | Predicted Secondary Structures from RNA Structure

Analysis of these results reveals the following temperature-dependent structural transitions:

Table 1 | Analysis of Thermal Response Process

Temperature	Structural Characteristics	Observations
310 K	Stable base pairing between AGGAGG and UUUU	Complete stem-loop structure
430 K	Partial base pair disruption	Partial exposure of SD sequence
520 K	Complete dissociation between AGGAGG and UUUU	Full exposure of SD sequence, translation initiation

Based on qualitative analysis, the unchain process is temperature-dependent and there is a clear sequence for the base pair fractures. The literature indicates that this is because the simulated temperature is approximately 100°C higher than the experimental value, and the result is only for qualitative reference. Therefore, we developed a quantitative analysis method to try to obtain an RNA simulation process with accurate temperature values.

Quantitative Analysis

We have developed a comprehensive model to optimize the FourU temperature-sensitive ribose switch. This model integrates thermodynamic energy calculation and RNA secondary structure prediction. The core components include: (1) unchaining temperature predictor, and (2) translation potential calculator.

Model Assumptions

To simplify the model and enhance computability, we introduced the following assumptions:

RNA folding occurs under thermodynamic equilibrium.
The ionic environment can be uniformly described using $Na^+$ concentration.
Cooperative dissociation of the UUUU sequence can be modeled using a Hill equation.
All energy calculations are based on the Turner 99 parameters, with extrapolation to other temperatures via entropy correction.

Table 2 | Symbol explanation

Symbol	Explanation
$R$	Boltzmann constant
$T_m$	Design melting temperature
$f=0.88$	Geometric factor reflecting linear strain in the 4U arrangement
$i=4$	Number of base pairs formed between 4U and the SD sequence
$n=0.32$	Hill coefficient
$N_{GC}$	Number of G and C bases in the stem-loop core
$N_{AU}$	Number of A and U bases in the stem-loop core
$[Na^+]$	Sodium ion concentration (mol/L)
$k=16.6$	Ion concentration correction factor

Thermodynamic Equation

We developed a computational model to simulate the translation initiation potential by quantifying the interaction energies specific to the FourU sequence. The translation initiation potential $\sigma$ is defined as: $$\sigma=\exp (-\frac{1}{RT}(E_{total}))$$ Where $R=1.987\times 10^{-3} \text {kcal}·\text {mol}^{-1}·\text K^{-1}$ is the Boltzmann constant, $T$ is the absolute temperature, and $E_{total}$ is the total energy barrier. $$E_{total}=E_{SD}+E_{tRNA}+E_{stem}+E_{4U{binding}}+E_{coop}$$ Among them, $E_{SD}$ and $E_{tRNA}$ are respectively the hybridization energy between SD and antiSD, and the hybridization energy of the start codon and its corresponding anticodon. Since neither the SD sequence nor the start codon has changed, the values are constants. Therefore, the changes are entirely determined by the last three items.

Key Energy Terms for FourU

1. Stem-Loop Stability $E_{stem}$

Where, $E_{stem}$ is the core region of the stem ring, including the 20nt upstream of SD and the SD sequence.

RNAfold (ViennaRNA) computes free energy $\Delta G$ at specific temperatures. First, it predicts the minimum free energy structure using Turner thermodynamic rules via dynamic programming. Second, temperature compensation is applied, correcting $\Delta G$ for entropy changes when deviating from the default 310 K: $$E_{stem}(T)=\Delta G_{folding}(\text{seq}_{stem},T)$$We used RNAfold from the ViennaRNA package to compute the free energy($\Delta G$) at specific temperatures. This process involves two key steps: Firstly, RNAfold, based on Turner's thermodynamic rules, predicts the minimum free energy structure and its value $\Delta G$ at a given temperature through a dynamic programming algorithm; Secondly, temperature compensation correction is carried out. Since the Turner parameter is measured by default at 310 $K$, when calculating other temperatures, the software automatically applies the entropy change ($\Delta S$) correction formula: $$\Delta G_{\text{corrected}}=\Delta G_{\text{original}}+\Delta S(T_\text{targeted}-310)$$

2. 4U Binding Energy $E_{4U{binding}}$

This term models the cooperative dissociation of 4U: $$E_{4U{binding}}=f\times[\sum_i\sigma~G_{bp}(i)]$$ Here, $\sigma~G_{bp}(i)$ is the base pair binding energy between FourU and the SD sequence, calculated via the Nearest-Neighbor model for stacking interactions. $f$ is a geometric factor (0.8–1.2) reflecting 4U alignment: 0.8–0.9 ideal, 1.0 mild bend, 1.1–1.2 strong distortion.

3. Cooperative Melting Energy

This term models the cooperative dissociation of 4U: $$E_{coop}(T)=-RT·\ln(1+(\frac{T}{T_m})^n)$$ Where, $T_m$ is the design melting temperature and $n$ is the Hill coefficient.

Simulation and Results Analysis

An unlinking temperature predictor based on GC content is used to improve accuracy. It accounts for G-C vs. A-U stability, cation shielding (Debye–Hückel effect), and reduced stability of the UUUU quadruplex. A FourU-specific correction CCC is added, yielding the empirical formula: $$T_m(^oC)=(3N_{GC}+2N_{AU})+k\lg([Na^+])+C$$ Where, $H_{GC}$ is the number of G and C bases in the stem-loop core,$H_{AU}$ is the number of A and U bases, $[Na^+]$ is the sodium ion concentration (mol/L), $k$ and is the ion concentration correction factor.

Using Python, we obtained the following results:

1. Stem-loop core identification

Successful identification of the core area of the stem ring: AACUUUUGAAUAGUGAUUCAGGAGG
SD Upstream 20nt: ACUUUUGAAUAGUGAUUC SD sequence: GAGG

2. Temperature response curve

Figure 3 | Temperature Response Curve

By observing the temperature response curve (Figure 1), the predicted value of the unchaining temperature (38.0°C) is close to the target value (37.0°C), and the deviation may result from the simplification of parameters such as the ionic environment. The Hill coefficient is 3.20, indicating that the unchain process has a high degree of positive synergy, which is consistent with the cooperative melting theory of continuous U-base pairing. The GC content in the stem loop region (34.6%) is within the common range of RNA hairpins, balancing stability and temperature sensitivity.

In conclusion, through qualitative and quantitative computer simulations, we have obtained the predicted unchain temperature of 38.0°C, providing a reference for the experimental group.

Prediction Results

Protein tertiary structure prediction is key for functional insights. For the BslA-dCBM fusion, linker regions reduced full-protein accuracy, so BslA and CBM domains were predicted independently to assess structural features and function.

BslA structure prediction yielded ipTM = pTM = 0.69, indicating high confidence. BslA adopts a β-barrel fold, supporting hydrophobic interactions consistent with its biological role. Predicted structure is shown below:

Figure | BslA Predicted Result by AlphaFold

Figure 4 | BslA Predicted Result by AlphaFold

CBM structure prediction yielded ipTM = pTM = 0.7, indicating high confidence. CBM adopts a β-sandwich fold, typical of carbohydrate-binding modules, enabling cellulose recognition and adhesive function. Predicted structure is shown below:

Figure | CBM Predicted Result by AlphaFold

Figure 5 | CBM Predicted Result by AlphaFold

BslA and CBM retain hydrophobic and cellulose-binding functions, respectively. A properly designed linker likely does not alter core domain structure, so BslA-dCBM is expected to maintain both properties, supporting applications in surface modification or biofilm formation.

Fusion Protein Issues

However, AlphaFold prediction of the fusion protein shows low overall confidence, as illustrated below:

Figure 6 | Predicted structure of Fusion Protein

In fusion protein predictions, confidence drops mainly in the flexible linker region, while core functional domains retain high confidence, indicating their structures remain largely unaffected.

The PAE heatmap in AlphaFold quantifies uncertainty in residue–residue positions: lower values indicate reliable predictions, higher values indicate uncertainty, and it helps assess stability in multi-domain or multi-chain complexes.

The PAE heatmap shows a diagonal-matrix pattern, indicating independent spatial conformations of fusion components. This suggests the fusion strategy is structurally viable and the predictions are informative.

Figure 7 | Diagonal Matrix Comparison

Kabsch Comparison

Predicting fusion protein structures is challenging, as functional domains must preserve their tertiary structures and functions after fusion. To test this, we applied the Kabsch algorithm, which finds the optimal rigid rotation between two 3D point sets (e.g., atomic positions of identical residues), to compare each fused domain with its original structure.

Process

Given two point sets $\{\mathbf{p}_i\}, \{\mathbf{q}_i\}$, first centralize: $$ \mathbf{p}'_i=\mathbf{p}_i-\bar{\mathbf{p}}, \quad \mathbf{q}'_i=\mathbf{q}_i-\bar{\mathbf{q}}, \quad \bar{\mathbf{p}}=\tfrac{1}{N}\sum_{i=1}^N \mathbf{p}_i,\; \bar{\mathbf{q}}=\tfrac{1}{N}\sum_{i=1}^N \mathbf{q}_i $$

Construct covariance $H=\sum \mathbf{p}'_i (\mathbf{q}'_i)^{\mathsf{T}}$, apply SVD $H=USV^{\mathsf{T}}$, and obtain the optimal rotation: $$R = VU^{\mathsf{T}}$$ The translation vector is $$\mathbf{t} = \bar{\mathbf{q}} - R\bar{\mathbf{p}}$$

Output: Optimal $(R,\mathbf{t})$ mapping the first point set to the second in the least-squares sense.

The Kabsch algorithm evaluates structural similarity by minimizing RMSD, quantifying conformational differences between domains before and after fusion. This helps assess potential functional changes and supports the structural rationale of fusion design.

The comparison results are:

Figure 8 | Kabsch Comparison Results

Table 3 | RMSD Result by Kabsch

Name	Matching Atomic	RMSD
bslA	1227	4.264 Å
cbm_1	275	1.060 Å
cbm_2	260	0.716 Å

CBM domains (cbm_1, cbm_2) show minimal displacement (RMSD < 1.1 Å), while BslA deviates more (4.264 Å) but remains acceptable given linker flexibility. Overall, the fusion structure is reasonable with domains likely retaining function.

GROMACS Simulation Results

Molecular dynamics simulations of BslA-dCBM (3 ns, GROMACS) show stable BslA and CBM conformations, with β-structures maintained and no unfolding or disorder observed.

Figure 9 | GROMACS Simulation Results and Stability analysis

Time-series RMSD analysis of the protein backbone shows values ≤1.5 nm with small fluctuations, indicating the fusion protein maintains good conformational stability throughout the simulation.

Trend Testing

To specifically verify whether RMSD converges and protein structure is stable, we used Cox Stuart trend test to analyze RMSD data from 1-3 ns to evaluate statistically significant trends.

The Cox–Stuart trend test, a non-parametric method, evaluates upward or downward trends in data, providing a quantitative basis for assessing structural stability.

Let's first make an assumption:

$H_0$：No growth trend
$H_1$：Having Growth trend $$c=\begin{cases}\frac n2,~~~~n\text{ is an even number}\\\frac{n+1}2,~~n\text{ is an odd number}\end{cases}$$ Take $x_i$ and $x_{i+c}$ to form a pair $(x_i,x_{i+c})$, and use the difference $D_i=x_i-x_{i+c}$ between the two elements of each pair to measure the increase or decrease. Let $S^+$ be the number of positive $D_i$ and $S^-$ be the number of negative $D_i$ . When one of the two is large, it is considered that there is a trend in the data. Under the zero assumption without a trend, $D_i$ should follow a binomial distribution $Bin(n',0.5)$, where $n'$ is the total significant logarithm.

To test whether RMSD shows a growth trend, we took , with a $p$ value of $P(Bin(n',0.5)\le k)$. Through calculation, we found that the $p$ value within the 1.3-3nm period was 0.22913 > 0.05, which cannot reject the null hypothesis. This indicates that RMSD shows no growth trend after 1.3nm, and the structure of the fusion protein is relatively stable.

Overall, BslA-dCBM maintained stable conformation throughout the simulation, with secondary structures and functional domains preserving their spatial arrangements and fluctuations within normal physiological range, confirming good structural stability.

Molecular Dynamics Simulation

To determine the optimal quantity of CBM protein in the fusion protein to verify the experimental design, we conducted an analysis based on the number of binding sites and protein hydrophobicity, and used GROMACS for molecular dynamics simulation to evaluate stability.

Number of Binding Sites

CBM binding to cellulose relies on internal aromatic residues forming $\pi$–$\pi$ interactions, which are used to evaluate binding sites. CBM's tertiary structure is well predicted by AlphaFold. Since cellulose is not a standard PDB object, a 2D planar model of cellulose chains was manually constructed in Python to study chain arrangement on surfaces.

Figure | A:The Constructed Cellulose Plane B:Aromatic Residues Located on the CBM Protein

Figure 10 | Cellulose Plane Structure and Important Residues in CBM Protein.
A: The Constructed Cellulose Plane; B: Aromatic Residues Located on the CBM Protein

We used HDOCK to dock fusion proteins with different quantities of CBM proteins with cellulose planes and calculated the number of binding sites. A higher number of binding sites indicates a closer binding to cellulose.

Proportion of Hydrophobic Area

FreeSASA was used to compute the solvent-accessible hydrophobic area of fusion proteins. Increasing CBM number raises overall hydrophobic exposure, potentially enhancing CBM–cellulose interactions, though the effect plateaus with higher CBM counts.

Table 4 | Correlation Results of Hydrophobic Area Output by FreeSASA Software

Structure Name	Atomic Number	Total Area	Hydrophobic Area	Polar Area	Hydrophobic ratio
BslA_1cbm_cellulose	1396	12797.21	7161.53	5635.68	55.97%
BslA_2cbm_cellulose	1714	15787.78	9186.11	6601.67	58.18%
BslA_3cbm_cellulose	2113	20090.34	11694.81	8395.52	58.22%
BslA_4cbm_cellulose	2478	23975.80	14087.19	9888.61	58.76%

GROMACS Molecular Dynamics Simulation

To assess the stability of the structure, we conducted molecular dynamics simulations on four research subjects using GROMACS. During the trajectory analysis stage, the RMSD values were calculated with proteins as the analysis objects. The smaller the RMSD value, the more stable the structure is. The results show that the RMSD of the four research subjects gradually stabilizes with the increase of simulation time.

Based on the above three evaluation indicators, the following are the final number of binding sites, the proportion of hydrophobic area and the RMSD data of the molecular dynamics simulation process:

Table 5 | Summary Data of the Four Objects

Quantity of CBM	Atomic number	Number of binding sites	Proportion of hydrophobic area
1	1396	37	55.97%
2	1714	47	58.18%
3	2113	50	58.22%
4	2478	23	58.76%

Figure | RMSD Values of Four Objects During 3ns Molecular Dynamics Simulation

Figure 11 | RMSD Values of Four Objects During 3ns Molecular Dynamics Simulation

Fusion proteins with 2 or 3 CBMs perform better theoretically. However, structural analysis in PyMOL shows three CBMs span both sides of the cellulose plane, conflicting with the project goal of coating one side, making this design unfeasible. Structure diagrams are shown below:

Figure | Structure Diagram of Four Objects in Pymol

Figure 12 | Structure Diagram of Four Objects in Pymol

Based on the above analysis and taking into account factors such as protein expression efficiency and construction complexity, we ultimately chose to retain two CBM proteins in the fusion protein as the design scheme.

[1]Leonarski F, Jasiński M, Trylska J. Thermodynamics of the FourU RNA Thermal Switch Derived from Molecular Dynamics Simulations and Spectroscopic Techniques. Biochimie, 2019, 156: 22-32.
[2]Zayni S, Damiati S, Moreno-Flores S, Amman F, Hofacker I, Jin D, Ehmoser E-K. Enhancing the Cell-Free Expression of Native Membrane Proteins by In Silico Optimization of the Coding Sequence—An Experimental Study of the Human Voltage-Dependent Anion Channel. Membranes, 2021, 11(10): 741.
[3]Waldminghaus T, Heidrich N, Brantl S, Narberhaus F. FourU: A Novel Type of RNA Thermometer in Salmonella. Molecular Microbiology, 2007, 65: 413-424.
[4]Abramson J, Adler J, Dunger J, et al. Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3. Nature, 2024, 630(8016): 493-500.
[5]Pronk S, Páll S, Schulz R, et al. GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics, 2013, 29(7): 845-854.
[6]Evans R, O'Neill M, Pritzel A, et al. Protein Complex Prediction with AlphaFold-Multimer. bioRxiv, 2021: 2021.10.04.463034.
[7]Timmer B J J, Mooibroek T J. Intermolecular π–π Stacking Interactions Made Visible. Journal of Chemical Education, 2020, 98(2): 540-545.

Introduction

FourU Temperature Sensitive Riboswitch