Model | Patras - iGEM 2025

Introduction

Modeling biological and chemical systems is a very important aspect of synthetic biology. It allows a better understanding of such systems in silico, using mathematical and computational representations of real processes. This project is no different, as it explores all different processes of the biological mechanism we developed using software. All the code and results of the following models are available in our GitLab repository.

Biomarker Selection

Note: the words model and classifier, patient and sample, and biomarker/miRNA and feature are used interchangeably in the following analysis.

What Is An Ideal Sepsis Biomarker?

The features of an ideal sepsis biomarker:

Diagnostic tool with high sensitivity & specificity
Rapid results through accessible testing methods
Prognostic information for predicting patient outcomes
Cost effective and non-invasive* (1)

*measurable indicators derived from easily obtainable biological samples (2)

Conventional Sepsis Biomarkers

The most widely used biomarkers for sepsis diagnosis are C-reactive protein (CRP) and procalcitonin (PCT), both of which are produced by the body in response to infection or inflammation.

C-reactive protein (CRP) is a protein produced by the liver in response to inflammation. Its levels begin to rise within 12–24 hours, but may take up to 72 hours to reach their peak in the bloodstream. While CRP is moderately sensitive for detecting sepsis, it lacks specificity for bacterial infections, as many other inflammatory conditions can also elevate CRP levels. (3)

Procalcitonin (PCT), the precursor of calcitonin, is associated with severe bacterial infections. Its levels begin to rise 2–3 hours after infection and reach a peak at around 24 hours. Although this rise is faster than that of CRP, the delay in reaching peak concentration means PCT should be used cautiously as a sole marker for infection at initial presentation. Additionally, PCT may be less effective at distinguishing bacterial from viral infections (3).

Both CRP and PCT take several hours to rise to detectable levels. Yet in sepsis, every hour of delay in diagnosis can increase the risk of mortality by up to 10%.

Why miRNAs?

MicroRNAs (miRNAs) can be detected in various body fluids. In recent years, panels of deregulated miRNAs have been identified in the blood of patients with inflammatory and infectious diseases, suggesting that circulating miRNAs could serve as reliable biomarkers for sepsis (4). A major advantage of miRNAs is their remarkable stability in blood compared to other RNA molecules. This stability is achieved through encapsulation in lipid vesicles or by forming complexes with proteins, which protect them from degradation. As a result, circulating miRNAs can be found in biological fluids in several forms:

Exosomes
Microvesicles
High density lipoprotein particles. (5)

In addition to their stability, circulating miRNAs offer several advantages over conventional protein-based markers, including their small size and simpler chemical structure. MicroRNA levels can distinguish septic patients from healthy individuals and from those with other infectious diseases, making them promising diagnostic biomarkers for sepsis. Moreover, studies suggest that miRNAs may serve as prognostic biomarkers, providing insight into sepsis outcomes such as organ dysfunction and mortality (6). Because their levels correlate with disease severity and mortality risk, miRNAs have the potential to guide clinical decision-making in hospital settings.

Literature results

To identify potential sepsis-specific miRNA biomarkers, we conducted a PubMed search using the keywords: "sepsis" AND "microRNA biomarkers" AND "diagnosis OR prognosis." This search returned 244 results. From these, we focused on reviews, research articles, and clinical trials that reported extracellular (circulating) miRNAs detectable in blood, including plasma, serum, or whole blood.

To complement our literature review, we consulted the Human MicroRNA Disease Database (HMDD), a curated collection of experimentally supported associations between human miRNAs and diseases (7). We specifically explored the sepsis section of HMDD to identify circulating miRNAs that are correlated with bacterial sepsis.

Selection Criteria

We applied the following criteria to filter candidate miRNA biomarkers:

High diagnostic accuracy: The miRNAs showed significantly altered expression in sepsis patients compared to healthy controls and individuals with non infectious inflammatory conditions. They demonstrated high AUC values and higher than traditional biomarkers.
High specificity for bacterial sepsis: The LPS induced dysregulation, was a good indicator that the biomarker levels were altered due to bacterial sepsis.
Prognostic value (optional for each candidate, but prioritised): They provided information on disease progression and severity

During our search, we excluded studies or trials with a small number of participants and studies published before 2016

Based on these criteria we created a list of approximately 100 biomarkers from literature. For each candidate, we collected the following key information:

miRNA name
The biological fluid it was detected (eg.serum, plasma, whole blood)
Method of detection (eg. qRT-PCR)
Prognostic or/and diagnostic value
Number of patients and healthy volunteers that took part in the study
Their ability to discriminate between bacterial sepsis, healthy individuals and individuals with non infectious inflammation (AUC values)
The year of the published study they were mentioned

Multi Biomarker Approach

Sepsis is a complex condition, and a single biomarker cannot capture its full spectrum. By combining multiple biomarkers that reflect the immune response, inflammation, and organ dysfunction, we can achieve a more complete evaluation of the patient's condition. For this reason, we chose to focus on a panel of microRNAs rather than relying on just one, since no single microRNA could meet all our criteria. (1)

Method of Choice

A common way to identify the most relevant biomarkers for a health condition is to analyze datasets that include biomarker measurements from patients with known health labels. This analysis is typically carried out using three main approaches (8).

filter methods (statistical analysis)
wrapper methods (the model is treated as a black box)
embedded methods (the model is not treated as a black box)

Our team selected an embedded method, specifically by training and evaluating an XGBoost model. Embedded methods are advantageous because they consider the internal structure of the model when selecting biomarkers. We chose a machine learning approach over traditional statistical analysis because machine learning can capture complex patterns in the data and identify predictive biomarkers more effectively (9).

Dataset

To train our model our team utilized the GEO dataset GSE134358. It features 567 samples of the following categories:

sepsis patients
healthy controls
patients with noninfectious conditions
patients after LPS administration

For each sample (person), a set of features (miRNAs) is stored as a column in the dataset, along with a class label indicating the person's condition. These features contain preprocessed numerical readings from an Affymetrix Multispecies miRNA-4 Array. Specifically, they represent fluorescence intensity values, which are positively correlated with miRNA concentrations (10). In other words, higher fluorescence means higher miRNA expression, and lower fluorescence means lower expression. We initially considered other datasets, mainly from GEO, but decided against using them because their small sample sizes could lead to misleading results.

Training the Models

Our GitLab repository contains a directory called Biomarker Selection. Inside it, the data subdirectory stores the datasets and results, while the src subdirectory contains the Python source code. A README.md file is also included, providing an explanation of each file and instructions on how to run the code.

What follows is a step-by-step explanation of how we processed the data and trained the models.

1) Data Preprocessing

We built a python script called preprocessor.py which does the following.

removes non human miRNAs
removes patients that had been administered LPS
labels sepsis patients as 1
labels the rest of the patients as 0

This final step was carried out to identify the most sepsis-specific miRNAs. By converting the task into a binary classification problem, we classify sepsis patients against all other individuals. This approach allows us to find biomarkers that can distinguish sepsis patients not only from healthy individuals but also from those with other noninfectious conditions. The resulting dataset has the structure shown in the table below.

miRNA Feature	Patient 1 (Class: 0)	Patient 2 (Class: 0)	Patient 3 (Class: 1)
hsa-let-7a-5p	10.41757	10.46781	10.14589
hsa-let-7a-3p	0.4947102	0.5331665	0.4919643
hsa-let-7a-2-3p	0.6117588	0.6358627	0.8094975
hsa-let-7b-5p	12.50043	12.81568	11.0838
hsa-let-7b-3p	1.306408	0.7023731	0.6674352
hsa-let-7c-5p	10.79965	10.91543	9.463734
hsa-let-7c-3p	0.6028877	0.7009969	0.635748
hsa-let-7d-5p	10.74383	11.06884	10.28891
hsa-let-7d-3p	6.327692	5.598911	7.427999
hsa-let-7e-5p	4.732087	4.639012	3.472124
hsa-let-7e-3p	0.6482257	0.5502585	0.5448906
hsa-let-7f-5p	5.906273	6.251195	5.113054
hsa-let-7f-1-3p	0.9410619	0.9269541	0.8266135
hsa-let-7f-2-3p	0.5002547	0.50337	0.4851239
hsa-miR-15a-5p	6.240113	6.519817	5.116216
hsa-miR-15a-3p	0.5146033	0.5296417	0.4600508
hsa-miR-16-5p	12.2647	12.21791	11.65983

Sample of the preprocessed dataset

The first column lists the miRNA names (features), while the remaining columns correspond to individual patients. The first two patient columns represent non-sepsis individuals, indicated by a class label of "0," and the third column represents a sepsis patient, labeled as "1."

2) Feature Pruning

The dataset originally contained thousands of biomarkers, and training models using all possible combinations would have been extremely time-consuming. To address this, we created a preprocessing Python script called feature_pruner.py, which removes certain miRNAs from the dataset. It retains only those miRNAs supported by literature as potential sepsis biomarkers. This reduced the number of features from thousands to just over 100, ensuring that model training could be performed quickly and efficiently.

3) Feature Ranking

The final step in our analysis was to train and test the models to evaluate their performance on data they had not seen during training. This process is implemented in our Python script feature_ranker.py. As mentioned earlier, we used an embedded method with an XGBoost classifier. In simple terms, the model is trained on part of the data to learn how to detect sepsis and then tested on the remaining data to see if it can correctly classify samples as septic or non-septic. If a model trained on a specific pair of biomarkers performs best at distinguishing sepsis from non-sepsis, we consider those biomarkers the most informative. Following this approach, we trained models on each possible pair of miRNAs and evaluated their performance.

Before presenting the results, it is important to clarify what we mean by "evaluating" the models and how the evaluation metrics are defined. If you are already familiar with standard classifier evaluation metrics, you may skip the following subsection.

Evaluation Metrics

In our binary classification problem, we call sepsis the positive class (denoted by 1), and non-sepsis the negative class (denoted by 0). Our classifier can make the following predictions:

The patient has sepsis, but the classifier mistakes them for a non-septic. In this case we have a false negative (FN).
The patient does not have sepsis, but the classifier mistakes them for a septic. In this case we have a false positive (FP).
The patient has sepsis, and the classifier correctly classifies them as septic. In this case we have a true positive (TP).
The patient does not have sepsis, and the classifier correctly classifies them as non-septic. In this case we have a true negative (TN).

Using the above kinds of predictions we can define the following metrics:

$$\text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$

This is also called true positive rate (TPR).

$$1 - \text{Specificity} = 1 - \frac{\text{TN}}{\text{TN} + \text{FP}}$$

This is also called true negative rate (TNR) or Recall.

Using these metrics, we define our final evaluation metric as follows. Each binary classifier, when given input data, produces two output probabilities. For example, after receiving the miRNA values, a classifier might compute:

64% sepsis
36% no sepsis

Someone could argue that the class with a probability greater than 50% should be the one the patient is finally assigned to. In this case, the classification threshold is set at 50%. Using the example above, the patient would be classified as septic. If we vary the threshold across the full range from 0% to 100% and plot the True Positive Rate (TPR) against the True Negative Rate (TNR) for each threshold, we obtain a graph like the one shown below.

Example of a ROC curve

This graph is called a Receiver Operating Characteristic (ROC) curve, and the area under the curve (AUC) serves as our primary evaluation metric. An ROC-AUC score of 0.5 corresponds to a classifier making random guesses, while a score of 1.0 indicates perfect classification.

This metric balances false positives and false negatives. Another metric that also accounts for this balance is the F1-score, which is a more coarse grained evaluation metric.

$$\text{F1-score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$

Where Precision is defined as:

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$

We evaluated our models and ranked them based on ROC-AUC score, but F1-score was also supplemented. It should be also noted that we trained and tested each model in different subsets of the data 10 different times and took the average results to validate them using a method called 10-fold cross validation (11). In the following paragraph we present the results.

Results

Classifiers trained on a single biomarker did not yield any good results, as only one exceeded 70% ROC-AUC score, while all had an F1-score lower than that. A sample of the assumed best biomarkers is the following.

Features	ROC-AUC	F1-Score
hsa-miR-150-5p	0.722881944	0.674741979
hsa-miR-4516	0.688625	0.662728039
hsa-miR-23a-3p	0.681507937	0.661885588
hsa-miR-1275	0.679685516	0.63736992
hsa-miR-342-5p	0.678014881	0.662375203
hsa-miR-345-5p	0.688625	0.599949907
hsa-miR-320d	0.667548611	0.629510125
hsa-miR-320e	0.657581349	0.632991433
hsa-miR-342-3p	0.650649802	0.615980718
hsa-miR-27a-3p	0.649034722	0.576764847
hsa-miR-320c	0.648060516	0.637588633
hsa-miR-143-3p	0.636545635	0.58550889
hsa-miR-26a-5p	0.628102183	0.591701753
hsa-miR-301b	0.623758929	0.616814158
hsa-miR-1270	0.620993056	0.630120736

The top 15 biomarkers, ranked based on ROC-AUC

Features	ROC-AUC	F1-Score
hsa-miR-26a-5p, hsa-miR-150-5p	0.823513889	0.79020406
hsa-miR-150-5p, hsa-miR-193a-3p	0.822483135	0.757751207
hsa-miR-25-3p, hsa-miR-150-5p	0.821015873	0.759504259
hsa-miR-150-5p, hsa-miR-30e-3p	0.817053571	0.775450652
hsa-miR-150-5p, hsa-miR-1273g-3p	0.815730159	0.780830344
hsa-miR-22-3p, hsa-miR-150-5p	0.804328373	0.763105581
hsa-miR-150-5p, hsa-miR-1273h-3p	0.804154762	0.776680229
hsa-miR-150-5p, hsa-miR-1273h-5p	0.802014881	0.757627115
hsa-miR-150-5p, hsa-miR-151b	0.801152778	0.765033135
hsa-miR-150-5p, hsa-miR-486-5p	0.798154762	0.742475719
hsa-miR-30c-5p, hsa-miR-150-5p	0.797555556	0.742076225
hsa-miR-26b-5p, hsa-miR-150-5p	0.795752976	0.747775988
hsa-miR-150-5p, hsa-miR-1469	0.789861111	0.771621626
hsa-miR-150-5p, hsa-miR-1587	0.789134921	0.731521045
hsa-miR-150-5p, hsa-miR-1915-3p	0.788708333	0.790036584

The top 15 pairs of biomarkers, ranked based on ROC-AUC

Making 6 or 7 correct guesses out of 10 is not sufficient for a diagnostic tool, so we needed to increase the number of biomarkers to 2 so that we could potentially see better results.

As we expected, training the models in combinations of two biomarkers did yield better results, as a combination is always better at capturing complex patterns.

The scores were similar; therefore, the final choice of biomarkers was ultimately based on the available literature.

Final Choice

From the candidate pairs of microRNAs, we selected hsa-miR-486-5p and hsa-miR-150-5p. This pair ranked within the top 15 of our analysis and achieved an AUC of ~80%, meeting all of the criteria we set. READ MORE

miR-486-5p

Demonstrated higher diagnostic accuracy and AUC values compared to standard biomarkers such as PCT and CRP.
Provides insight into disease severity.
Levels are elevated in the blood during sepsis, and can effectively discriminate between septic patients, pneumonia patients, and healthy individuals, making it a highly specific biomarker.

miR-150-5p

Consistently found to be downregulated in sepsis patients.
Its expression is induced by LPS, making it a potential indicator of Gram-negative bacterial sepsis.
Also associated with disease severity, providing prognostic value in addition to diagnosis.

Optimized Models

To demonstrate the effectiveness of our selected biomarkers, we built several optimized machine learning models that outperform the previous models in diagnosing sepsis. Specifically, we trained a Support Vector Machine (SVM), a Multilayer Perceptron (MLP), a Random Forest (RF), and an XGBoost (XGB) model.

All models have parameters that need to be fine-tuned to achieve optimal performance. To do this, we trained the models using many combinations of parameters in a process called grid search (11). The performance of each model was then evaluated using 5-fold cross-validation.

We trained the models twice: once using F1-score as the evaluation metric, which balances false positives and false negatives, and once using recall, which prioritizes minimizing false negatives. We focused on reducing false negatives because failing to diagnose a sepsis patient could have more severe consequences than incorrectly flagging a non-septic patient.

The ranking based on F1-score is shown below.

Metric	SVM (Rank 1)	Random Forest (Rank 2)	XGBoost (Rank 3)	MLP (Rank 4)
Optimization Metric	F1	F1	F1	F1
Ranking Metric	F1_Score	F1_Score	F1_Score	F1_Score
Recall Mean	0.936895161	0.873387097	0.86733871	0.898991935
Recall Std	0.034057665	0.056165027	0.066338305	0.082534324
Precision Mean	0.737003499	0.774409708	0.775342017	0.741090987
Precision Std	0.049784419	0.060976004	0.049655815	0.057593696
F1 Score Mean	0.823291757	0.817053676	0.815079625	0.808778281
F1 Score Std	0.027526948	0.015750256	0.015168231	0.044275725

The ranking based on recall is shown below.

Metric	SVM (Rank 1)	XGBoost (Rank 2)	MLP (Rank 3)	Random Forest (Rank 4)
Optimization Metric	Recall	Recall	Recall	Recall
Ranking Metric	Recall	Recall	Recall	Recall
Recall Mean	1	0.911491935	0.898991935	0.886290323
Recall Std	0	0.060927139	0.082534324	0.041877286
Precision Mean	0.528418079	0.715250835	0.728270474	0.741250378
Precision Std	0.006626326	0.051313217	0.040116177	0.067018157
F1 Score Mean	0.691432818	0.798422362	0.801321777	0.803643256
F1 Score Std	0.005695195	0.026796728	0.032947203	0.023610895

The SVM model managed to reach a recall of 1, meaning that it made no errors in detecting sepsis. It did have some false positives but no false negatives at all, which is what we wished to achieve. All the other models have a recall of approximately 90%. What is also noteworthy is that the SVM, even though it detected all sepsis patients, it had a low precision of 50%, meaning that it classified many non-septic patients as septic.

Decision Boundaries

The reason we trained all those different models is due to their different approaches at computing the final result. To give you a better understanding of the way decisions are made, we made a python script called "decision_boundary_visualizer.py" which produces the images shown in the next subsection. What you see is the decision boundaries that the models compute to make the final classification.

In the images below, you see a grid where the horizontal and vertical axes represent the hsa-miR-150-5p and hsa-miR-486-5p correspondingly. The dots on the grid represent the patients. Their location reflects the values of the measured miRNAs. The color of the dots represents their class (red/blue = sepsis/non-sepsis) and the color of the grid behind them represents how confident the model is that the sample belongs to the positive class (sepsis). Finally, the dashed grey line is the decision boundary. This is the area on the grid where the confidence of the model is 50%. If a sample lands on one side of the boundary it gets classified as the positive class, while if it lands at the other side, it gets classified as the negative class.

MLP decision boundary

SVM decision boundary

Random Forest decision boundary

XGBoost decision boundary

As demonstrated above, even classifiers with similar scores had much different decision boundaries.

Explainability

AI models need to be intuitively understandable to be safely used in important decision-making. This is why explainability methods such as SHAP (12) were developed: they allow us to see how the model makes its decisions and how each training sample influences the prediction toward the positive or negative class.

The figure below is a SHAP plot, which ranks biomarkers by their impact on the model's predictions. Biomarkers located higher on the vertical axis, such as hsa-miR-150-5p, have a greater influence on the model's decisions. Each dot represents a patient, with the color indicating the level of the biomarker. The horizontal position of the dot shows how much that patient pushed the model's prediction toward the negative or positive class. For example, a blue dot on the right of the axis indicates a patient with a low level of that biomarker, which pushed the model toward predicting the positive class.

SHAP plot for Random Forest

SHAP plot for XGBoost

In the above models, we see a clear overlap between the blue and the red dots, meaning that the models cannot explain consistently how the samples drive the predictions.

SHAP plot for MLP

SHAP plot for SVM

In the above models, however, we can clearly see that the lower values pushed the prediction towards the positive class and that the higher ones did the opposite.

It should be noted that the dataset we used contains heavily processed data, which do not represent miRNA expression in a strictly linear way. The values were scaled and transformed before inclusion in the dataset. This explains why some samples show both biomarkers as downregulated in sepsis, even though, in theory, one biomarker would be expected to be upregulated while the other is downregulated.

Final Comments

As demonstrated in the previous sections, two biomarkers stand out in classifying sepsis patients. The optimized models we described achieve close to 90% in the evaluation metrics, and some provide interpretable results, as shown by the SHAP plots and visual decision boundaries. While a single biomarker may not have been sufficient, using two led to a noticeable improvement. It is important to note that the mechanism we are developing is not a single diagnostic tool, but rather a template method for diagnosis. This approach can be extended to include additional biomarkers, potentially further increasing classification accuracy.

For more details on how our biomarkers function, the reader can refer to the proof of concept. READ MORE

Evaluation of CRISPR/Cas system's sequences

In a CRISPR/Cas system, the guide RNA (gRNA) is a short RNA molecule that directs the Cas protein to a specific DNA or RNA target. It contains a sequence complementary to the target, allowing precise recognition through base pairing, and a structural portion that binds the Cas protein. It ensures that the Cas protein cuts or binds only at the intended location, providing the system with high specificity for target detection.

Choosing The Best guideRNA

We evaluated the candidate guide RNAs using three complementary thermodynamic metrics.

Minimum free energy (MFE, by RNAfold): measures folding stability of the guide alone. More negative values imply a more stable secondary structure that is less likely to separate the spacer. (13)
GC content: It shows the percentage of G/C base pairs. Higher GC content generally increases stability because G–C pairs form three hydrogen bonds (versus two for A–U), so GC regions tend to strengthen local base-pairing. (14)
Heterodimer MFE (guide:miRNA, by RNAcofold): estimates the free energy of the guide–target bind. A lower (more negative) heterodimer MFE indicates a tighter, more favourable crRNA–miRNA pairing and therefore a higher likelihood of efficient target engagement. (15)

Together, these three metrics provide insight into the guide's self-folding behaviour and the stability of its binding to the biomarker. This approach offers a practical means of prioritizing guides whose structures are most effective at target recognition. (13)

RNAfold Outputs

Using thermodynamic analysis with ViennaRNA (16), we determined that the full-length crRNA is the optimal sequence for detecting both 486-5p and 150-5p. The metrics described above were measured using RNAfold and RNAcofold.

Potential guide RNAs (17)(18)

Potential guide RNA structure for miR-150-5p (guide only):

Guide RNA 1

Guide RNA 2

Guide RNA 3

Guide RNA full length

Potential guide RNA structure for miR-486-5p (guide only):

Guide RNA 1

Guide RNA 2

Guide RNA 3

Guide RNA full length

In addition to assessing the stability of the individual RNA guide, we also needed to evaluate the stability of each guide when paired with its corresponding biomarker.

RNA Fold and RNA coFold Results are shown here:

miR-150-5p — guide ↔ miRNA duplex

crRNA	MFE (guide-only) (kcal/mol)	GC content	MFE heterodimer (kcal/mol)
crRNA1	− 12.10	45.45%	−38.6
crRNA2	−10.80	46.43%	−39.00
crRNA3	−11.80	46.30%	−42.60
crRNA full length	−11.00	46.55%	−49.90

Ranking The guideRNAs Based On The Stability of The Complex

All four guide RNAs exhibited similar self-folding stability, with RNAfold MFE values around −11 to −12 kcal/mol. This suggests that intramolecular steric hindrance does not significantly interfere with complex formation (13)(15). To select the optimal guide for our experiments, we compared the heterodimer MFE values generated by RNAcofold. The ranking of guides based on these values is: Full-length (−49.90) > crRNA-3 (−42.60) > crRNA-2 (−39.00) > crRNA-1 (−38.60). Based on this analysis, we selected the full-length guide as our primary candidate for downstream 3D modeling, docking, and validation, with crRNA-3 serving as a strong alternative.

All four crRNAs formed strong duplexes with miR-150-5p. Among them, the full-length crRNA showed the most favourable ΔG (−49.90 kcal/mol), making it our top choice for further analysis. We therefore selected the full-length crRNA as the primary guide for downstream 3D modeling and docking experiments.

miR-486-5p — guide ↔ miRNA duplex

crRNA	MFE (guide-only) (kcal/mol)	GC content	MFE heterodimer (kcal/mol)
crRNA1	− 8.90	0.64	−50.60
crRNA2	− 11.90	0.64	−48.60
crRNA3	−10.80	0.59	−49.93
crRNA full length	−10.40	0.64	−54.10

Similarly, the miR-486-5p guide passed the RNAcofold screen, indicating that no problematic secondary structures form that could interfere with complex binding. Based on heterodimer MFE values, the ranking is: full-length (−54.10) > crRNA1 (−50.60) > crRNA3 (−49.93) > crRNA2 (−48.60). We are moving all candidates forward to structural modeling, designating the full-length crRNA as our lead and crRNA1 as the secondary choice. As with the miR-150-5p set, the guide-only MFE and GC content are very similar across all designs, so the heterodimer MFE is the key selection parameter. More negative ΔG values suggest more stable hybridization and a greater likelihood of Cas13 activation. Based on these thermodynamic predictions, the full-length spacer is expected to provide the highest duplex stability, making it the best candidate for 3D docking and experimental validation. From this analysis, we concluded that the full-length guide RNAs are the optimal candidates for both miR-150-5p and miR-486-5p.

How We Obtained The Protein

The first step in our molecular docking workflow was to obtain the PDB structure files for both the receptor and the ligand. To achieve this, we downloaded the PDB structure of the LwaCas13a protein (1152 amino acids) from UniProt (19).

Obtained file confidence levels comments based on AlphaFold (22) prediction.

We retrieved the LwaCas13a structural model from UniProt and evaluated its per-residue confidence (pLDDT) profile to ensure its suitability for docking studies. The model has a mean pLDDT of 78.98 (range: 31.13–96.57), with 244 residues in the very-high-confidence category (>90), 675 residues in the confident range (70–90), and 263 residues scoring below 70. As expected for a large ribonuclease like Cas13a, the low-confidence regions are primarily found in long loops and terminal segments. In contrast, the active site and the HEPN/NUC lobe helices show high-to-moderate confidence (70–90), suggesting reliable structural predictions in these functionally critical areas. Importantly, low pLDDT scores do not indicate misfolding but rather reflect structural flexibility, meaning these regions may adopt multiple conformations under different conditions.

Finally, most of the remaining secondary structures fall within the high-to-moderate confidence range (70–90), providing sufficient reliability for use in our docking and structural analysis (21).

3D Structures And How We Chose the Best One

To generate 3D structures for docking, we converted the predicted crRNA into a three-dimensional model using the 3dRNA server (23). This tool integrates sequence and secondary-structure information to produce a set of candidate 3D models. From the output, we selected the top-scoring model within the largest cluster, as it represents the most probable and energetically stable conformation for use in our downstream docking studies (23).

Representations of Modeling

The selected model was exported as a PDB file and visualized in PyMOL. PyMOL, a widely used molecular visualization tool, allowed us to inspect the structure, check its geometry, and verify that all atoms were present and correctly placed before proceeding to molecular docking.

Guide for miR-150-5p

Predicted miR-150-5p

Predicted miR-486-5p

Results

Scores by 3dRNA/DNAscore (the lower the better)

Model	1	2	3	4	5
Score	31.5512	31.5512	31.5512	31.5512	31.5544

Scores by 3dRNA/DNAscore (the lower the better)

Model	1	2	3	4	5
Score	30.2539	30.2539	30.2539	30.2539	30.442

For miR-150-5p, the top cluster (Cluster-1) out of five predicted models produced the lowest and most consistent scores (≈31.5512, with a minimal spread up to 31.5544). The identical scoring across these structures indicates strong convergence, meaning that the sequence consistently folds into the same 3D conformation. We further confirmed that Cluster-1 exhibits a standardized, natural form with canonical geometry by visualizing it in PyMOL. Based on its low scores, structural consistency, and proper geometry, we selected Cluster-1 as the optimal structure for docking. Its PDB file was therefore used as the ligand model in all downstream docking studies with LwaCas13a.

Similarly for the miR-486-5p predictions showed similar results. More specifically cluster-1 models produced the best score values (top decoys ≈ 30.2539, with the largest outlier at ≈ 30.442). PyMOL visualization of Cluster-1 model also revealed a generally a structure suitable for docking.

In conclusion, the combination of (i) the lowest cluster score, (ii) a tight score distribution among the top decoys, and (iii) clean visual confirmation supports that Cluster-1 for both miRNAs provides a reliable and well-defined structural starting point for docking studies.

Molecular Docking Process

Protein Guide Binding in HDOCK

We performed protein–RNA docking using the HDOCK server (24, 26) to model the interaction between the full-length crRNA and the LwaCas13a protein.

What do we measure and why?

Each docking run was assessed using two metrics: the HDOCK docking score and the confidence score metric. (26)

HDOCK uses a knowledge-based scoring function optimized for protein–protein and protein–RNA/DNA interfaces. More negative docking scores correspond to more favourable interaction energies according to this scoring potential (25)(26). Benchmark studies have shown that this hybrid approach achieves competitive success rates compared to other docking servers. Confidence score (26) that indicates the binding likeliness of two molecules is defined as follows.

$$\text{Confidence score} = \frac{1.0}{1.0 + e^{0.02(E_{\text{docking}} + 150)}}$$

In general, a confidence score above 0.7 indicates a high likelihood of binding between the two molecules. Scores between 0.5 and 0.7 suggest a possible interaction, while scores below 0.5 indicate that binding is unlikely.

Across the docking simulations, the full-length crRNA selected from the evaluation phase exhibited a favorable profile. It achieved low (negative) docking scores and showed minimal RMSD variation among its top-ranked poses. This combination of favorable interaction energy and strong pose convergence supports the selection of the full-length spacer as the lead candidate for ternary modeling. All predicted binding structures are summarized in the table below.

LwCas13a/crRNA complex (for miR-150-5p)

Summary of the Top 10 models

Rank	1	2	3	4	5	6	7	8	9	10
Docking score	-312.95	-277.49	-270.20	-270.04	-266.45	-255.86	-254.89	-246.70	-246.55	-240.94
Confidence score	0.9630	0.9276	0.9171	0.9169	0.9113	0.8926	0.8907	0.8737	0.8734	0.8604
Ligand rmsd	51.23	54.73	57.14	58.04	45.54	53.83	61.06	44.60	60.37	72.82

LwCas13a/crRNA complex (for miR-486-5p)

Summary of the Top 10 models

Rank	1	2	3	4	5	6	7	8	9	10
Docking score	-333.72	-295.84	-287.78	-286.26	-285.15	-283.59	-279.43	-273.60	-269.74	-269.10
Confidence score	0.9753	0.9487	0.9402	0.9385	0.9372	0.9353	0.9301	0.9222	0.9164	0.9154
Ligand rmsd	75.33	69.94	61.34	67.61	71.13	61.72	68.76	68.40	69.63	63.15

As shown in our tables, the full-length crRNA exhibits strongly negative docking scores along with high confidence in binding. This combination of favorable interaction energy and consistent pose convergence aligns with the recommended criteria for prioritizing candidates for further refinement and experimental testing, as outlined in docking server protocols and evaluation guidelines.

Refinement and creation of the ternary complex

Most proteins feature a stable folded domain while their terminal regions or surface loops are more conformationally variable. Because HDOCK applies a rigid-template docking strategy (24)(29), it tends to miscalculate binding energetics in systems where RNA-protein flexibility plays a major role. To better capture these effects, we employed HADDOCK to perform semi-flexible docking on the LwaCas13a: full-length-CrRNA complex, enabling side-chain adjustments and limited backbone rearrangements during refinement (28). This approach helped us calculate the interaction network and produced models that more accurately reflect induced fit and local flexibility than the initial rigid predictions.

Ternary Complex Results and Lobe Visualization

Lobe Visualization of LwCAS13a

For structural insight, we rebuilt and visualized the LwaCas13a model in PyMOL. The LwaCas13a adopts a bilobed architecture composed of a Recognition (REC) lobe and a Nuclease (NUC) lobe. The REC lobe is made up of the N-terminal domain (NTD) and the Helical-1 subdomain while the NUC lobe contains the HEPN1 domain, HEPN2 domain, Helical-2 domain, and a Linker between two HEPN domains. In our docked models, the crRNA primarily interacts with the surface of the REC lobe. After the binding of the target RNA the structure undergoes conformational changes that position the HEPN catalytic motifs for activity, thereby enabling Cas13a's characteristic trans-cleavage function. Schematic diagrams illustrating this domain organization are provided below. (27) (28) (35)

Code that explains how we performed the described process is found in our GitLab repository in a directory with the same name as the current section.

LwCas13a Visual

Final Ternary complex

We then used HADDOCK to build and evaluate the full ternary assemblies with each biomarker (miR-150-5p and miR-486-5p) (30, 31). In both cases, HADDOCK produced fully formed ternary models in which the full-length crRNA binds to Cas13a, while the target miRNA aligns and pairs along the guide region. Overall, the highest-ranked models show well-packed interfaces with numerous specific contacts between the Cas13a protein, the guide RNA, and the target RNA. These results are highly reassuring, as the consistent contact patterns and molecular geometries observed across top solutions validate the thermodynamic predictions discussed earlier.

The ternary complex was successfully assembled, and the results are shown below:

Ternary complex of miR-150-5p

Cluster 1

Parameter	Value
HADDOCK score	-42.3 +/- 15.6
Cluster size	7
RMSD from the overall lowest-energy structure	9.5 +/- 0.3
Van der Waals energy	-105.1 +/- 11.7
Electrostatic energy	-182.8 +/- 42.2
Desolvation energy	46.9 +/- 10.5
Restraints violation energy	524.7 +/- 75.0
Buried Surface Area	3077.3 +/- 285.2
Z-Score	1.4

Ternary complex of miR-486-5p

Cluster 1

Parameter	Value
HADDOCK score	113.6 +/- 9.8
Cluster size	5
RMSD from the overall lowest-energy structure	7.5 +/- 3.2
Van der Waals energy	-157.5 +/- 12.8
Electrostatic energy	-681.2 +/- 130.5
Desolvation energy	69.7 +/- 4.5
Restraints violation energy	3376.3 +/- 197.2
Buried Surface Area	5028.5 +/- 301.0
Z-Score	-1.4

In HADDOCK, the docking score is a weighted sum of van der Waals (E_vdW) and electrostatic (E_elec) energies, where more negative values generally indicate a more favorable complex (32). The buried surface area (BSA) reflects the size of the interface formed, with larger values typically corresponding to more stable binding (33). Cluster size and internal RMSD indicate the reproducibility of the pose across top models (33), while the Z-score compares a cluster to the rest of the run, with more negative values signifying better-than-average performance (34).

miR-150-5p ternary complex: Cluster 1 was identified as the best cluster, exhibiting a favorable overall energy profile (E_vdW ≈ −105, E_elec ≈ −183) and a substantial interface (BSA ≈ 3,077 Å²). The cluster is reasonably reproducible, with a size of 7 and RMSD ≈ 9.5 Å, which is acceptable given the large size of the Cas13a–RNA assembly. The Z-score (+1.4) is not optimal, indicating that this cluster is not significantly better than the run's average by that metric. Additionally, some restraint tension is present, suggesting that while this cluster is plausible, it should be considered a "work-in-progress" for further refinement.

miR-486-5p ternary complex: For this complex, the interface appears even stronger, with E_vdW ≈ −158 and E_elec ≈ −681. The cluster is reasonably tight (size 5, RMSD ≈ 7.5 Å), and the Z-score (−1.4) is reassuring, indicating it performs better than the ensemble average. The buried surface area (BSA) is large and consistent (~5,029 Å²), suggesting a substantial interface. The main caution is the relatively high restraint-violation energy, indicating that the model could benefit from brief refinement. Overall, this ternary complex appears stronger than the miR-150-5p complex. Taken together, these metrics support a credible and reliable ternary complex, reinforcing the validity of our predictions.

Finally, we observe the interactions between the Cas13a lobes and the guide RNA in our final ternary complex:

Final ternary complex

REC contact -> A:494 GLU Iterate: iterated over 210 atoms. NUC contact -> A: 1024 TYR Iterate: iterated over 115 atoms.

Per-lobe interaction summary

Contact mapping in PyMOL shows that the Recognition (REC) lobe dominates interactions with the guide RNA: we observe 210 close atom–atom contacts between REC and the guide versus 115 between the NUC lobe and the guide . These results indicate that the REC lobe provides the primary recognition surface for the crRNA in the docked model, which is consistent with the canonical role of the REC region in many Cas proteins and with previous structural studies describing REC-dominant contacts.

Guide/Target Surface Exposure

At the same time, solvent-accessible surface area calculations for the target chain give 5640.38A² for the isolated target and 5601.69 A² for the target in the complex . That means that the buried area is only ≈38.7A meaning that the area that the RNA binds to is in comparison very small . In structural terms the target remains largely solvent-exposed in the snapshot examined, even though multiple atoms make direct contacts. Taken together, the pose-level metrics are encouraging: the REC lobe clearly engages the guide and multiple guide nucleotides form contacts with the target. These features support a biologically plausible ternary assembly. At the same time, the very small buried surface area in this snapshot warns that the interface in this top-ranked pose is relatively small.

486-5p

PyMOL>select REC_lobe, 486-5p-1 and polymer.protein and resi 1-576 Selector: selection "REC_lobe" defined with 6178 atoms. PyMOL>select NUC_lobe, 486-5p-1 and polymer.protein and resi 577-1452 Selector: selection "NUC_lobe" defined with 6438 atoms. PyMOL>select guideRNA, 486-5p-1 and chain B and resn rA+uA+cA+gA Selector: selection "guideRNA" defined with 1809 atoms. PyMOL>select REC_guide_contacts, REC_lobe within 4.0 of guideRNA Selector: selection "REC_guide_contacts" defined with 114 atoms. PyMOL>select NUC_guide_contacts, NUC_lobe within 4.0 of guideRNA Selector: selection "NUC_guide_contacts" defined with 108 atoms.

486-5p lobe analysis

Analysis of the 486-5p ternary pose reveals a protein-centric recognition pattern with a modest guide–target interface in the inspected snapshot. PyMOL contact shows 222 protein–guide contacts and 54 direct guide–target contacts, indicating that the Cas13a provides the principal binding surface to position the crRNA. Additionally, we observe 114 contacts between the guide and the REC lobe, compared to 108 contacts with the NUC lobe, further indicating that the REC lobe primarily determines guide RNA binding. Same as above the isolated target is 5763.29 A² ,,in-complex SASA = 5740.79 Å² and the buried area is ≈ 22.5 Å². Again that means that the per-pose buried surface area for the target is small suggesting that in this top-ranked pose the target remains solvent-exposed and the interface is relatively shallow.

Conclusions

Taken together, our multi-stage in silico evaluation—including thermodynamic ranking (MFE, GC content, heterodimer MFE), 3dRNA-derived ligand selection, docking with HDOCK, semi-flexible refinement with HADDOCK, and pose-level visualization in PyMOL—supports that the full-length guide RNAs for both miR-150-5p and miR-486-5p exhibit the most favorable hybridization energetics and the most reproducible docking solutions. The resulting ternary complexes show that LwaCas13a binds the crRNA while the spacer region pairs with the target miRNA. Furthermore, HADDOCK cluster and energy scores, together with detailed interaction analyses between protein, guide, and target RNA, indicate that the Cas13a:crRNA:miRNA complex is structurally plausible and likely to form under experimental conditions.

Collateral Cleavage and Activation

We divided this category into two systems for clarity. The first system models the process of Cas13a binding to the guide RNA, followed by target RNA recognition, activation, and finally the collateral cleavage of Hairpin 0 (H0). The goal of this system is to illustrate the entire process from start to finish using standard kinetic equations. In the second system, we assume that the Cas13a–crRNA–target complex is already active. This model focuses on the rate at which Hairpin 0 (H0) is converted into an initiator and the target RNA is cleaved. These rates are described using Michaelis–Menten kinetics.

For both systems, parameter values were determined based on a review of relevant literature. Due to the physical constraints of our current lab equipment, it was not feasible to obtain real-world measurements experimentally. All supporting references are provided in the bibliography.

Explaining the parameters of System 1

First, it is essential to define the parameters used in our model. Each parameter represents a distinct factor that will later be applied to solve the kinetic equations. The values for these parameters were determined through bibliographic research and, where possible, laboratory experiments. The parameters are analyzed and defined as follows:

I: Initiator released when the reporter is cut
C: Free Cas13a enzyme before it binds to anything
E: Cas–guide ribonucleoprotein (Cas13a bound to guide RNA)
Es: RNP bound to the target RNA (the "activated" complex)
Target_initial: Intact target RNA
Target_after: Target that has been cleaved
H0: Intact reporter hairpin
G: free guide RNA

We also do the same process for some basic kinetics parameters:

Bind_Crisp_Guide: Bimolecular association rate for C + G → E
Diss_Crisp_Guide: Unimolecular dissociation rate for E → C + G
Bind_E_Target: Association rate for E + Target_initial → Es
Diss_E_Target: Dissociation rate for Es → E + Target_initial
Target_cleavage: Cis-cleavage rate at which Es cuts its bound target, converting Target_initial to Target_after
Reporter_cleavage: Collateral (trans) cleavage rate at which Es cuts the reporter H0 to generate I
g: Generic first-order loss/degradation/dilution rate

Setting the values of the parameters of System 1

Our model is composed of eight parameters in total. For species that are essentially absent at the start of the system—namely the initiator, the Cas13a–guide complex (E), the activated complex (Es), and the cleaved target—we assigned near-zero initial values (0.1 nM) to represent their appearance only after the reaction begins. The remaining parameters were initialized as follows: C = 50 nM, H0 = 300 nM, Target_initial = 10 nM, and G = 60 nM (36)(37)(38)(39)(40)(41)

Setting the equations of System 1

With our system parameters defined and initialized, we proceeded to specify the equations used for modeling. The primary objective at this stage is to describe how the signal concentration changes over time, allowing us to capture the dynamics of the reaction.

For the initiator I, the rate equation:

$$\frac{dI}{dt} = (Reporter_{cleavage} \times Es \times H0) - (g \times I)$$

This equation captures the balance between production and loss of the initiator (I). The value of I increases whenever the activated enzyme–target complex (Es) cleaves the reporter (H0). The rate of production is therefore proportional to the concentrations of both Es and H0—the more of each present, the faster I is generated. At the same time, I gradually decreases due to its natural loss or degradation. Overall, I increases when its production rate exceeds its loss rate, and decreases when the loss rate becomes dominant.

The rate equation:

$$\frac{dC}{dt} = (Diss_{Crisp_{Guide}} \times E) - (Bind_{Crisp_{Guide}} \times C \times G) - g \times C$$

The rate equation for free Cas13a (C) shows that its concentration increases when the RNP complex (E) dissociates into C + G, and decreases when free Cas13a binds to the guide RNA (G) to form the E complex. Additionally, C decreases due to a generic loss term (g). In summary, higher levels of E lead to more C being released through dissociation, while higher levels of G accelerate the consumption of C to form E. When G = 0, no new E can form, and C can only increase through E dissociation or decrease through g. Similarly, if E = 0, no additional C is produced via dissociation.

$$\frac{dE}{dt} = (Bind_{Crisp_{Guide}} \times C \times G) - (Diss_{Crisp_{Guide}} \times E) - (Bind_{E_{Target}} \times E \times Target_{initial}) + (Diss_{E_{Target}} \times Es) + (Target_{cleavage} \times Es \times H0)$$

According to this equation, the concentration of the Cas13a–guide complex (E) increases when free Cas13a (C) and guide RNA (G) bind, when the activated complex (Es) releases the target, and when a bound target is cleaved. Conversely, E decreases when it dissociates back into C + G and when it is sequestered by binding to an uncut target to form Es.

$$\frac{dEs}{dt} = (Bind_{E_{Target}} \times E \times Target_{initial}) - (Target_{cleavage} \times Target_{initial} - Diss_{E_{Target}} \times Es) + (Reporter_{cleavage} \times Es \times H0)$$

The activated complex (Es) is formed when the free RNP complex (E) binds an intact target RNA. Es decreases in two ways: first, through cis-cleavage of the bound target, which converts the target and releases the enzyme; and second, through simple unbinding of the complex. An additional term accounts for Es catalyzing reporter cleavage. While the reaction rate depends on both Es and the reporter (H0), Es itself is not consumed in this catalytic step and persists throughout the reaction.

$$\frac{dTarget_{initial}}{dt} = (Diss_{E_{Target}} \times E) - (Bind_{E_{Target}} \times E \times Target_{initial}) - (g \times Target_{initial}) - (Target_{cleavage} \times Target_{after})$$

The concentration of intact target RNA increases when the enzyme–target complex (Es) dissociates and releases the target. It decreases when the RNP complex (E) binds the intact target to form the activated complex (Es), through generic loss (g), and via cis-cleavage, which converts intact target into the cleaved pool.

$$\frac{dTarget_{after}}{dt} = Target_{cleavage} \times Es$$

When the activated complex (Es) binds a target, it cleaves the target at a per-complex rate. Each cleavage converts one intact target into one molecule of cleaved target (Target_after), so the production rate is proportional to the concentration of Es. If Es = 0, no new cleaved target is produced, while higher levels of Es lead to faster accumulation of Target_after.

$$\frac{dG}{dt} = (Diss_{Crisp_{Guide}} \times E) - (Bind_{Crisp_{Guide}} \times C \times G) - (g \times G)$$

The concentration of free guide RNA (G) increases when the RNP complex (E) dissociates, as this releases one guide molecule per event. G decreases when it binds to free Cas13a (C) to form E, following mass-action kinetics and thus proportional to both C and G. Additionally, G slowly decreases due to generic loss (g). If E = 0, no new guide is released, and if C = 0, guides cannot be consumed by binding.

$$\frac{dH0}{dt} = -(Reporter_{cleavage} \times Es \times I) - (g \times H0)$$

In simple terms, the reporter (H0) only decreases over time. It is consumed when the activated complex (Es) cleaves reporter molecules and also undergoes generic first-order loss at rate g. If either Es or the initiator (I) is zero, the cleavage-driven consumption stops, and H0 changes only due to the g term. If both Es and I are zero (and g = 0), H0 remains constant.

System 1 Results

Results of system 1

The simulation reveals clear trigger dynamics. After a brief induction period, the activated complex (Es) rises sharply to a narrow peak, during which the Hairpin-0 (H0) reporter is almost completely consumed. Once the reporter is depleted, all reaction fluxes collapse toward zero, and the system enters a quiescent state with minimal further change. Effectively, nearly the entire measurable response occurs during this early, transient burst.

Under the specified starting concentrations (C = 50 nM, G = 60 nM, Target_initial = 10 nM, H0 = 300 nM), free Cas13a and guide RNA remain essentially unchanged during the transient phase, indicating that they are present in surplus and not rate-limiting. In contrast, the limiting resource is the reporter (H0). The target is rapidly converted to its cleaved form during the pulse, and initiator production occurs only while H0 is available. Additional incubation after reporter depletion yields little or no extra signal.

System 1 Code

In our code, we define the system variables and pass them to the vectors z and param. Specifically, z contains the initial concentrations, while param holds the rate constants. Time is defined as t in the solver. The derivatives are then computed using these vectors, capturing the dynamics of the system. The final output is a diagram that illustrates the full process from start to finish. The code is available in our GitLab repository in the "Collateral Cleavage and Activation" directory.

Explaining The Parameters of System 2

As mentioned earlier, this system assumes that the Cas13a–guide RNA–target RNA complex is already active and ready to perform its ribonuclease activity. To model this, we used Michaelis–Menten equations with the parameters k₁, k₂, k₃, and k_m. Here, k₁ represents the association rate of the E complex with the target to form the active complex Es, k₂ is the dissociation rate of the active complex, and k₃ describes collateral cleavage driven by Es. K_m is the Michaelis constant. The other parameters (I, H0, Target_initial, and Target_after) remain unchanged, as described previously.

Setting The Parameters of System 2

We follow the same procedure as in System 1, specifically initializing I, Target_after, E, and K₁ to near-zero values for the reasons discussed previously. The remaining parameters were set as follows: H0 = 100 nM, K₂ = 10 s^-1, K₃ = 1 s^-1, and K_m = 500 nM (42)(43)(44).

Setting The Equations of System 2

$$\frac{dTarget_{initial}}{dt} = -(K_1 \times Es \times Target_{initial})$$

The rate of change of intact target decreases proportionally to two factors: the amount of active enzyme (Es) and the remaining target. The negative sign indicates that the target concentration decreases over time. The parameter K₁ determines the speed of this loss—higher K₁ values correspond to faster depletion. Consequently, doubling either Es or the target concentration doubles the rate at which the target is consumed, while if either Es or the target is zero, the target concentration remains unchanged.

$$\frac{dI}{dt} = \frac{K_2 \times Target_{initial} \times Es \times H0}{K_m + Target_{initial}}$$

This Michaelis–Menten equation describes how the initiator is produced more rapidly when there is more target, more active enzyme, and more reporter. It also captures the phenomenon of target saturation: at low target concentrations, the production rate increases roughly proportionally to the target, whereas at high target concentrations, the rate levels off at a maximum determined by the available enzyme and reporter. There is a particular target concentration, known as the half-maximal point, at which the production rate reaches exactly half of this maximum.

$$\frac{dH0}{dt} = -\frac{dI}{dt}$$

The reporter hairpin (H0) and the initiator (I) are linked in a one-to-one relationship: each time a hairpin is cleaved, exactly one initiator molecule is produced. Consequently, the rate of initiator formation is equal to the rate of hairpin consumption.

$$\frac{dTarget_{after}}{dt} = \frac{K_3 \times Target_{initial} \times Es}{K_m + Target_{initial}}$$

This Michaelis–Menten equation describes how the amount of cleaved target increases at a rate that depends on the concentrations of both intact target and active enzyme. The dependence on target saturates: at low target concentrations, the cleavage rate rises roughly in proportion to the target, while at high target concentrations, the rate approaches a maximum determined by the enzyme concentration. The target concentration at which the rate reaches half of this maximum is defined by the Michaelis constant K_m.

System 2 Results

Results of system 2

The simulation plot shows a reporter-limited, front-loaded response. The Hairpin-0 reporter (green) is rapidly consumed, while the initiator (orange) rises sharply and plateaus near the reporter's initial concentration, reflecting efficient signal formation. In contrast, the intact target pool (blue) declines gradually over the entire experiment, and the cleaved target product (red) remains near baseline, indicating that target (cis) cleavage is comparatively slow and does not dominate the system dynamics. Overall, the system acts as a fast trigger that converts reporter molecules into signal early, then effectively stalls once the reporter is exhausted.

Across the four panels, the system behaves as a probe-limited amplifier. The target is only weakly converted to cleaved product, while the probe transitions almost completely from the inactive to the active state. The top-left panel shows the uncut target decaying gradually over 100 s, with the cleaved target remaining near zero, confirming that cis cleavage is slow and does not dominate the dynamics. In contrast, the bottom-left panel shows the inactive probe collapsing and the active probe rising sharply to 100 nM within the first minute, producing a strong, early signal. The bottom-right panel's flat green line (constant average probe concentration) demonstrates that the probe simply redistributes between inactive and active forms rather than being consumed.

The figure illustrates a conserved probe pool that transitions almost entirely to the active state and remains there. Within the first few tens of seconds, the inactive probe collapses toward zero while the active probe rises to nearly the full probe load (100 nM). The flat average probe concentration confirms mass conservation and indicates a simple two-state redistribution. After this rapid transition, the curves remain unchanged out to 1000 s, demonstrating effectively irreversible activation on the plotted timescale and a saturated, persistent signal.

The figure shows that the target is rapidly and effectively depleted, with minimal accumulation of cleaved product. The uncut target decays monotonically to near zero within a few hundred seconds, and the average target concentration decreases accordingly. In contrast, the cleaved target rises only slightly and then plateaus near baseline. This kinetic pattern indicates that the dominant pathway eliminates the target without generating a substantial pool of cleaved species. Functionally, the target acts as a transient activator that is quickly lost through a largely nonproductive route.

The Hairpin-0 probe (green) is rapidly depleted, while the initiator (orange) rises nearly to the probe's initial concentration and then plateaus. In contrast, the target pool (blue) declines only gradually over the 0–100 s window, and the cleaved-target trace (red) remains near baseline, indicating that cis cleavage is comparatively slow and does not dominate the system dynamics. Overall, the reaction behaves as a fast catalytic trigger, converting probe into signal early and then saturating. Consequently, the meaningful readout is achieved within the first tens of seconds, with little additional information gained from longer incubation times.

System 2 Code

For reference, check system 1. All the code the team wrote is available in our GitLab repository in the Collateral Cleavage and Activation directory.

Kinetics and Activation

The program described in this section uses MATLAB SimBiology to simulate a simple enzyme–reporter reaction in a virtual environment and generate the corresponding kinetic curves. We start with a defined amount of cutting enzyme (Cas13a + guide RNA) and reporter strand. The simulation allows these molecules to interact probabilistically—sometimes binding, sometimes catalyzing cleavage—while tracking signal accumulation over time. The program can run simulations with different initial concentrations and reaction rates (association, cleavage, or signal decay), measure initial velocities, and verify that the system's behavior matches the expected enzyme-kinetics profile.

Explaining The Parameters

First, we review the parameters used in our program. The model is fully based on Michaelis–Menten equations, and the variables are named to clearly reflect their biochemical roles. The parameters are analyzed and defined as follows:

E: active Cas13a + guide (crRNA) complex (the enzyme)
H0: Hairpin 0
EH0: enzyme–reporter complex
I: Initiator
TUC: miRNA/target proxy
k₁: Association rate E+H0
k₂: Disassociation rate E+H0
k₃: Catalysis EH0=E+I
r_l: Decay rate of the initiator over time

Setting The Parameters

In this subsection, we present the parameter values used in this program (43)(45)(46)(47) (48):

E (initial) = 20 nmol
H0 (initial) = [10 25 50 100 200 500 1000 2000] nmol (sweep)
EH0 (initial) = 0 nmol
I (initial) = 0 nmol
TUC (initial) = 50 nmol
k₁ = 10/60 1/(nM·min)
k₂ = 0.1/60 1/min
k₃ = 1000/60 1/min
r_I = 0.5 1/min

Setting The Equations

As mentioned earlier, the equations used are standard Michaelis–Menten kinetics. For clarity, we provide them here:

v₁ = k₁ × E × H0
v₂ = k₂ × EH0
v₃ = k₃ × EH0
v₄ = r_I × I
dE/dt = -v₁ + v₂ + v₃
dH0/dt = -v₁ + v₂
dEH0/dt = v₁ - v₂ - v₃
dI/dt = v₃ - v₄
dTUC/dt = 0
K_m = (k₂ + k₃)/k₁
V_max = k₃ × E₀
v₀(H0) = (V_max × H0)/(K_m + H0)

Results

Each curve shows the initiator signal (I) rising, peaking, and then declining. Early in the reaction, the enzyme cleaves reporters, causing I to increase. With a higher starting reporter concentration, the enzyme remains active longer, resulting in a taller peak that occurs later. With fewer reporters, the peak is smaller and occurs earlier. As reporters are consumed, production slows, while I continues to decay, causing the signal to drop toward baseline. Eventually, all traces decline because decay dominates once reporters are exhausted, and no signal can exceed the initial reporter amount, since each reporter produces at most one initiator.

Initial Reaction Rate vs Reporter Concentration

This plot shows the initial reaction rate versus starting reporter concentration. Black dots represent the rates extracted from the first moments of each simulation, while the red curve depicts the expected Michaelis–Menten rate based on the reaction mechanism. As the starting reporter increases, the initial rate rises and then plateaus, reflecting the enzyme becoming the limiting factor. The black dots lie below the red curve because the slope was measured immediately at t = 0, when no enzyme–reporter complexes (EH0) had yet formed. In contrast, the Michaelis–Menten curve assumes that steady state is established instantaneously.

This figure explores the effect of changing the cleavage rate k₃, which governs how quickly the enzyme cuts the reporter. As k₃ increases, the initiator signal (I) rises faster and reaches a higher peak, then declines sooner because the enzyme rapidly consumes the reporter and decay removes the product once production slows. With a smaller k₃, production is slower, so I rises gradually, reaches a lower peak, and lingers longer, forming a broad plateau where production and decay nearly balance before eventually fading.

This plot illustrates the effect of varying the decay rate of the product (r_I). When there is no decay (r_I = 0), the initiator signal (I) rises and plateaus at the total initial reporter concentration. Introducing decay (r_I > 0) causes each curve to rise while the enzyme produces product, reach a peak when production equals decay, and then decline as decay dominates once the reporter is mostly consumed. Higher r_I values result in a lower, earlier peak and a faster return toward baseline. None of the curves exceed the initial reporter concentration, since each reporter produces at most one initiator.

Catalytic Hairpin Assembly

Catalytic hairpin assembly (CHA) serves as the core signal amplification mechanism in our biological system. This method utilizes two partially complementary DNA hairpins (H1 and H2) activated by a single-stranded oligonucleotide released from hairpin H0 following CRISPR/Cas13a collateral cleavage. The subsequent analysis examines the chemical properties of these hairpin probes and their molecular interactions.

Note: Since most pieces of software can accept either DNA or RNA sequences but not sequences that contain both uracil and thymine bases, in the following analysis, all uracil bases have been replaced by thymine bases.

Choice of Hairpins

Our mechanism consists of three kinds of strands that fold into hairpins. Their names and their sequences (from 5' to 3') are presented below.

H0: CACCTAGCCCGCACACUUUUUGCGGGCTAGGTG
H1: AAAGTGTGCGGGGCTAGGTGACATTAACTACACCTAGCCCC
H2: GTAGGTGTAGTTAATGTCACCTAGCCCACATTAACTA

All hairpins' secondary structure must initially have an exposed/unpaired region, a stem with paired nucleotides and a loop. The choice of the specific nucleotides and their respective positions in each strand were hand-picked by our team to achieve certain chemical properties. The choice was, therefore, based on the available literature and was done through trial and error (link to design - Section: Hairpin probe design).

The main tools that were utilized to model the behavior of each individual hairpin were NUPACK (49) and OligoAnalyzer (50). What follows is the analysis of all hairpins individually. In that analysis, we aimed to simulate subsequent experiment conditions of the CHA protocol. Therefore, we use 50mM of sodium and 5mM of magnesium. We also use 0.05mM of DNTPs. The temperature is 37°C and the rest of the model's parameters related to the computation are the defaults. We should also note that the initial concentration of H1 and H2 is 1μM and for H0 is 100nM.

DNA Hairpin 0 (H0)

DNA Hairpin 0 (H0) releases a single strand following the collateral cleavage activity of CRISPR/Cas13a. Upon providing the sequence to NUPACK and instructing it to show the Minimum Free Energy (MFE) structure, the model returns the strand in dot-parenthesis notation. The result is:

(((((((((((((.......))))))))))))))

This notation represents the strand from the 5' to the 3' end. Each dot represents an unpaired nucleotide, while an opening parenthesis and the corresponding closing one represent one pair of nucleotides. The strand can have many different structures, but the MFE structure is the most stable, having the lowest energy. Visually, the 36 nucleotides that make up the strand are bound in the way shown below.

Hairpin H0

It should be noted that the original sequence contains uracil (U) bases instead of thymine (T) bases in the loop. However, since both NUPACK and OligoAnalyzer cannot predict the secondary structure of strands containing a mixture of DNA and RNA bases, the uracil bases in the loop were consequently replaced by thymine bases for the purposes of structural prediction.

NUPACK and OligoAnalyzer compute the following properties:

ΔG^°: -13.78 kcal/mol
Equilibrium Probability: 0.487
T_m: 85.2 °C
GC content: 60.6 %

ΔG^° is called Gibbs free energy change and is a thermodynamic measure of whether a reaction is spontaneous or requires energy input. A reaction with a negative ΔG is thermodynamically favorable. In other words, the forward reaction will dominate the reverse one. For nucleic acid hairpins, the more negative the ΔG°, the more stable the stem–loop structure is:

If ΔG° is only slightly negative, the hairpin opens easily, leading to "leakage" (premature exposure of the initiator strand and unwanted activation of the CHA circuit even in the absence of biomarkers).
If ΔG° is strongly negative, the hairpin remains folded under physiological conditions, minimizing background leakage.

For H0, we intentionally chose a hairpin with a relatively negative ΔG° (−13.78 kcal·mol⁻¹) to ensure that the structure remains closed and does not spontaneously release single-stranded initiators. This stability is essential to maintain specificity and prevent false activation. At the same time, the loop of H0 contains a stretch of uracils that makes it an efficient target for Cas13a collateral cleavage. Once Cas13a is activated by the presence of its target, cleavage within this U-rich loop destabilizes the hairpin and triggers its opening, thereby releasing the single-stranded initiator needed to start CHA.

Equilibrium Probability represents the fraction of molecules that will adopt the secondary structure computed above once the system has achieved chemical equilibrium. This equilibrium point, during a reaction, is defined by the condition where the concentrations of both the products and reactants remain constant. For the strand analyzed, approximately 48% of the molecules are predicted to possess the shown structure at this equilibrium.

T_m is called melting temperature and represents the temperature at which 50% of DNA double helices dissociate into single strands. In the above case, at 85.2 °C half the strands will have hairpin structure and half of them will be linear strands. A higher melting temperature indicates a stable hairpin with slow kinetics.

GC content refers to the percentage of nucleotides in a DNA sequence that are guanine (G) or cytosine (C) bases. A higher GC content indicates a more stable hairpin, meaning that it does not open easily, because these bases form a higher number of hydrogen bonds when in a pair.

DNA Hairpin Probe 1 (H1)

The structure in dot-parenthesis notation computed by NUPACK for DNA Hairpin Probe 1 is the following:

.........(((((((((((..........))))))))))))

This follows the classic three domains structure. Visually, it looks like the figure below.

Hairpin H1

The bottom unpaired "tail" of nucleotides known as the toehold, is a free overhang used to bind onto the initiator. After the binding happens, the stem (middle paired area) opens up and the now open strand can bind onto the invading H2 hairpin.

The properties computed by NUPACK and OligoAnalyzer for the above structure are:

ΔG^°: -11.37 kcal/mol
Equilibrium Probability: 0.738
T_m = 80.1 °C
GC content = 53.7 %

This is a hairpin structure that is adopted by about 70% of the H1 strands and is moderately stable. It does not open on its own at body temperature and is carefully designed to react in the presence of the initiator strand.

DNA Hairpin H2

For the DNA Hairpin probe 2, the dot-parenthesis notation that NUPACK computes is the following:

........(((((((((..........)))))))))..

Just like H1, H2 clearly has a toehold for strand displacement initiation, a stem region and a loop.

Hairpin H2

The properties computed by NUPACK and OligoAnalyzer are:

ΔG^°: -4.76 kcal/mol
Equilibrium Probability: 0.478
T_m = 60.2°C
GC content = 40.5 %

This hairpin has lower free energy than H1 and about 46% of the H2 strands take the above structure. However, it is stable enough and just like H1, it does not open on its own at 37°C. It is carefully designed to react in the presence of H1.

Strand Interactions

A coarse-grained way of interpreting the CHA mechanism is by describing the two main interactions between the DNA hairpins. In reality, a cascade of more reactions take place but we only focused on the two that form new complexes.

The first one is the hybridization that takes place between the initiator and the hairpin H1, followed by H1's toehold mediated strand displacement. Written in chemical reaction notation, it looks like this:

Initiator + H1 ↔ Initiator:H1

Note: strand1:strand2 represents a duplex between those two strands.

The second one is the toehold mediated strand displacement that takes place after the H2 harpin invades the above duplex, hybridizes with H1 and displaces the Initiator. It can be expressed as:

Initiator:H1 + H2 ↔ H1:H2 + Initiator

The two reactions described above constitute a simple chemical reaction network, the kinetics of which can be described by a system of ordinary differential equations (ODEs). Solving these equations can show us how the concentrations of the reactants and the products change over time. It should be noted that such a chemical reaction network is usually composed of more intermediate steps, like hairpin opening and further hybridization. A more fine-grained model could show us the reactions of all different structures of each complex, but since this analysis is just a demonstration of how the main strands interact with each other, the two reactions described above are sufficient.

Mathematical Model

We will assume that the reactions can proceed in both ways. In other words, the reactants can form products, and the products can dissociate back into reactants.

In the following analysis we will use the symbols presented below.

V₁: rate of the first reaction
V₂: rate of the second reaction
K_1f: forward rate of the first reaction
K_1r: reverse rate of the first reaction
K_2f: forward rate of the second reaction
K_2r: reverse rate of the second reaction
[I]: concentration of Initiator
[H1]: concentration of H1
[H2]: concentration of H2
[I:H1]: concentration of Initiator:H1 complex
[H1:H2]: concentration of H1:H2 complex

The two reaction rates can be defined as:

V₁ = K_1f[I][H1] - K_1r[I:H1]

V₂ = K_2f[I:H1][H2] - K_2r[H1:H2][I]

They can be seen as how fast the forward reaction happens, minus how fast the reverse reaction happens. A positive reaction rate indicates that the products are being formed faster than they dissociate. Intuitively, the reaction rate is proportional to the forward rate, since a higher forward rate causes a fast formation of products. It is also proportional to the concentrations of the reactants, since a higher concentration leads to more molecular collisions between the substrates, therefore, to a faster product formation.

Using the above rate constants, we can define our ODE system as follows.

d[I]/dt = -V₁ + V₂

d[H1]/dt = -V₁

d[H2]/dt = -V₂

d[I:H1]/dt = V₁ - V₂

d[H1:H2]/dt = V₂

To get an intuitive understanding of why the equations are defined this way, we will use an example.

Since the first equation depletes the initiator and the second one produces it, the rate of change of its concentration will be minus the rate of its depletion, plus the rate of its production. Therefore, the formula is:

d[I]/dt = -V₁ + V₂

The rest of the equations follow the same logic.

To solve this ODE system, we need the initial concentrations, which we have, and the rate constants, which we will try to approximate.

A common way of approximating the rate constants is by using computational tools like the python packages KinDA (51) and Multistrand (52). They require both the sequences and the exact structures of the formed complexes as input. The structures were calculated using NUPACK. However, NUPACK cannot account for multistep reactions like ours and the structures we got as output were the MFE ones, thermodynamically favorable but unlikely to happen, given the initial hairpin structures. Since NUPACK's structures are not representative of the actual ones, Multistrand's and KinDA's analysis could not simulate the reactions and we turned to the available literature to find good estimates on the rates.

Reaction 1: Initiator-H1 Hybridization

To calculate the forward rate of the first reaction we applied the nucleation-based hybridization model developed by Hertel et al (53). The authors experimentally measured hybridization rates for 43 carefully designed DNA sequences and developed a model based on the principle that hybridization proceeds through rate-limiting nucleation events. Nucleation is the initial formation of a few base pairs between two DNA strands that serves as a stable anchor point from which complete hybridization can rapidly proceed. Their model assumes that short stretches of complementary base pairs (typically 3 bases) must form first, followed by rapid "zippering" of the remaining sequence. Their model achieved strong correlation with experimental data (ρ = 0.73).

The model is this:

k_a = κ(L-n+1)^-2 ∑_i ∑_j 1/(1+e^{(γ+ΔG°_i,j/RT)})

The essence of the formula is that the rate of hybridization equals the average probability that each nucleation attempt succeeds. The κ rate is the intrinsic rate at which DNA strands collide and attempt to form nucleation sites. (L – n + 1) is the number of possible nucleation sites. The double sum represents the different probabilities for all nucleation attempts. So the sum of all probabilities divided by the number of nucleation attempts gives us the average probability that each nucleation succeeds. This number, scaled by κ, approximates the rate of the hybridization.

In our reaction, the areas first to hybridize are those marked below.

Initiator + H1

CACCTAGCCCGCACACTTT + AAAGTGTGCGGGGGCTAGGTGACATTAACTACACCTAGCCCC

The first thing we do is compute the length normalization factor:

(L-n+1)^-2

In the model presented above, L represents the length of H1's initially exposed region (marked area) and is equal to 9.

L = 9

n stands for the nucleation length, which is the number of bases initially required to form pairs, to start the hybridization process. The optimal n, according to the paper, is 3.

n = 3

Therefore:

(L-n+1)^-2 = (9-3+1)^-2 = 1/49

In other words, there are 7 different locations where each triplet can start the hybridization process, for 7 different triplets. Therefore, the initiation of the reaction can happen in 7×7 = 49 different ways, and the division by that number is the normalization process.

ΔG°_{I, j} represents the free energy of each 3-base nucleation site. (i and j represent the position indices for each nucleation site) Those values were computed with NUPACK, having as parameters 50mM of sodium, 5mM of magnesium and temperature equal to 37°C.

T represents the temperature in Kelvin. We are using 37°C which is equal to 310.15 K.

T = 310.15 K

Multiplying with the gas constant gives:

RT = 8.314 × 310.15 = 2,578 J/mol

Due to missing parameters in the original Hertel publication, we estimated κ by requiring our sequence to fall within their experimental rate range for similar sequences. Sequences with similar lengths and bases had a predicted rate of 3×10⁶ M^-1s^-1. Using this rate, we can approximate κ by solving the initial equation for κ.

κ ≈ k_estimated/((L-n+1)^-2 ∑_i ∑_j 1/(1+e^{(γ+ΔG°_i,j/RT)}))

Finally, γ is a fitted parameter that coincides with the value of −ΔG°_I,j/RT for which the probability of continuing the hybridization is 50%.

Since the Hertel paper didn't publish their fitted γ parameter, we estimated γ = 4.1 by setting the 50% nucleation success threshold at -2.5 kcal/mol, which represents the middle of our calculated nucleation energy range (-1.74 to -3.24 kcal/mol).

Average energy: ΔG₀ ≈ -2.5 kcal/mol

For P = 0.5:

1/(1+e^{(γ+ΔG°/RT)}) = 50%

Solve for γ:

γ = 4.1

Using the parameters approximated above, we set up a python script using NUPACK's python module. It finds all the nucleation sites, calculates the probabilities and finally computes the hybridization rate. The code is available in our GitLab repository, in a directory called "Catalytic Hairpin Assembly". The python file is called forward_rate_calculation.py and running it gives:

Calculating nucleation energies for all 49 combinations... Strand 1: AAAGTGTGC Strand 2: TTTCACACG Parameters: γ=4.1, T=37°C, n=3, L=9 ------------------------------------------------------------ (1,1): AAA:TTT | ΔG=-1.98 kcal/mol | Structure=(((+))) | P=0.292 | MFE (1,2): AAA:TTC | ΔG=-2.31 kcal/mol | Structure=.((+)). | P=0.415 | MFE ... (7,6): TGC:CAC | ΔG=-2.72 kcal/mol | Structure=.(.+..) | P=0.577 | MFE (7,7): TGC:ACG | ΔG=-3.32 kcal/mol | Structure=.(.+.). | P=0.783 | MFE ============================================================ HERTEL MODEL RESULTS ============================================================ Total probability sum: 23.585 Length normalization factor: 0.020408 Product (before κ): 0.481329 Estimated κ: 6.23e+06 M^-1 s^-1 Predicted rate constant: 3.00e+06 M^-1 s^-1 Calculation complete. 49 nucleation sites evaluated. Contributing sites: 41

The rate calculated is k = 3×10⁶ M^-1s^-1 which falls within the experimental range and is likely a sufficient estimate.

In the following subsection we will try to estimate the forward rate of the second strand displacement reaction.

Reaction 2: H2 Invasion and Initiator Displacement

To calculate the forward rate of this reaction we used the kinetic model developed by Zhang and Winfree (2009) (54). They systematically measured the kinetics of 85 different strand displacement reactions using fluorescence spectroscopy, varying toehold lengths (0-15 nucleotides) and sequence compositions. Their model decomposes strand displacement into three elementary steps: initial toehold binding, branch migration across the displacement domain, and final product release. By fitting only two key parameters (the hybridization rate constant (k_f) and branch migration rate (k_b)) to experimental data, they achieved quantitative predictions within one order of magnitude for all tested sequences, with 71 of 85 reactions predicted within a factor of two.

Technically, the rate of the 3-step reaction, under the right conditions, can be captured by a single number, which depends on the invader strand's toehold length and its GC content.

In our reaction, the areas first to hybridize are the ones marked below.

Initiator:H1 + H2

CACCTAGCCCGCACACTTT:AAAGTGTGCGGGGGCTAGGTGACATTAACTACACCTAGCCCC + GTAGGTGTAGTTAATGTCACCTAGCCCACATTAACTA

The binding area of the toehold of H2 has 7 bases and has a GC content of approximately 43%.

Figure 3 from Zhang, D. Y., & Winfree, E. (2009)

In the above figure taken from Zhang, D. Y., & Winfree, E. (2009), the middle graph (B), shows the correlation between toehold length and forward rate. The middle line (black) represents mixed sequence toeholds and shows that lengths greater than 6 bases fall inside a saturation area. In this area, further increasing the lengths, yields diminishing results.

Using the data from the above figure, we can estimate the forward rate of our strand displacement reaction to be approximately 3×10⁶M^-1s^-1.

It should be mentioned that the experiments were conducted in different conditions like lower temperature (25°C) and lower strand concentrations. It is explicitly stated that the above model is not reliable in concentrations above 33nM. There were limited reliable sources on reaction rates for our exact conditions, and therefore we used the model described above to get an inaccurate estimate on the reaction. Since it is just used as a demonstration of the strand interactions, we deem it sufficient.

Reverse Rates

The rates discussed above are the forward rates of the reactions of our system. It is possible to calculate the reverse rates, through the forward ones, via thermodynamics. Dividing the forward rate by the reverse rate, gives us the equilibrium rate.

K_eq = K_f/K_r

If we know the forward and equilibrium rates, we can calculate the reverse rate using the above formula.

The K_eq is defined as:

K_eq = e^(-ΔG°/RT)

where:

ΔG° = ΔG°_products - ΔG°_reactants

The reverse rates for our reactions are calculated using the above formulas in our python script reverse_rates_calculation.py. Running this script, gives us:

====================================================================== REVERSE RATE CALCULATION SYSTEM ====================================================================== Temperature: 37.0°C Na+ concentration: 50.0 mM Mg++ concentration: 5.0 mM H1/H2 concentration: 1.0 μM Initiator concentration: 100.0 nM Forward rate constants: 3.0e+06 M^-1 s^-1 (both) Sequences: initiator: CACCTAGCCCGCACACTTT h1: AAAGTGTGCGGGGCTAGGTGACATTAACTACACCTAGCCCC h2: GTAGGTGTAGTTAATGTCACCTAGCCCACATTAACTA Reactions: 1. initiator + h1 ⇌ initiator:h1 2. initiator:h1 + h2 ⇌ h1:h2 + initiator Starting reverse rate calculation analysis... === NUPACK Thermodynamic Analysis === Setting up NUPACK complexes... Calculating complex free energies... Individual complex free energies: ΔG°(initiator): -0.294 kcal/mol ΔG°(h1): -11.556 kcal/mol ΔG°(h2): -5.219 kcal/mol ΔG°(initiator:h1): -25.118 kcal/mol ΔG°(h1:h2): -32.804 kcal/mol === Reaction Thermodynamics === Reaction free energies (ΔG°_products - ΔG°_reactants): Reaction 1: initiator + h1 → initiator:h1 ΔG°_rxn1 = -25.118 - (-0.294 + -11.556) ΔG°_rxn1 = -13.268 kcal/mol Reaction 2: initiator:h1 + h2 → h1:h2 + initiator ΔG°_rxn2 = (-32.804 + -0.294) - (-25.118 + -5.219) ΔG°_rxn2 = -2.761 kcal/mol Equilibrium constants at 37.0°C: K_eq1 = exp(-ΔG°_rxn1/RT) = exp(--13.268/0.616) = 2.24e+09 K_eq2 = exp(-ΔG°_rxn2/RT) = exp(--2.761/0.616) = 8.82e+01 === Reverse Rate Calculation === Rate constant calculations: Reaction 1: k1_forward = 3.00e+06 M^-1 s^-1 (specified) k1_reverse = k1_forward / K_eq1 = 3.00e+06 / 2.24e+09 k1_reverse = 1.34e-03 s^-1 Reaction 2: k2_forward = 3.00e+06 M^-1 s^-1 (specified) k2_reverse = k2_forward / K_eq2 = 3.00e+06 / 8.82e+01 k2_reverse = 3.40e+04 M^-1 s^-1 Rate constant verification: K_eq1 = k1_f/k1_r = 2.24e+09 ✓ K_eq2 = k2_f/k2_r = 8.82e+01 ✓ === Reaction Analysis === Thermodynamic analysis: Reaction 1: ΔG° = -13.27 kcal/mol → Highly favorable Reaction 2: ΔG° = -2.76 kcal/mol → Highly favorable ====================================================================== FINAL RESULTS SUMMARY ====================================================================== Reaction free energies: ΔG°_rxn1 = -13.27 kcal/mol ΔG°_rxn2 = -2.76 kcal/mol Equilibrium constants: K_eq1 = 2.24e+09 K_eq2 = 8.82e+01 Rate constants: k1_forward = 3.00e+06 M^-1 s^-1 k1_reverse = 1.34e-03 s^-1 k2_forward = 3.00e+06 M^-1 s^-1 k2_reverse = 3.40e+04 M^-1 s^-1

Now that we have the reaction rates, we can move on to solve our ODE system.

Solving the ODE system

The final step in our analysis is to plot the concentrations of the different strands and complexes over time, so we can get an understanding of how the CHA mechanism is supposed to work. In order to do that, we set up a python script called strand_dispalcement_ode_model.py which solves the system of equations described in the "Mathematical Model" subsection, by hardcoding the initial concentrations and the rates calculated above. Running the script produces the following figure:

Concentrations Over Time

This is a clear demonstration of the function of a CHA mechanism. The initiator's concentration drops in the beginning as it hybridizes with H1, and then goes back up, since it is being released by the second strand displacement step. The concentrations of both H1 and H2 drop too, since they form the H1-H2 complex. The complex just mentioned has an increasing concentration over time, since it is the main product of the reaction. Finally, the Initiator-H1 complex is initially being formed but after some time dissociates during the strand displacement step.

This is a key point of CHA systems. The last step releases the initiator so it can continue to hybridize with other hairpins, triggering the reaction repeatedly, causing signal amplification. This is precisely what we are trying to achieve.

We should note that this is an ideal CHA system and not a realistic one, as a series of assumptions have been made. For example, we assumed that the forward rate constants for both reactions are 3×10⁶ M^-1s^-1, which was not measured experimentally. Another assumption is that all strands and complexes always take the desired secondary structure. The above analysis, therefore, serves as an ideal demonstration of the way the CHA system is supposed to work.

References

He, R. R., Yue, G. L., Dong, M. L., Wang, J. Q., & Cheng, C. (2024). Sepsis biomarkers: Advancements and clinical applications—A narrative review. International Journal of Molecular Sciences, 25(16), 9010. https://doi.org/10.3390/ijms25169010
Lazaros, K., Adam, S., Krokidis, M. G., Exarchos, T., Vlamos, P., & Vrahatis, A. G. (2025). Non-invasive biomarkers in the era of big data and machine learning. Sensors, 25(5), 1396. https://doi.org/10.3390/s25051396
Duncan, C. F., Youngstein, T., Kirrane, M. D., & Lonsdale, D. O. (2021). Diagnostic challenges in sepsis. Current Infectious Disease Reports, 23(12), 22. https://doi.org/10.1007/s11908-021-00765-y
Benz, F., Roy, S., Trautwein, C., Roderburg, C., & Luedde, T. (2016). Circulating microRNAs as biomarkers for sepsis. International Journal of Molecular Sciences, 17(1), 78. https://doi.org/10.3390/ijms17010078
Dumache, R., Rogobete, A. F., Bedreag, O. H., Sarandan, M., Cradigati, A. C., Papurica, M., Dumbuleu, C. M., Nartita, R., & Sandesc, D. (2015). Use of miRNAs as biomarkers in sepsis. Analytical Cellular Pathology, 2015, 186716. https://doi.org/10.1155/2015/186716
Bindayna, K. (2024). MicroRNA as sepsis biomarkers: A comprehensive review. International Journal of Molecular Sciences, 25(12), 6476. https://doi.org/10.3390/ijms25126476
Li, Y., Qiu, C., Tu, J., Geng, B., Yang, J., Jiang, T., & Cui, Q. (2014). HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucleic Acids Research, 42(Database issue), D1070–D1074. https://doi.org/10.1093/nar/gkt1023
Kraev, E., Koseoglu, B., Traverso, L., & Topiwalla, M. (2024). shap-select: Lightweight feature selection using SHAP values and regression. GitHub. https://github.com/transferwise/shap-select
Li, B., Bai, L., Shen, P., Sun, Y., Chen, Z., & Wen, Y. (2017). Identification of differentially expressed microRNAs in knee anterior cruciate ligament tissues surgically removed from patients with osteoarthritis. International Journal of Molecular Medicine, 40(4), 1105–1113. https://doi.org/10.3892/ijmm.2017.3086
Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., Deng, D., & Lindauer, M. (n.d.). Hyperparameter optimization: Foundations, algorithms, best practices and open challenges [Unpublished manuscript].
Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017). Long Beach, CA, USA.
Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA package 2.0. Algorithms for Molecular Biology, 6, 26. https://doi.org/10.1186/1748-7188-6-26
Alberts, B., Johnson, A., Lewis, J., Morgan, D., Raff, M., Roberts, K., & Walter, P. (2014). Molecular biology of the cell (6th ed.). Garland Science. https://www.ncbi.nlm.nih.gov/books/NBK26887/
Bernhart, S. H., Hofacker, I. L., Will, S., Gruber, A. R., & Stadler, P. F. (2006). Partition function and base pairing probabilities of RNA heterodimers. Algorithms for Molecular Biology, 1, 3. https://doi.org/10.1186/1748-7188-1-3
Abudayyeh, O. O., Gootenberg, J. S., Essletzbichler, P., Han, S., Joung, J., Belanto, J. J., Verdine, V., Tzeng, I. A., Li, J., Donghia, N., Carter, A., Paten, B., & Zhang, F. (2017). RNA targeting with CRISPR–Cas13a. Science, 358(6366), 336–339. https://doi.org/10.1126/science.aaq0180
East-Seletsky, A., O'Connell, M. R., Knight, S. C., Burstein, D., Cate, J. H. D., Tjian, R., & Doudna, J. A. (2016). Two distinct RNase activities of CRISPR-C2c2 (Cas13a) enable guide-RNA processing and RNA detection. Nature, 538(7624), 270–273. https://doi.org/10.1038/nature19802
The UniProt Consortium. (2023). UniProt: The Universal Protein Resource in 2023. Nucleic Acids Research, 51(D1), D523–D531. https://doi.org/10.1093/nar/gkac1052
AlphaFold DB. (2021). Interpreting confidence: pLDDT and thresholds [FAQ]. https://alphafold.ebi.ac.uk/
EMBL-EBI. (2021). pLDDT: Understanding local confidence [Training resource]. https://www.ebi.ac.uk/training
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
Wang, J., Mao, K., Zhao, Y., Zeng, C., Xiang, J., Zhang, Y., & Xiao, Y. (2017). Optimization of RNA 3D structure prediction using evolutionary restraints of nucleotide–nucleotide interactions from direct coupling analysis. Nucleic Acids Research, 45(11), 6299–6309. https://doi.org/10.1093/nar/gkx386
Yan, Y., Zhang, D., & Zou, X. (2017). HDOCK: A web server for protein–protein and protein–DNA/RNA docking based on a hybrid strategy. Nucleic Acids Research, 45(W1), W365–W373. https://doi.org/10.1093/nar/gkx367
Li, H., Yan, Y., Zhang, D., & Zou, X. (2022). HDOCK update for modeling protein–RNA/DNA complex structures. Protein Science, 31(1), 150–162. https://doi.org/10.1002/pro.4212
Zhang, C., Konermann, S., Brideau, N. J., Lotfy, P., Wu, X., Novick, S. J., Strutzenberg, T., Griffin, P. R., Hsu, P. D., & Lyumkis, D. (2018). Structural basis for the RNA-guided ribonuclease activity of CRISPR–Cas13d. Cell, 175(1), 212–223.e17. https://doi.org/10.1016/j.cell.2018.09.001
Yang, Y., Yi, W., Gong, F., Tan, Z., Yang, Y., Shan, X., Xie, C., Ji, X., Zheng, Z., & He, Z. (2023). CRISPR/Cas13a trans-cleavage-triggered catalytic hairpin assembly assay for specific and ultrasensitive SARS-CoV-2 RNA detection. Analytical Chemistry, 95(2), 1343–1349. https://doi.org/10.1021/acs.analchem.2c04306
Lysne, D. P., Stewart, M. H., Susumu, K., Leski, T. A., Stenger, D. A., Medintz, I. L., Díaz, S. A., & Green, C. M. (2025). Quantum dot molecular beacons achieve sub-10 pM CRISPR-Cas detection in field-ready assays. Scientific Reports, 15, 27950. https://doi.org/10.1038/s41598-025-09434-9
Miao, J., Zuo, L., He, D., Fang, Z., Berthet, N., Yu, C., & Wong, G. (2023). Rapid detection of Nipah virus using the one-pot RPA-CRISPR/Cas13a assay. Virus Research, 332, 199130. https://doi.org/10.1016/j.virusres.2023.199130
Gu, X., Zhang, J., Liang, J., Liu, X., He, X., Jin, X., Yan, C., Wang, L., & Song, C. (2024). CRISPR/Cas13a trans-cleavage and catalytic hairpin assembly cascaded signal amplification powered SERS aptasensor for ultrasensitive detection of gastric cancer-derived exosomes. Analytical Chemistry, 96(47), 18681–18689. https://doi.org/10.1021/acs.analchem.4c03063
Sheng, Y., Zhang, T., Zhang, S., Johnston, M., Zheng, X., Shan, Y., Liu, T., Huang, Z., Qian, F., Xie, Z., Ai, Y., Zhong, H., Kuang, T., Dincer, C., Urban, G. A., & Hu, J. (2021). A CRISPR/Cas13a-powered catalytic electrochemical biosensor for successive and highly sensitive RNA diagnostics. Biosensors and Bioelectronics, 178, 113027. https://doi.org/10.1016/j.bios.2021.113027
Song, D., Wang, Q., Liu, Y., Chen, Y., Zhang, L., & Liu, J. (2024). A CRISPR/Cas13a-powered catalytic hairpin assembly evanescent wave fluorescence biosensor for target amplification-free SARS-CoV-2 detection. Sensors and Actuators B: Chemical, 405, 135296. https://doi.org/10.1016/j.snb.2024.135296
Molina Vargas, A. M. M., Li, X., Chen, H., Zhao, C., Sun, J., & Yang, H. (2023). New design strategies for ultra-specific CRISPR-Cas13a-based RNA detection with single-nucleotide mismatch sensitivity. Nucleic Acids Research, 52(2), gkad1132. https://doi.org/10.1093/nar/gkad1132
Nalefski, E. A., Patel, R. V., Truong, L., Krishnan, A., Kellner, M. J., Koob, J., … Tjian, R. (2021). Kinetic analysis of Cas12a and Cas13a RNA-guided nucleases for development of improved CRISPR-based diagnostics. iScience, 24, 102996. https://doi.org/10.1016/j.isci.2021.102996
Ramachandran, A., & Santiago, J. G. (2021). Enzyme kinetics of CRISPR molecular diagnostics. bioRxiv. https://doi.org/10.1101/2021.02.03.429513
Schreiber, G., Haran, G., & Zhou, H.-X. (2009). Fundamental aspects of protein–protein association kinetics. Chemical Reviews, 109(3), 839–860. https://doi.org/10.1021/cr800373w
Shinoda, H., Taguchi, Y., Nakagawa, R., Makino, A., Okazaki, S., Nakano, M., & Watanabe, R. (2021). Amplification-free RNA detection with CRISPR–Cas13. Communications Biology, 4, 476. https://doi.org/10.1038/s42003-021-02001-8
Santiago, J. G. (2022). Inconsistent treatments of the kinetics of CRISPR impair assessment of its diagnostic potential. QRB Discovery, 3, e7. https://doi.org/10.1017/qrd.2022.7
Zadeh, J. N., Steenberg, C. D., Bois, J. S., Wolfe, B. R., Pierce, M. B., Khan, A. R., Dirks, R. M., & Pierce, N. A. (2011). NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry, 32(1), 170–173. https://doi.org/10.1002/jcc.21596
Integrated DNA Technologies. (2024). OligoAnalyzer tool [Web application]. https://www.idtdna.com/calc/analyzer
Schaeffer, J. M., Thachuk, C., & Winfree, E. (2015). Stochastic simulation of the kinetics of multiple interacting nucleic acid strands. In A. Phillips & P. Yin (Eds.), DNA Computing and Molecular Programming (pp. 194–211). Springer. https://doi.org/10.1007/978-3-319-21999-8_13
Badelt, S., Grun, C., Sarma, K. V., Wolfe, B. R., Shin, S. W., & Winfree, E. (2017). A domain-level DNA strand displacement reaction enumerator allowing arbitrary non-pseudoknotted secondary structures. Journal of the Royal Society Interface, 14(136), 20170385. https://doi.org/10.1098/rsif.2017.0385
Hertel, S., Spinney, R. E., Xu, S. Y., Ouldridge, T. E., Morris, R. G., & Lee, L. K. (2022). The stability and number of nucleating interactions determine DNA hybridization rates in the absence of secondary structure. Nucleic Acids Research, 50(14), 7829–7841. https://doi.org/10.1093/nar/gkac590
Zhang, D. Y., & Winfree, E. (2009). Control of DNA strand displacement kinetics using toehold exchange. Journal of the American Chemical Society, 131(47), 17303–17314. https://doi.org/10.1021/ja906987s
Desta, I. T., Porter, K. A., Xia, B., Kozakov, D., & Vajda, S. (2020). Performance and its limits in rigid-body protein docking. Current Opinion in Structural Biology, 64, 9–16. https://doi.org/10.1016/j.sbi.2020.06.008
van Zundert, G. C. P., Rodrigues, J. P. G. L. M., Trellet, M., Schmitz, C., Kastritis, P. L., Karaca, E., … Bonvin, A. M. J. J. (2016). The HADDOCK2.2 web server: User-friendly integrative modeling of biomolecular complexes. Journal of Molecular Biology, 428(4), 720–725. https://doi.org/10.1016/j.jmb.2015.09.014
Bonvin Lab. (2024). HADDOCK 2.4 manuals: Protocol stages, scoring function, and analysis pipeline [Documentation]. https://www.bonvinlab.org
BioExcel/HADDOCK Forum. (2024). Z-score interpretation in docking [Community resource]. https://www.bioexcel.eu
Liu, L., Li, X., Ma, J., Zhang, X., Wang, M., Wang, Y., & Zhang, Y. (2017). The molecular architecture for RNA-guided RNA cleavage by Cas13a. Molecular Cell, 65(2), 310–322.e3. https://doi.org/10.1016/j.molcel.2016.12.019
Zhang, C., Konermann, S., Brideau, N. J., Lotfy, P., Wu, X., Novick, S. J., Strutzenberg, T., Griffin, P. R., Hsu, P. D., & Lyumkis, D. (2018). Structural basis for the RNA-guided ribonuclease activity of CRISPR–Cas13d. Cell, 175(1), 212–223.e17. https://doi.org/10.1016/j.cell.2018.09.001
(Original item 35 in your list) Zhang, C.; Konermann, S.; Brideau, N. J.; Lotfy, P.; Wu, X.; Novick, S. J.; Strutzenberg, T.; Griffin, P. R.; Hsu, P. D.; Lyumkis, D. (2018). Structural basis for the RNA-guided ribonuclease activity of CRISPR-Cas13D. Cell. https://pubmed.ncbi.nlm.nih.gov/30241607/ (Accessed 21 Sept. 2025).
Wikipedia contributors. (n.d.). Docking (molecular). Wikipedia, The Free Encyclopedia. Retrieved [access date], from https://en.wikipedia.org/wiki/Docking_(molecular)#Scoring_function
Li, H., et al. (2022). HDOCK update for modeling protein–RNA/DNA complex structures. Protein Science. https://doi.org/10.1002/pro.4212
Advances in application of CRISPR-Cas13a system. (n.d.). [Article preprint or review – author details needed]. (If you want, I can locate full citation details for this item.)
Yang, Y., Yi, W., Gong, F., Tan, Z., Yang, Y., Shan, X., Xie, C., Ji, X., Zheng, Z., & He, Z. (2023). CRISPR/Cas13a trans-cleavage-triggered catalytic hairpin assembly assay for specific and ultrasensitive SARS-CoV-2 RNA detection. Analytical Chemistry, 95(2), 1343–1349. https://doi.org/10.1021/acs.analchem.2c04306
Zadeh, J. N., Steenberg, C. D., Bois, J. S., Wolfe, B. R., Pierce, M. B., Khan, A. R., Dirks, R. M., & Pierce, N. A. (2011). NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry, 32(1), 170–173. https://doi.org/10.1002/jcc.21596