Our project was born from a dual motivation: a pressing global health challenge and a complex molecular engineering puzzle. Sexually Transmitted Infections (STIs) are a growing concern worldwide, and the most powerful tool in preventing their spread is frequent, accessible, and reliable testing. However, the journey to create such a test is fraught with difficulty. Engineering a robust, low-cost diagnostic requires orchestrating a complex network of biochemical reactions, where unforeseen interactions can lead to system failure. This reality, that a purely experimental approach would be slow, costly, and blind to many potential pitfalls, motivated us to adopt a model-driven engineering philosophy from the project's inception. We aimed to use computation not just to analyze our results, but to guide our every step.
Our entire modeling effort was designed to answer a set of fundamental questions:
To answer these questions, we devised a comprehensive, multi-scale strategy, approaching the problem from two directions simultaneously. Our "bottom-up" strategy focused on creating a detailed Reaction Model based on the mechanistic principles of biochemistry. By building this model in modular units, we could understand the molecular machinery of our test from first principles. Our "top-down" strategy involved developing an Epidemiological Model to understand the human-scale environment in which our test would operate. This dual strategy ensured our work was always connected, linking the performance of molecules in a test tube to the health of individuals in a community.
From Prediction to Validation and Design
The story of our modeling is a journey from uncertainty to a project-saving design. We began by tackling the "black box" problem of proprietary RPA kits. Using our model to perform a sensitivity analysis, we discovered that the system's performance was too sensitive to the concentration of SSB, providing our first critical insight.
The most crucial step was experimental validation. Our lab team's benchmark experiment yielded a result of 405 nM of amplified DNA, a value that fell squarely within the plausible range our model had predicted. This successful validation was the pivotal moment, giving us the confidence to trust our model's predictions.
That trust was essential when the integrated model made its most important finding: a "cis-cleavage deadlock", a real failure mode where the CRISPR enzyme would destroy the DNA faster than it could be amplified. The model didn't just find the problem - it gave us the solution. A temporal separation of the reactions would resolve the issue, leading us directly to propose a tangible "two-chamber" hardware design. This journey - from navigating uncertainty to experimental validation and a project-saving design -is the core of our model's success story.
Our modeling framework successfully answered the core questions we set out to investigate.
The models we have built are dynamic tools for future innovation. We plan to use them to create a quantitative diagnostic tool, allowing a smartphone to estimate viral or bacterial load from the signal speed and intensity. Ultimately, we see our work as a step toward a global pathogen surveillance network, where data from low-cost tests like ours feeds into epidemiological models to predict and prevent future outbreaks.
Our kinetic models were developed using a system of ordinary differential equations (ODEs) solved with the robust LSODA solver, implemented in Python with the SciPy and Matplotlib libraries. Our epidemiological model was developed as an agent-based model. Full details, equations, and parameter justifications can be found on the respective model pages.
All models, the complete codebase, and simulation data are provided as open-source tools. We particularly offer our Enzyme Stability and Asymmetrical RPA frameworks as well-documented, foundational starting points for any future teams tackling the challenges of diagnostic shelf-life or alternative detection methods. You can access everything in our Git repository.
The core of our project is a complex biochemical system involving multiple, interacting enzymatic reactions. To navigate this complexity and engineer a robust diagnostic, we developed a detailed mechanistic model of our entire reaction, from DNA amplification to signal generation. This "digital twin," built upon a system of Ordinary Differential Equations (ODEs), served three primary goals:
Our model was built in a modular fashion, starting with individual units for Recombinase Polymerase Amplification (RPA) and CRISPR-Cas12a, which were later integrated into a single, comprehensive system. For more information about these methods, please visit our Project Description Page.
To simulate our complex diagnostic test, we built our model from a set of distinct but interconnected modules. Each module represents a core biological process, which allowed us to develop and test them individually before combining them into a fully integrated system.
Our model describes the generation of symmetrical RPA products (amplicons) using a system of ODEs derived from eight main chemical reactions that take place during the RPA process. We assumed identical reaction rates for the forward and reverse primers. Initially, we developed a mathematical model to simulate the RPA process on its own.
The RPA process is based on the following reactions:
From these reactions, we derived a set of ODEs that describe the temporal changes in the concentrations of all involved species:
The RPA parameters were taken from experimental studies [1] at a reaction temperature of 39 °C. The referenced study reports experiments performed under continuous stirring conditions. Since our experimental setup does not involve stirring, we needed to adjust the parameters for reactions 2a, 2b, and 3, as these are diffusion-dependent. In the absence of stirring, a reduced amplification rate is expected.
To account for this difference, we applied the correction factor (Sf) described in [1], leading to the following adjusted parameters:
Because our experiments were carried out without stirring (RPM = 0), the respective reaction rate constants were scaled accordingly.
| Parameter | Definition | Value | Unit | Reference |
|---|---|---|---|---|
| \( B \) | Number of base pairs in template | 397 | -- | Experimental design |
| \( n \) | Number of R-sites on primer | 8 | -- | Model assumption* |
| \( m \) | Number of GP32 binding sites on primer | 4 | -- | Model assumption* |
| \( n_{\mathrm{p}} \) | Number of base pairs in primer | 32 | -- | Experimental design |
| \( k_{1\mathrm{f}} \) | Forward rate constant for Eq. (1) | 5×10⁸ | \( \mathrm{M^{-1}} \) | (1) |
| \( k_{1\mathrm{r}} \) | Reverse rate constant for Eq. (1) | 5×10³ | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{\mathrm{eq2af}} \) | Forward rate constant for Eq. (2a) | 3.47×10⁷ | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
| \( k_{\mathrm{eq2ar}} \) | Reverse rate constant for Eq. (2a) | 1.471 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{2\mathrm{b}} \) | Rate constant for Eq. (2b) | 4.7×10⁻² | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{\mathrm{eq2cf}} \) | Forward rate constant for Eq. (2c) | 3.47×10⁷ | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
| \( k_{\mathrm{eq2cr}} \) | Reverse rate constant for Eq. (2c) | 331 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{2\mathrm{d}} \) | Rate constant for Eq. (2d) | 4.6×10⁻³ | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{3\mathrm{f}} \) | Forward rate constant for Eq. (3) | 3.47×10⁷ | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
| \( k_{3\mathrm{r}} \) | Reverse rate constant for Eq. (3) | 59.37 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{4\mathrm{a}}^{\mathrm{cat}} \) | Forward rate constant for Eq. (4a) | 4.22 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{4\mathrm{b}}^{\mathrm{cat}} \) | Forward rate constant for Eq. (4b) | 8.32 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{5\mathrm{f}} \) | Forward rate constant for Eq. (5) | 1.2×10⁷ | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
| \( k_{5\mathrm{r}} \) | Reverse rate constant for Eq. (5) | 6.0×10⁻² | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{6\mathrm{f}} \) | Forward rate constant for Eq. (6) | 87 | \( \mathrm{s^{-1}} \) | (1) |
| \( k_{7} \) | Forward rate constant for Eq. (7) | 14.02 | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
| \( k_{8} \) | Forward rate constant for Eq. (8) | 1.13 | \( \mathrm{M^{-1}s^{-1}} \) | (1) |
* The parameters \( n \) and \( m \) represent the number of R-sites and the number of GP32 binding sites per primer, respectively. Since no direct experimental measurements are available, \( n = 8 \) and \( m = 4 \) were chosen as model assumptions to reflect a typical primer architecture and to simplify the kinetic scheme.
These are the variables that were used in our RPA model:
| Variable | Description |
|---|---|
| \( F \) | Forward primer |
| \( G \) | Single stranded DNA binding protein (Gp32) |
| \( R \) | Recombinase complex |
| \( FG \) | Forward primer/Gp32 complex |
| \( FGR_p \) | Unstable Forward primer/Gp32/recombinase complex |
| \( FGR \) | Stable Forward primer/Gp32/recombinase complex |
| \( FGnR_p \) | Forward primer/Gp32 with unstable \( n \) recombinase |
| \( FnR \) | Forward primer complexed with unstable filament of \( n \) recombinase molecules |
| \( FnRD \) | Forward primer complexed with \( n \) recombinase molecules with DNA |
| \( D \) | Template DNA (the species we want to track for amplification) |
| \( FD \) | Forward primer/DNA complex |
| \( P \) | Polymerase |
| \( PFD \) | Polymerase/forward primer/DNA complex |
| \( dNTPs \) | Deoxynucleotides |
| \( PP_i \) | Pyrophosphate (byproduct of polymerization) |
Sensitivity analysis for unknown parameters
To generate data that reflect our experimental laboratory setup, it was important for us to use the same concentrations in the model as in the actual experiments. A key challenge in using commercial diagnostic kits is that component concentrations are often proprietary. To overcome this uncertainty, we strategically used our model to perform a comprehensive sensitivity analysis. By simulating a wide range of concentrations for unknown components like the Single-Strand Binding Protein (SSB), polymerase, and recombinase, we could identify which factors had the most significant impact on performance. This allowed us to de-risk our design by understanding its potential failure points.
The primer concentration was predetermined and set to 10 µM, matching the conditions used in our laboratory experiments. In contrast, the concentrations of the single-stranded DNA-binding protein (SSB; represented by variable G in the model), the polymerase (P), and the recombinase (R) were not specified by the manufacturer.
The following graphs illustrate how variations in these individual concentrations affect both the final DNA concentration produced within 30 minutes and the overall reaction rate.
The numerical impact of these variations can be summarized as follows:
| Species | Min DNA (nM) | Max DNA (nM) |
|---|---|---|
| \( G \) | 44.963 | 2364.463 |
| \( P \) | 436.662 | 449.655 |
| \( R \) | 447.942 | 447.942 |
This shows that the model is most sensitive to changes in the concentration of SSB, while the other two unknown concentrations play a comparatively minor role. A similar effect has also been reported in experimental studies [4].
To gain a deeper understanding of our system, we additionally varied the concentrations of other components such as dNTPs, DNA, and the primers.
The numerical impact of these additional variations can be summarized as follows:
| Species | Min DNA (nM) | Max DNA (nM) |
|---|---|---|
| \( F \) | 447.941 | 447.943 |
| \( D \) | 447.941 | 447.943 |
| \( \mathrm{dNTP} \) | 447.942 | 447.943 |
It can be observed that, within the tested range of DNA, dNTP and primer concentration, no significant change in the final product occurs. This underlines the robustness and efficiency of the RPA process. A more detailed analysis of varying DNA concentrations will be presented in the Limit of Detections (LOD) analysis.
In this part, we show the concentration profiles of various components over the course of the RPA reaction. To avoid presenting an excessive number of graphs, we selected one representative set of initial concentrations for the following simulations. This ensures better comparability between the graphs. The same set of initial concentrations is used throughout the remainder of this section.
The initial concentrations used at the start of the simulation were as follows:
| Species | Initial concentration [M] |
|---|---|
| \( F \) | 1.00 × 10⁻⁵ |
| \( G \) | 4.50 × 10⁻⁷ |
| \( R \) | 5.90 × 10⁻⁶ |
| \( FG \) | 0 |
| \( FGR_p \) | 0 |
| \( FGR \) | 0 |
| \( FGnR_p \) | 0 |
| \( FnR \) | 0 |
| \( FnRD \) | 0 |
| \( D \) | 3.80 × 10⁻¹³ |
| \( FD \) | 0 |
| \( P \) | 1.50 × 10⁻⁸ |
| \( PFD \) | 0 |
| \( \mathrm{dNTPs} \) | 2.40 × 10⁻⁶ |
One of our detection methods is the CRISPR-Cas12 approach using the LbCas12a enzyme. The model realistically simulates both the generation of DNA copies and the resulting signal. The colorimetric signal is produced when a fluorescently labeled reporter is cleaved. This cleavage process is triggered once a critical amount of target DNA is present.
Using the model, we can calculate the temporal dynamics of reporter cleavage and then convert these into fluorescence values using an experimentally determined calibration factor.
Similar to our RPA module, we used a system of ODEs derived from the main chemical reactions that take place during the CRISPR/Cas12a process.
The Cas12a reactions:
The Cas12a ODEs:
$$ [\mathrm{Reporter}]_{\mathrm{free}} = [\mathrm{Reporter}]_{\mathrm{total}} - [\mathrm{clDR}],\quad [\mathrm{Cas}]_{\mathrm{inactive}} = [\mathrm{Cas}]_{\mathrm{total}} - [\mathrm{Cas}]_{\mathrm{active}} $$
An overview of the variables used in the Cas12a model is given below:
| Variable | Description |
|---|---|
| \([\mathrm{Cas}]_{\mathrm{total}}\) | Total concentration of Cas12a protein (active + inactive) |
| \([\mathrm{Cas}]_{\mathrm{inactive}}\) | Inactive Cas12a pool before binding to target DNA |
| \([\mathrm{Cas}]_{\mathrm{active}}\) | Activated Cas12a after cis binding and cleavage |
| \(D\) | Target DNA recognized by Cas12a (activator) |
| \(D^{\ast}\) | Cleaved target DNA after cis activation |
| \(C_{\mathrm{cis}}\) | Intermediate Cas12a-DNA complex in cis activation |
| \(C_{\mathrm{trans}}\) | Cas12a-reporter encounter complex in trans cleavage |
| \([\mathrm{Reporter}]_{\mathrm{total}}\) | Total concentration of reporter molecules |
| \([\mathrm{Reporter}]_{\mathrm{free}}\) | Unbound (uncleaved) reporter concentration |
| \([\mathrm{clDR}]\) | Concentration of cleaved reporter molecules (fluorescent signal) |
Furthermore, the following table helps to identify the Cas12a parameters used in our equations and to clarify their meaning.
| Parameter | Definition | Value | Unit | Reference |
|---|---|---|---|---|
| \(k_{1}^{\mathrm{cis}}\) | Association rate (inactive Cas12a \(+\; D \rightarrow C_{\mathrm{cis}}\)) | -- | \(\mathrm{M^{-1}\,s^{-1}}\) | Model assumption\(^*\) |
| \(k_{-1}^{\mathrm{cis}}\) | Dissociation rate (\(C_{\mathrm{cis}} \rightarrow\) inactive Cas12a \(+\; D\)) | -- | \(\mathrm{s^{-1}}\) | Model assumption\(^*\) |
| \(k_{\mathrm{cat}}^{\mathrm{cis}}\) | Turnover in cis (activation on target DNA) | 0.054 | \(\mathrm{s^{-1}}\) | (5) |
| \(K_{m}^{\mathrm{cis}}\) | Michaelis constant for cis activation (w.r.t. \(D\)) | 0.05 | \(\mathrm{nM}\) | (6) |
| \(k_{1}^{\mathrm{trans}}\) | Association rate (\([\mathrm{Cas}]_{\mathrm{active}} + [\mathrm{Reporter}]_{\mathrm{free}} \rightarrow C_{\mathrm{trans}}\)) | -- | \(\mathrm{M^{-1}\,s^{-1}}\) | (6)\(^*\) |
| \(k_{-1}^{\mathrm{trans}}\) | Dissociation rate (\(C_{\mathrm{trans}} \rightarrow [\mathrm{Cas}]_{\mathrm{active}} + [\mathrm{Reporter}]_{\mathrm{free}}\)) | -- | \(\mathrm{s^{-1}}\) | (6)\(^*\) |
| \(k_{\mathrm{cat}}^{\mathrm{trans}}\) | Turnover in trans (reporter cleavage) | 3.85 | \(\mathrm{s^{-1}}\) | (3) |
| \(K_{m}^{\mathrm{trans}}\) | Michaelis constant for trans cleavage (w.r.t. reporter) | 27.29 | \(\mathrm{nM}\) | (3) |
* Following study (6), Cas-reporter binding occurs so rapidly that it can be considered quasi-instantaneous. We adopt the same simplification for the cis association/dissociation steps; therefore explicit numerical values for \(k_{1}^{\mathrm{cis}}\), \(k_{-1}^{\mathrm{cis}}\), \(k_{1}^{\mathrm{trans}}\), and \(k_{-1}^{\mathrm{trans}}\) are not specified.
And here is an overview of the initial concentrations used in the Cas12a simulation:
| Species | Initial concentration [M] |
|---|---|
| \([\mathrm{Cas}]_{\mathrm{total}}\) | \(1.0 \times 10^{-8}\) |
| \([\mathrm{Cas}]_{\mathrm{inactive}}\) | \(1.0 \times 10^{-8}\) |
| \([\mathrm{Cas}]_{\mathrm{active}}\) | \(0\) |
| \([\mathrm{Reporter}]_{\mathrm{total}}\) | \(1.0 \times 10^{-7}\) |
| \([\mathrm{Reporter}]_{\mathrm{free}}\) | \(1.0 \times 10^{-7}\) |
| \([\mathrm{clDR}]\) | \(0\) |
| \(D\) | \(3.8 \times 10^{-13}\) |
| \(D^{\ast}\) | \(0\) |
| \(C_{\mathrm{cis}}\) | \(0\) |
| \(C_{\mathrm{trans}}\) | \(0\) |
To accurately parameterize our model, we first needed to determine the Michaelis-Menten kinetics of \(k_{\mathrm{cat}}^{\mathrm{trans}}\) and \(K_{m}^{\mathrm{trans}}\) of our specific LbCas12a enzyme.
To do that, our Wet Lab first conducted an experiment to determine the Michaelis-Menten kinetics of our LbCas12a enzyme. Upon critical review of the initial data, we determined it was not suitable for a reliable kinetic fit. The experiment had yielded measurements at only two substrate concentrations, which is insufficient to accurately estimate Michaelis-Menten parameters \(K_{m}^{\mathrm{trans}}\) and \(k_{\mathrm{cat}}^{\mathrm{trans}}\).
Holding our model to a high standard of scientific accuracy, we made the decision not to use this preliminary data.To maintain project velocity and avoid experimental bottlenecks, our team immediately pivoted to a robust literature-based parameterization strategy. We performed a thorough search and sourced kinetic data from a study investigating the same enzyme under comparable buffer conditions [3]. This adaptive approach ensured our model was built on a reliable and scientifically valid foundation, allowing our project to proceed without compromising its accuracy.
We used the dsDNA data from this study, since our target is also double-stranded DNA. Based on this, we parameterized our model using \(k_{\mathrm{cat}}^{\mathrm{trans}} \approx 3.85\ \mathrm{s}^{-1}\) and an appropriate Km value from the literature.
Our simulation calculates the concentration of cleaved reporters over time based on the turnover number (\(k_{\mathrm{cat}}^{\mathrm{trans}}\)) of Cas12a. However, to relate these values to the actual fluorescence signal measured in experiments, a linear calibration factor is required. To determine this factor, the Wet Lab conducted the following experiment:
Experiment:
Subsequently, a linear regression was applied, resulting in a line whose slope describes the relationship between reporter concentration and fluorescence intensity.
This calibration factor enables us to convert simulated reporter concentrations directly into fluorescence values that can be compared with experimental data.
In order to distinguish true signal from background noise, we needed to determine the background fluorescence generated by the complete system in the absence of target DNA, i.e., under conditions where no actual cleavage and signal generation occur. To do this, our Wet Lab performed the following experiment:
Experiment:
Afterwards, we calculated the mean of these measurements and defined it as the background fluorescence.
The effective fluorescence signal was then calculated as: Signal = Total fluorescence − Background fluorescence
When we integrated our parameterized RPA and Cas12a modules, our model made a critical prediction: the combined system would fail to produce a signal.
LbCas12a not only cleaves the reporter in trans, but also cuts the target dsDNA in cis, i.e., at the target site itself. This cis-cleavage reaction occurs before the trans-cleavage is initiated and has two important consequences:
To incorporate this into our model, we parameterized the cis-cleavage kinetics separately. A study [5] reported \(k_{\mathrm{cat}}^{\mathrm{cis}}\) values for LbCas12a ranging from 0.0024 to 0.054 \(\mathrm{s}^{-1}\). To reflect a worst-case scenario (maximum inhibition by cis-cleavage), we used \(k_{\mathrm{cat}}^{\mathrm{cis}} = 0.054\ \mathrm{s}^{-1}\). Another study [6] provided indirect evidence suggesting that the \(K_{m}^{\mathrm{cis}}\) of LbCas12a is below 5 nM. Based on this indication, we chose a \(K_{m}^{\mathrm{cis}}\) value of 0.05 nM, again to model a conservative scenario.
In our ODE model, Cas12a activation and cis-cleavage are described using Michaelis-Menten kinetics:
This formulation reflects the gradual activation of Cas12a as more DNA becomes available, rather than assuming an instantaneous activation. The cis-reaction consumes DNA over time, reducing the pool of RPA amplicons available for trans-cleavage. This effect is particularly strong in the early amplification phase, when RPA product concentrations are still low.
Using these parameters, our model predicted no detectable fluorescence signal when Cas12a is active from the beginning of the reaction. The cis-cleavage removes DNA faster than RPA can produce it at early time points, resulting in a “deadlock” situation where Cas12a never reaches sufficient activation to cleave reporters in trans.
To overcome this, we reasoned that temporally separating amplification and detection could avoid this early inhibition. By allowing the RPA reaction to proceed for several minutes before introducing Cas12a, a sufficient amount of DNA accumulates. Once Cas12a is introduced, cis-cleavage no longer significantly impacts the reaction because the DNA concentration is already high. To test this, we implemented a delay parameter in our model, where Cas12a activation starts only after a defined time (e.g., 300 s). The simulations clearly show that a short delay is sufficient to allow reporter cleavage to proceed efficiently.
This modeling insight was critical. It provided a clear, actionable design principle for our hardware: a two-chamber system that temporally separates amplification and detection. In such a setup, RPA would run first in isolation, allowing sufficient DNA to accumulate. After a defined incubation period, the sample would be transferred into the Cas12a reaction chamber, initiating detection only after amplification is well underway. This temporal separation could be achieved through simple means such as lateral flow or paper-based microfluidic systems, where controlled migration of the reaction mixture triggers Cas12a activity after a predefined delay. Our model predicts that this approach would prevent early cis-cleavage from depleting the target DNA and would enable robust signal generation. By identifying this failure mode in silico, the model saved us time and resources that would have been spent troubleshooting failed wet lab experiments.
Starting from this point, the following sections will address the case in which the reaction is separated, with Cas12a being activated 300 seconds after the start of the RPA.
To assess whether a range of RPA product concentrations is sufficient to make reliable predictions, we calculated the minimum amount of RPA-generated DNA required to activate Cas12a strongly enough to cleave all reporters and generate the maximal signal within 30 minutes. This threshold was found to be 0.042 nM of RPA product. When varying the G concentration, the lowest resulting RPA product concentration was 44.963 nM, which is roughly 1000 times higher than the minimum required amount. This indicates that, under all tested conditions, the amount of amplified DNA is more than sufficient to fully activate Cas12a, meaning that variations in RPA yield within this range are unlikely to affect the final signal intensity.
Threshold Calculation
The detection threshold was determined following the approach described in [3] using the formula:
where \( \mu_{\mathrm{blank}} \) is the mean fluorescence of blank samples, \( \sigma_{\mathrm{blank}} \) is the corresponding standard deviation, and \(K\) is a confidence factor. Following [3], we set \(K = 3\), which corresponds to a 99.9 % confidence level that blank measurements lie below the detection threshold. Higher \(K\) values result in more conservative thresholds but increase statistical confidence.
The blank measurements used for this calculation were identical to those used to determine the background fluorescence. Based on these values, the detection threshold was set to 1473 a.u.
Simulation Constraints:
Simulations were performed across a range of starting DNA concentrations to identify the lowest input at which the signal exceeded the threshold. While the search formally started at \(1 \times 10^{-22}\ \mathrm{M}\), concentrations below the equivalent of a single molecule per reaction are biologically meaningless. The simulation was therefore constrained to exclude values below this physical lower bound.
For the assay volume of 47.5\( \mu\mathrm{L} \), one DNA molecule corresponds to a concentration step of \(3.496 \times 10^{-20}\ \mathrm{M}\). Accordingly, limits are expressed in whole numbers of molecules, not arbitrary decimal concentrations.
LOD Determination
Using Monte Carlo simulations incorporating Poisson-distributed start copies, an initiation gate, measurement noise, and the detection rule “threshold crossed before 30 min,” the continuous search yielded a limit at 7.47 expected copies.
Poisson distribution of input molecules: For a nominal initial concentration \(C_0\) and reaction volume V, the expected number of input DNA molecules is
where \(N_{\mathrm{A}}\) is Avogadro’s number. The actual number of molecules N used in each simulation is drawn from a Poisson distribution
Initiation gate: Only reactions with at least Nmin seed molecules can proceed. Given \(N \ge N_{\min}\), each seed initiates amplification with probability \(p_{\mathrm{seed}}\). The probability that the reaction successfully initiates is
A Bernoulli draw with success probability \(p_{\mathrm{init}}(N)\) determines whether the reaction proceeds.
Conversion between copies and molar concentration uses
Detection rule: A replicate is considered positive if the simulated fluorescence signal \(F(t)\) crosses the detection threshold Fth before \(t_{\max}\) = 1800 s,
Cas12a activation and Michaelis-Menten cleavage: After a fixed delay \( t_{\mathrm{delay}} \)=300 , the concentration of activated Cas12a,\( [\mathrm{Cas}]_{\mathrm{active}}(t) \), evolves according to
and the cleaved reporter concentration \([\mathrm{clDR}](t)\) follows a Michaelis-Menten trans-cleavage equation
Limit of detection (LOD): The LOD is defined as the smallest nominal concentration \( C_{\mathrm{LOD}} \) for which the detection probability reaches the target level ptarget=0.95,
where \(P_{\mathrm{det}}(C)\) is estimated by repeated stochastic simulation (Poisson input, initiation gate, measurement noise, deterministic kinetics).
The continuous search yielded \(\lambda_{\mathrm{LOD}}\) = 7.47 expected molecules, which was quantized up to the next integer number of input molecules
corresponding to a concentration of
in V=47.5 \(\mu\mathrm{L}\).
Cas12a remained inactive for the first 300 s, preventing any signal from crossing the threshold before this time.
Time-to-Threshold Distribution and Validation
At the LOD of 8 copies, the median time-to-threshold was 548.7 s (9.15 min), i.e., about 4 min after Cas activation. The distribution tail was modest, with the 90th and 95th percentiles at 663.2 s (11.05 min) and 744.0 s (12.40 min), respectively.
A validation experiment with 40 replicates at 8 copies yielded 39/40 detections (97.5 %) within the 30 min assay window. This slightly exceeds the 95 % detection target because the rounded input (8 copies) is marginally higher than the theoretical continuous boundary (7.47 copies).
Summary:
An overview of the core system is shown in the following figures:
A successful diagnostic must be reliable not only on the day it is made but also after months of storage and transport. To address this critical real-world requirement, we developed an Enzyme Stability Model. This model is designed to predict the loss of enzyme activity over time under different temperature conditions, allowing us to forecast the shelf-life of our test and to quantitatively validate our choice of freeze- drying approach. Connect Entrepreneurship
The framework is grounded in established thermodynamic and kinetic principles. The temperature dependence of the enzyme denaturation rate, \(k_{d}(T)\), is described by the Arrhenius equation. The subsequent loss of enzyme activity over time, \(A(t)\), is then modeled as an irreversible first-order decay process. This formulation allows us to simulate the impact of arbitrary temperature histories—such as a cool storage phase followed by a room-temperature usage phase—by calculating the cumulative loss of activity in discrete time steps.
This is an overview of the key variables, parameters, and equations used in the model:
| Symbol | Description | Value | Unit |
|---|---|---|---|
| \(k_{d,\mathrm{ref}}\) | Denaturation rate at \(T_{\mathrm{ref}}\) | \(2.60 \times 10^{-3}\) | \(\mathrm{min^{-1}}\) |
| \(E_a\) | Activation energy of denaturation | \(1.22 \times 10^{5}\) | \(\mathrm{J\,mol^{-1}}\) |
| \(R\) | Gas constant | \(8.314\) | \(\mathrm{J\,mol^{-1}\,K^{-1}}\) |
| \(T_{\mathrm{ref}}\) | Reference temperature | \(323.15\) \((50^\circ\mathrm{C})\) | \(\mathrm{K}\) |
| \(T_{\mathrm{home}}\) | Home/room temperature | \(295.15\) \((22^\circ\mathrm{C})\) | \(\mathrm{K}\) |
| \(T_{\mathrm{storage}}\) | Storage temperature | \(278.15\) \((5^\circ\mathrm{C})\) | \(\mathrm{K}\) |
| \(\Delta t\) | Timestep | \(1\) | \(\mathrm{min}\) |
| \(A_0\) | Initial activity | \(100\) | \(\%\) |
| \(A(t)\) | Remaining enzyme activity (over time) | — | \(\%\) |
Temperature Dependence of the Denaturation Rate
The temperature-dependent denaturation rate \(k_{d}(T)\) follows the Arrhenius equation:
Activity Decay Over Time
Enzyme inactivation is modeled as a first-order irreversible process:
with \(A(t)\) the remaining activity (in %) at time \(t\). For discrete time steps \(\Delta t\):
This update allows arbitrary temperature profiles \(T(t)\): at each step, read \(T(t)\), compute \(k_{d}(T)\), and advance A.
Temperature Profile
A simple two-phase storage/usage profile:
where \(T_{\mathrm{storage}}\) is the storage temperature, the storage duration, and \(T_{\mathrm{home}}\) the usage temperature (e.g., room temperature).
A key modeling choice was to identify which of the many proteins in our system would be the primary limiting factor for shelf-life. We adopted a simplified single-bottleneck assumption, treating one protein's decay as the pace-setter for the entire assay. Protein size often correlates with thermal robustness [7] . Although UvsY is the smallest component, it typically operates in a stabilizing complex with UvsX. We therefore select Gp32 (SSB), the smallest component not obligatorily stabilized by a partner, as the practical pace-setter for shelf-life prediction: if Gp32 remains functional over a given temperature history, the assay is expected to remain functional as well.
This choice is strongly supported by our earlier sensitivity analysis, which showed that the concentration of G (our Gp32/SSB component) has the most significant influence on the model's output, implying that a decline in G activity will dominate assay performance. This is an explicit modeling assumption and simplification.
| Protein | Size (Number of amino acids) | Source |
|---|---|---|
| LbCas12a | 1228 | [10] |
| UsvY | 137 | [11] |
| UsvX | 391 | [12] |
| Gp32 | 301 | [13] |
| Bsu Pol I | 880 | [14] |
Direct measurement of \(E_a\) and \(k_{d,\mathrm{ref}}\) for isolated Gp32 was not feasible because the available kits bundle multiple proteins, and our literature search did not yield Gp32-specific stability constants.
To overcome this challenge, we adopted a size-matched proxy parameters from plant peroxidases of comparable length. In particular, we used the values reported in a scientific paper [8] for Eichhornia crassipes leaf peroxidase - \(E_a \approx 122\ \mathrm{kJ\,mol^{-1}}\) and \(k_{d,\mathrm{ref}} \approx 2.6\times10^{-3}\ \mathrm{min^{-1}}\ \text{(at }50^\circ\mathrm{C)}\).
Because we could not locate the exact sequence length of that enzyme, we surveyed lengths of related plant peroxidases, which cluster around ∼300 amino acids; based on this, we assume the Eichhornia peroxidase is of similar size (~300 amino acids) and use it as a pragmatic proxy for Gp32.
With this framework, we can chart the time course of Gp32 activity. By modeling a temperature history that separates a cool-storage phase from a post-purchase, room-temperature phase, we obtain shelf-life predictions that closely mirror real-world handling.
The results reveal how storage conditions dramatically affect shelf life. At refrigeration temperatures (5 °C), enzyme activity declines slowly over many months. Once brought to room temperature (22°C), degradation accelerates substantially, but the assay remains viable for weeks of typical use.
To connect molecular stability to assay performance, we couple the computed Gp32 activity \(A(t)\) to our core Reaction Model (RPA→CRISPR) by scaling the Gp32-sensitive kinetic steps. Specifically, reaction rates that depend on SSB/Gp32 concentration are multiplied by the activity factor \(A(t)\). This yields time-resolved detection behavior under realistic temperature histories: how quickly fluorescence crosses threshold after activation as a function of prior storage conditions.
This integrated simulation yields a direct, quantitative prediction of how shelf-life affects the user experience. The results, shown in the fluorescence curves and the time-to-threshold heatmap, demonstrate how a longer time at room temperature leads to reduced enzyme activity. This, in turn, slows down the diagnostic reaction, increasing the time a user must wait for a positive signal. For example, the heatmap shows that a test stored for 100 days and then kept at room temperature for another 25 days (total 125 days) would take 19 minutes to produce a detectable signal, whereas one used immediately after storage would take only 5 minutes.
Limitations and outlook
Using a plant peroxidase as a size-matched proxy for Gp32 is a pragmatic simplification: sizes are comparable, but the peroxidase’s fold and microenvironment differ from Gp32, so absolute rates may shift. The framework itself is mechanistic and modular. Once Gp32-specific stability constants (\(E_a\) and \(k_{d,\mathrm{ref}}\)) are measured in dedicated assays, we can replace the proxy values in the Arrhenius model and obtain sharper shelf-life predictions without altering the rest of the pipeline.
Modeling for our Entrepreneurial Plan
From the very beginning, our project was designed with a clear entrepreneurial goal: to create a diagnostic that is globally accessible and independent of the cold chain. We identified lyophilization (freeze-drying) as a core component of our manufacturing strategy to achieve this. However, this presented a key business question: is the added manufacturing cost of freeze-drying justified by the extended shelf-life? Our integrated reaction model was developed, among other things, to answer this question quantitatively. By coupling our kinetic model with the Enzyme Stability Framework, we didn't just find a problem; we validated our initial business decision. The results, particularly the time-to-threshold heatmap, provide powerful evidence for our strategy. The model clearly demonstrates that without lyophilization, the test's performance degrades rapidly, rendering it unreliable for the very low-resource settings we aim to serve. It quantitatively proves that the immense value gained by eliminating the cold chain, ensuring product reliability and market accessibility, far outweighs the manufacturing cost, forming a cornerstone of our business case.
The ODE system is solved using the LSODA solver, which proved to be more robust against integration failures than other solvers, such as CVODES [2]. LSODA combines two numerical methods: a backward differentiation formula (BDF) algorithm for stiff systems and an Adams-Moulton algorithm for non-stiff systems.
Since our model involves multiple timescales and reaction rates that differ by several orders of magnitude, it is classified as a stiff system. Although both LSODA and CVODES use BDF methods for stiff problems, results reported in [2] suggest that the LSODA implementation outperforms that of CVODES.
Our Reaction Model was a powerful and essential part of our project's journey. We built it to be more than just a simulation; it was a tool for navigating the real-world uncertainties of biochemical engineering. By embracing challenges transparently, we were able to produce insights that guided our entire project.
Our work began by tackling the "black box" problem of proprietary kits. Despite not knowing all the component concentrations, we successfully validated our model's core RPA amplification module against our wet lab's experimental data. This was a critical achievement, as it provided the anchor of confidence we needed to trust our model's predictions. That trust was essential when our model made its most important finding: the prediction of a "cis-cleavage deadlock," a fundamental system failure that we would have been blind to in the lab.
This insight allowed us to propose a tangible engineering solution - the "two-chamber" hardware design - demonstrating the model's direct value. Furthermore, our model proved its precision by rigorously determining a limit of detection of 8 copies, a prediction that was validated based on scientific research literature.
By linking performance and stability of our diagnostic we gained a powerful, quantitative validation for our initial entrepreneurial strategy of incorporating freeze-drying. The model clearly shows that without a method to preserve enzyme stability, the test's performance would be unreliable for real-world use where a cold chain is not guaranteed. This result is a cornerstone of our business case, proving that lyophilization is not just a desirable feature but an essential requirement for our diagnostic to be effective and accessible globally.
Ultimately, our Journey tells the story of the real scientific process. It is a story of facing challenges, adapting to unexpected results, and using a combination of computation and experimentation to design something that works.
We aimed to go beyond a purely mechanistic model of our test. In collaboration with the Human Practices team, we developed an agent-based model to assess how deploying our test could impact STI incidence. The model’s goal is to simulate how varying testing rates influence infection dynamics over time — specifically, to evaluate how increasing testing frequency reduces transmission and to identify thresholds where interventions become most effective.
We developed an agent-based model (ABM) to simulate the transmission dynamics of sexually transmitted infections (STIs), specifically focusing on Chlamydia trachomatis in the Swiss population. The model captures individual-level heterogeneity in sexual behavior, partnership dynamics, and healthcare-seeking patterns to evaluate the impact of various intervention strategies on disease prevalence.
What is an ABM?
An agent-based model (ABM) is a type of computer simulation that looks at how individual people - or “agents” -behave and interact with each other. Each agent can have different characteristics, such as how often they meet partners, get tested, or change behavior over time. By simulating many individuals and their interactions, ABMs help us understand how diseases like Chlamydia can spread in a population and how different strategies, such as more frequent testing, can help reduce infections.
Each agent in the simulation represents an individual aged 15-100 years with the following attributes:
The model initializes with 7,663 agents, representing the Swiss population aged 15+ (in thousands). [1] The population composition reflects Swiss demographic data:
Initial infection prevalence is age-stratified according to Swiss surveillance data [3] from the Federal Office of Public Health, with highest rates among younger age groups (520.7 per 100,000 for ages 15-24) and declining with age. These values were multiplied by a scaling factor to account for underreporting of cases and to approximate the true incidence. This factor was estimated to be around 10 for Switzerland by dividing the estimated total number of cases by the number of reported cases [4].
Risk groups influenced partner-seeking behavior, with high-risk individuals being twice as active as average. To keep the population average consistent with literature values, low-risk individuals (90% of the population) were assigned a relative activity of 0.8. Risk group also affected sexual acts per week, maximum concurrent partners, and testing frequency.
Sexual transmission occurs through partnerships with probability dependent on:
The infection progression follows established epidemiological parameters:
Partnership formation rates are age- and sex-specific, derived from empirical data:
Dissolution probabilities follow empirical survival curves differentiated by sex and relationship duration:
Testing propensity varies by symptom status and risk group:
Agents modify their partner acquisition behavior following positive test results:
The model incorporates Swiss demographic rates to maintain population realism:
The model operates on a weekly time step, with each iteration executing the following sequence for each agent:
All probabilistic events use a dedicated random number generator with configurable seed for reproducibility. Key stochastic processes include:
The model tracks multiple epidemiological metrics at each time step:
Model parameters were derived from:
Parameters lacking values in the literature were estimated based on expert consultation with healthcare providers and/or insights from survey data.
The model is implemented in Python 3.12 using object-oriented programming principles. Key dependencies include:
As only positive test results are mandatorily reported, precise national testing rates are uncertain. We therefore calibrated a current scenario using our survey-based annual testing propensities (0.08 on average, scaled to 0.07 for the low-risk group and 0.15 for the high-risk group) and explored plausible bounds (see below). Under this baseline, the system settles into a quasi-steady state with:
These baseline dynamics imply that, without changes in testing behavior or partner notification, marginal year-to-year variation arises mainly from stochasticity in partnerships, transmission, and symptoms, rather than from structural drift.
To test robustness against uncertainty in routine testing, we varied annual testing rates within a plausible range:
Across this range of plausible testing propensities, the model’s epidemic trajectory remained broadly stable. Lower testing rates resulted in a modest upward shift in prevalence, while higher testing frequencies led to a slight overall reduction due to faster detection and treatment of cases. Importantly, none of the scenarios produced qualitative changes in epidemic behavior - the system consistently exhibited a broadly stable trajectory with a slight upward trend over time. This demonstrates that the model’s predictions are robust to some uncertainty in current testing behavior.
Interpretation: With reduced testing, infected counts drift slightly upward but remain bounded; no qualitative shift in transmission pathways is observed.
Interpretation: The baseline reproduces expected patterns: higher counts among heterosexual agents (demography-driven), low sustained levels among lesbian women, and intermittent clusters among gay/bisexual men, superimposed on steady network turnover.
Interpretation: Even with higher routine testing, the model settles again into a slight upward trend after a temporary fluctuation.
Survey results indicated that the vast majority of respondents would probably or definitely test more often. Specifically, 81.6% said they would use TRACE if aware of it, and 79% reported likely increasing their testing frequency. This trend was consistent across groups, with 72.8% of frequent testers and 82.6% of non-testers agreeing.
While these survey responses provide valuable insight into potential behavior changes, it is important to note that self-reported intentions do not always translate directly into real-world action. To account for this uncertainty, our epidemiological model explores a range of possible increases in testing rates rather than assuming full adherence to reported intentions. This approach allows us to quantify the potential population-level impact of TRACE under both conservative and optimistic adoption scenarios.
Scenario: annual testing rates increased to 0.28 for low-risk and 0.6 for high-risk individuals.
Interpretation: This moderate scenario represents a realistic short-term effect based on partial adoption of TRACE. The model predicts a sustained reduction in prevalence, with the largest decline observed among heterosexual agents, who make up the majority of the population. These findings suggest that even without full adoption, TRACE could meaningfully lower transmission rates through earlier detection and treatment, breaking infection chains before they propagate.
Scenario: annual testing rates increased to 0.7 for low-risk and 1.5 for high-risk individuals.
Interpretation: Under optimistic conditions - where TRACE achieves widespread awareness and regular usage - the model predicts a near-elimination of infection within the simulated period of 10 years. This highlights the nonlinear relationship between testing frequency and disease prevalence: once a critical proportion of the population tests frequently, transmission becomes unsustainable, and prevalence rapidly declines.
The result illustrates the transformative potential of accessible at-home testing to shift population-level disease dynamics.
The epidemiological findings presented above provide quantitative evidence that TRACE is not only a diagnostic innovation but also a public health enabler with tangible societal benefits. By substantially increasing testing accessibility and frequency, TRACE can reshape STI epidemiology on a population scale. Our agent-based model demonstrated that increases in testing rates lead to a marked and sustained reduction in infection prevalence, while widespread adoption could push the system toward near-elimination. These results confirm that TRACE has the potential to become a transformative public health intervention, capable of reducing transmission, improving early detection, and alleviating long-term healthcare burdens.
From an entrepreneurial perspective, this modeling validation provides strong support for our business case. It shows that TRACE can create dual value – both societal impact and economic sustainability. The reduction in undiagnosed infections translates into measurable public health savings and productivity gains, while the convenience and privacy of TRACE address key barriers that currently limit regular STI screening.
The model therefore strengthens the rationale for investing in and scaling TRACE as a viable startup opportunity. It quantifies the positive externalities of widespread testing and underlines the market need for accessible, at-home diagnostic tools. This alignment between scientific validation, social impact, and commercial feasibility underscores TRACE’s potential as a high-impact venture at the intersection of biotechnology and preventive healthcare.
For a detailed description of our business model, value proposition, and market strategy, see our entrepreneurship page.
Taken together, these results demonstrate that the behavioral change enabled by TRACE – more frequent and accessible STI testing – can lead to substantial reductions in prevalence, even under more conservative assumptions. The model highlights how improvements in testing behavior can shift the system from a stable endemic equilibrium toward effective control or elimination.
The ABM was validated by comparing its emergent epidemiological patterns with known characteristics of Chlamydia trachomatis infection in the Swiss population. The goal of validation was not to match exact case numbers from surveillance data – which are influenced by underreporting – but to ensure that the simulated trends and magnitudes are epidemiologically realistic and consistent with literature-based expectations.
The model predicts that infections are strongly concentrated among younger agents, particularly those aged 20–30 years, with prevalence decreasing steadily in older age groups. This distribution reflects real-world surveillance data, where Chlamydia incidence peaks among young adults and declines with age due to reduced partner turnover and increased testing awareness. A small artificial peak appears at around 90 years, which is attributed to the limited number of agents in that age bracket and does not reflect a genuine epidemiological trend. Overall, the age-specific pattern demonstrates that the model correctly captures the demographic concentration of infection in sexually active young populations.
Partnership dynamics were evaluated by tracking the mean number of concurrent partners over time. The simulation stabilized at an average of approximately 0.64 partners per agent, exhibiting only minor stochastic fluctuations over the 10-year period. This suggests that, at any given time, roughly two-thirds of agents are engaged in a sexual partnership – closely aligning with reported data from Switzerland, which indicates an average of 0.73 partners [21]. The absence of long-term drift or oscillatory behavior confirms that the model’s partnership formation and dissolution parameters are balanced and produce a stable social network structure over time.
When stratified by behavioral risk, the model predicts that infections are disproportionately concentrated among high-risk individuals, who maintain more frequent partnerships and higher sexual activity rates.However, low-risk individuals still sustain a smaller but persistent burden of infection, reflecting ongoing spillover through network mixing between groups.
This outcome is supported by current public health strategies that emphasize frequent testing among high-risk populations while acknowledging that occasional testing in the general population remains necessary to prevent reinfection cycles [22].
At the population level, the model maintains a steady-state prevalence of around 1.7%, corresponding to approximately 129.1 infected individuals out of 7,663 simulated agents. When scaled to the Swiss population aged 15+ (factor of 1000), this equates to an estimated ~129,100 concurrent infections, which aligns well with literature estimates exceeding 100,000 cases after accounting for underreporting [4]. The overall number of susceptible and treated agents remains stable throughout the simulation period, confirming that the model has reached a quasi-endemic equilibrium rather than an unstable epidemic expansion or collapse.
Taken together, these validation results indicate that the model successfully reproduces key epidemiological and behavioral properties of Chlamydia transmission:
While some noise appears in small demographic subgroups (e.g., elderly agents), these deviations are expected given the limited number of agents in those cohorts and do not affect overall model reliability. Therefore, the model can be considered valid for simulating population-level and behavioral intervention scenarios relevant to the Swiss context.
Our agent-based model necessarily simplifies the complex reality of sexual network dynamics and STI transmission. The assumption of random mixing within risk groups, even with assortative mixing probabilities, cannot fully capture the highly clustered and community-structured nature of real sexual networks. Real-world sexual networks often exhibit small-world properties and contain dense subgroups that can sustain transmission even when overall prevalence is low [23]. Our limitation of maximum two concurrent partners may underestimate transmission potential in highly connected network cores where individuals maintain multiple concurrent partnerships.
The weekly temporal resolution, chosen for computational efficiency and better implementation of literature parameters, means the model cannot capture short-term casual encounters or same-day partner turnover that may contribute significantly to transmission dynamics, particularly in high-risk populations. Furthermore, the absence of geographic structure assumes that all individuals in Switzerland have equal probability of partnership formation regardless of location, ignoring the reality that sexual networks are strongly influenced by proximity, urban-rural divides, and local social venues [24]. This simplification may affect our estimates of outbreak potential and doesn't allow for the assessment of geographically targeted interventions.
Behavioral aspects in the model are represented in a simplified way. Testing behavior is modeled as a probability that depends on symptoms and risk group, without explicitly incorporating factors such as healthcare accessibility, insurance coverage, stigma, or prior testing experiences. Similarly, condom use and other barrier methods are not modeled individually; instead, their effects are reflected in the overall transmission probability parameters. This design choice keeps the model focused on capturing key testing-driven dynamics while maintaining conceptual clarity and computational efficiency.
The binary classification into high- and low-risk groups does not fully capture the continuous spectrum of sexual behaviors or the dynamic nature of risk throughout an individual’s life. In reality, people may shift between risk levels depending on life circumstances, relationship status, or age, whereas our model treats risk group membership as fixed. Likewise, the model assumes that Swiss residents and immigrants share similar behavioral patterns, which may overlook differences arising from diverse cultural backgrounds and sexual health practices within immigrant populations.
Many model parameters remain uncertain due to limited data availability. Transmission probabilities, particularly for female-to-female transmission, are poorly characterized, as comprehensive transmission studies are both ethically and practically difficult to conduct. While several studies report a transmission probability of approximately 0.05 per sexual act [5, 6, 25], data on specific transmission routes, such as female–female transmission, remain particularly limited.
The model assumes that the true number of infections is roughly ten times higher than the number of reported cases. This assumption primarily influences the absolute number of predicted cases, while having a lesser impact on the relative effects of different testing rates. Since the model’s predictions align with existing epidemiological estimates, this factor appears reasonable. However, it may still vary across age groups, risk populations, and geographic regions.
Model validation faces fundamental challenges in the context of STI transmission. We cannot conduct controlled transmission experiments for ethical reasons, and detailed sexual network data is difficult to collect due to privacy concerns and recall limitations. The model can be validated against aggregate surveillance data (such as comparing the model’s case number prediction to literature estimates), but this provides limited information about whether the underlying transmission processes are correctly represented. Without access to genetic sequence data that could reveal transmission chains or detailed partner notification data that could validate network structure, we must rely on the model's ability to reproduce population-level patterns as evidence of validity.
The immediate priority for model development involves incorporating partner notification mechanisms that reflect real-world contact tracing efforts. Partner notification represents an important component of STI control programs, yet its effectiveness depends on notification rates, partner reachability, and the delay between index case diagnosis and partner testing [26]. Implementing realistic notification cascades would allow evaluation of how improvements in contact tracing effectiveness might amplify the benefits of increased testing. This enhancement would require modeling the probability that infected individuals notify their partners, the delay in notification, and the subsequent testing and treatment behaviors of notified partners.
Incorporating geographic structure and local social contexts represents another important extension, as sexual networks are strongly shaped by these factors. Including geographic mixing would allow the evaluation of targeted interventions and provide insights into how infections spread between communities. However, greater population fragmentation makes parameter estimation increasingly difficult, and even at the current level of model detail, parameter availability was limited.
The extension to multi-pathogen dynamics would significantly enhance the model's public health relevance. Many individuals with one STI have increased risk for others due to shared transmission routes and biological interactions [27, 28]. Modeling co-infections with gonorrhea, syphilis, or HIV would allow for more accurate predictions of epidemic trajectories, identification of high-risk subpopulations, and evaluation of interventions that target multiple infections simultaneously. It could also capture synergistic effects, such as increased susceptibility or transmissibility resulting from co-infections, providing a more realistic representation of STI spread and informing integrated public health strategies. In fact, it is known that HIV increases the risk of acquiring and transmitting other sexually transmitted infections, such as syphilis, gonorrhea, and herpes simplex virus, due to both immunosuppression and higher viral shedding at mucosal sites.