Model

1. Inspiration

One of our main motivations for developing the following modeling strategy was the challenge we encountered in constructing the plasmid pKs-HSP70A-CPN60C-RBCS2t (the “target plasmid”). This plasmid incorporates a highly efficient promoter, terminator, selectable marker, and backbone, all of which have been extensively validated in the model organism Chlamydomonas reinhardtii. It was designed to be inserted into the endogenous CPN60C gene of C. reinhardtii, with the goal of overexpressing the gene and enhancing the organism's capacity to survive under heat stress, thereby improving its capacity for carbon fixation in hot environments. However, the required plasmid backbone was not available to us at the time, which forced us to consider synthesizing the entire plasmid, including the backbone, from scratch. Such a long synthetic sequence would not only greatly increase both the financial and time costs but also introduce a considerable risk of failure.

In light of this, our team decided to use modeling to predict whether transforming wild-type C. reinhardtii CC-124 with the target plasmid would produce a sufficient number of colonies exhibiting detectable CPN60C expression, which we defined as “experimental success.”

At first, we considered molecular dynamics modeling, the approach most commonly adopted by iGEM teams that dedicate effort to modeling. However, with limited knowledge, resources, and time, building a molecular dynamics model that accurately simulates the biological effects of the target plasmid in algal cells would have been a huge challenge for us. Therefore, we ultimately decided to adopt a statistical/computational modeling approach. To strengthen our model, we reviewed existing studies and literature extensively, drawing on reliable reference data wherever possible.

2. Goal

To develop a computational model that predicts whether transformation of the Chlamydomonas reinhardtii CC-124 strain with the designed plasmid, pKs-HSP70A-CPN60C-RBCS2t, would yield a sufficient number of colonies in which expression of the introduced CPN60C is detectable by PCR.

3. Methodology

3.1 Probability Setup

We adopted an intuitive statistical framework for the modeling: the probability of a transformed algal cell successfully expressing the target gene was defined as P(expression). The success probabilities of the critical steps and factors contributing to expression were defined as P(transformation), P(integration), P(promoter-active), P(transcription), P(translation), and P(survival). Specifically:

Step | Definition
1. Transformation efficiency - P(transformation) | The probability that DNA successfully enters the algal cell.
2. DNA integration rate - P(integration) | The probability that the target DNA is successfully integrated into the algal genome, or stably expressed in plasmid form.
3. Promoter activity - P(promoter-active) | The probability that the HSP70A-RBCS2 promoter is successfully activated in CC-124.
4. Transcription efficiency - P(transcription) | The probability that mRNA is successfully transcribed.
5. Translation efficiency - P(translation) | The probability that the translation mechanism initiates successfully and synthesizes a functional protein.
6. Antibiotic selection survival rate - P(survival) | The probability that the transformant survives under the corresponding antibiotic selection.

For simplicity, we refer to the probability values at each step as P1–P6. Therefore, our ultimate goal is to estimate:

P-expression = P1×P2×P3×P4×P5×P6

3.2 Algal Cell Number (N) and Definition of “Experimental Success”

We denote the parameter N as the number of algal cells. Specifically, N-start is defined as the initial number of algal cells used in each experiment, and N-end is defined as the number of transformants successfully expressing the CPN60C gene after the experiment. Thus:

N-end = N-start×P-expression

To align with the experimental conditions reported in an important data source, we set N-start to 4 × 10^6 (Yamano et al., 2013), with details explained in the next part.

Previous studies have shown that a single Chlamydomonas reinhardtii cell is capable of stable amplification (Cao et al., 2009). In other words, under ideal incubation conditions, even a single positive transformant in a sample can produce sufficient colonies through subsequent amplification. In our laboratory practice, the number of colonies confirmed by PCR as positive transformants with CPN60C overexpression is typically ≥100. This count may include positive transformants after amplification rather than immediately after antibiotic selection. However, our modeling inevitably simplifies and idealizes real-world experimental conditions, which may inflate the estimated probabilities. To balance this potential modeling bias, we set 100 as the N-end threshold. Accordingly:

When N-end ≥ 100, the experiment is considered a success, as a sufficient number of colonies with detectable CPN60C expression have been obtained.

When N-end < 100, the experiment is considered a failure, as the number of colonies with detectable CPN60C expression is insufficient.
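The chain model and the success criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual program: the function names (p_expression, n_end, is_success) and the example step probabilities are hypothetical and were chosen only to show the arithmetic.

```python
from math import prod

N_START = 4_000_000      # initial cells per experiment (Yamano et al., 2013)
N_END_THRESHOLD = 100    # minimum positive transformants counted as success


def p_expression(step_probabilities):
    """Multiply the step probabilities (P1..P6) into P-expression."""
    return prod(step_probabilities)


def n_end(step_probabilities, n_start=N_START):
    """Expected number of expressing transformants: N-start x P-expression."""
    return n_start * p_expression(step_probabilities)


def is_success(step_probabilities):
    """Apply the N-end >= 100 success criterion."""
    return n_end(step_probabilities) >= N_END_THRESHOLD


# Hypothetical illustrative values only, not estimates from the model.
example_steps = [0.01, 0.02, 0.9, 0.6, 0.7, 0.7]
print(n_end(example_steps), is_success(example_steps))
```

Because the steps are multiplied, a tenfold drop in any single factor lowers N-end tenfold, which is why the per-step ranges matter so much for the final decision.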

3.3 Monte Carlo Simulation

We used Python to perform Monte Carlo simulations with 10,000 iterations. For each iteration, we calculated the estimated N-end and determined whether the outcome represented “success” or “failure”.

Since this represents a routine experimental simulation with moderate risk, we adopted a 95% confidence level and set the success rate threshold at 80%. Accordingly, if the lower bound of the 95% confidence interval for the success rate is at least 80%, we conclude that transforming Chlamydomonas reinhardtii CC-124 with the designed plasmid pKs-HSP70A-CPN60C-RBCS2t is sufficiently likely to generate enough colonies for the introduced CPN60C expression to be detected.
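The decision rule can be illustrated with a Wilson score interval, the interval type named in the results table. The sketch below is an assumption about how such a check might be coded; the function names and the use of z = 1.96 for the 95% level are ours, not taken from the original program.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials
                          + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def passes_decision_rule(successes, trials, threshold=0.80):
    """True if the lower bound of the interval reaches the threshold."""
    lower, _ = wilson_interval(successes, trials)
    return lower >= threshold


print(wilson_interval(10_000, 10_000))
print(passes_decision_rule(10_000, 10_000))
```

Unlike the simple normal approximation, the Wilson interval stays informative when the observed success rate is exactly 0 or 1, which is relevant when every simulated run succeeds.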

3.4 Sensitivity Analysis

We also planned to conduct a preliminary sensitivity analysis in Python to identify the steps or factors with the greatest influence on the overall success rate. This analysis may indicate which stages of the process offer the greatest potential for improvement.

4. Probability Estimation and Definition

P1. CC-124 Transformation Success Rate (P(transformation))

The transformation success rate is influenced by multiple factors, including but not limited to the transformation method, experimental conditions, plasmid concentration, and the state of the cell wall. To simplify the model, we assumed that:

  • The physiological state of the cells is ideal.
  • The DNA used for transformation is double-stranded.
  • Electroporation is employed as the transformation method, as it is widely regarded as highly efficient (Shimogawara et al., 1998, as cited in Schroda & Remacle, 2022). It is also the method we used in our own experiments.
  • Electroporation parameters (voltage, pulse duration, number of pulses, cell density, buffer conductivity), plasmid concentration, temperature, humidity, and light intensity are well aligned with the experimental conditions reported by Yamano et al. (2013).

Based on Yamano et al. (2013), the following transformation data were obtained for CC-124:

  • DNA input per transformation experiment: 0.4 μg DNA, 4 × 10^6 cells
  • Cell suspension concentration: 1.0 × 10^8 cells/mL
  • Under specific electroporation conditions, the transformation yield (CFU) was approximately 2930 ± 471 transformants/μg DNA

One thing to note here is that Wang et al. (2019) improved upon Yamano’s protocol, reporting a transformation yield of approximately 6240 ± 2510 transformants/μg DNA for CC-124. Although this method demonstrated higher efficiency with the same DNA dosage, our objective was not to conserve DNA but rather to maximize the transformation efficiency of algal cells. Based on our own experimental observations, as long as cell conditions are suitable and plasmid concentration is sufficient, transformation tends toward saturation. Therefore, we consider the data of Yamano et al. (2013) to be more reliable for our modeling.

From Yamano et al. (2013), the per-cell transformation rate can be estimated as:

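Per-cell transformation rate = (2930 transformants/μg × 0.4 μg) / (4 × 10^6 cells) ≈ 2.9 × 10^(-4)

With the reported uncertainty:

Per-cell transformation rate = ((2930 ± 471) transformants/μg × 0.4 μg) / (4 × 10^6 cells) ≈ [2.5 × 10^(-4), 3.4 × 10^(-4)]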

Accordingly, we set:

P1×P2∈[2.5×10^(-4),3.4×10^(-4)]
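The arithmetic behind this range can be reproduced directly from the Yamano et al. (2013) figures quoted above. The short sketch below is only a worked check; the constant and function names are ours, and treating the reported ± value as a simple lower and upper bound is an assumption.

```python
YIELD_MEAN = 2930   # transformants per microgram DNA (Yamano et al., 2013)
YIELD_SD = 471      # reported uncertainty on the yield
DNA_UG = 0.4        # micrograms of DNA per transformation
CELLS = 4e6         # cells per transformation


def per_cell_rate(yield_per_ug, dna_ug=DNA_UG, cells=CELLS):
    """Transformants obtained per input cell."""
    return yield_per_ug * dna_ug / cells


central = per_cell_rate(YIELD_MEAN)
low = per_cell_rate(YIELD_MEAN - YIELD_SD)
high = per_cell_rate(YIELD_MEAN + YIELD_SD)
print(f"{central:.2e} [{low:.2e}, {high:.2e}]")
```

The bounds come out at about 2.46 × 10^(-4) and 3.40 × 10^(-4), which round to the range stated above.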

Reference for Parameter Adjustment

The condition of the cell wall is an important factor to consider. Within the Chlamydomonas research community, it is generally accepted that the presence of a cell wall poses a significant barrier to DNA uptake (Yamano et al., 2013; Wang et al., 2019).

Although cell wall-deficient strains are theoretically easier to transform, some studies (Wan et al., 2025) suggest that they do not always achieve higher transformation efficiencies under electroporation, possibly due to increased susceptibility to cell death.

Conclusion

P1×P2∈[2.5×10^(-4),3.4×10^(-4)]

For simulations, P-transformation is assigned random values such that the product P1×P2 falls within this range.

P2. DNA Integration or Maintenance — P(integration)

Nuclear transformation in Chlamydomonas reinhardtii primarily occurs through stochastic non-homologous end joining (NHEJ), which results in positional effects and variable integration efficiency (Shahar et al., 2020). Since our study focuses on overexpressing a native gene, the process does not rely specifically on homologous recombination (HR). Consequently, it is difficult to assign a precise probability to the inherently stochastic DNA integration process.

Given the limited time and resources available, we were unable to identify quantitative data on C. reinhardtii transgenesis that specifically measure integration rates independently of transformation and subsequent processes. Moreover, in our current study, there are few experimental variables that we can directly manipulate to influence DNA integration.

For this reason, and in light of the approximate range we have already established for P-transformation × P-integration, we have chosen not to assign a separate value for P-integration at this stage. Instead, like P-transformation, P-integration will be simulated using random values that satisfy the given conditions.

Conclusion:

P1×P2∈[2.5×10^(-4),3.4×10^(-4)]

P3. Probability of Promoter Activity — P(promoter-active)

The promoter selected for our target plasmid is HSP70A-RBCS2, a strong and widely validated promoter in both previous iGEM projects and academic research. The iGEM Team TU Kaiserslautern, which optimized this promoter and recorded the modification in the iGEM Registry in 2019 and 2020, reported in their Team Wiki Contributions section that the promoter consistently achieves very high expression levels. They described it as a verifiable “gold standard,” with successful translational expression detected in multiple transformants.

More direct and quantitative evidence is provided by a series of studies from Schroda et al. on the HSP70A-RBCS2 fusion promoter (Schroda et al., 2000; Schroda et al., 2002). These studies demonstrated that fusion with HSP70A reduces the transcriptional silencing rate of RBCS2 from 80% to 36%. In other words, the HSP70A-RBCS2 promoter increases the transcriptional success rate to approximately 64%.

Here, we define P-promoter-active as the probability that a promoter remains active in transformants. Given the robust performance of HSP70A-RBCS2 and its demonstrated ability to markedly enhance transcription, we consider it unlikely that values below 0.8 would be consistent with the observed data. Therefore, we set the range for P-promoter-active as:

P3∈[0.8,0.9]

Conclusion: HSP70A-RBCS2 is a highly potent promoter, and for modeling purposes, P3 is defined in the range [0.8, 0.9].

P4. Probability of Normal Transcription — P(transcription)

Transcription is a critical step in transgene expression in Chlamydomonas reinhardtii. Previous studies have shown that this organism possesses a transcriptional silencing mechanism (Schroda et al., 2002), which is particularly pronounced for foreign gene insertions (Baier et al., 2018). Fusion of the HSP70A promoter with a downstream promoter such as RBCS2 markedly reduces the proportion of transformants subject to transcriptional silencing (Schroda et al., 2002).

According to Mackinder (2018), CO₂ concentration, light intensity, and circadian rhythms significantly affect transcription in C. reinhardtii. For the purposes of this study, we assumed near-ideal conditions for these factors and therefore did not treat them as variables. With respect to promoter strength and compatibility, the high activity of HSP70A-RBCS2 has already been discussed and was not included again here to avoid overweighting. Since CPN60C is an endogenous gene, the influence of introns was also excluded from the model.

Our main basis for quantifying P-transcription comes from the work of Schroda et al. (2002). Their results showed that approximately 80% of transformants were transcriptionally silenced in the R-ble construct, compared to only 36% in the HSP70A-RBCS2 fusion construct (AR-ble). In other words, the HSP70A fusion increased the proportion of transformants with successful transcription from 20% to 64%.

Two important considerations should be noted:

  • Measurement method: Schroda et al. used Northern blotting (mRNA detection) and resistance screening. Thus, the reported 64% reflects transcriptional activation after selection. However, because survival probability (P6) is calculated independently in our model, we must take care not to double-count the selection step when fitting this value into our chain model.
  • Endogenous vs. exogenous genes: Schroda’s work focused on exogenous constructs, whereas our study aims to overexpress an endogenous gene. Baier et al. (2018) noted that while exogenous promoters for RuBisCO or photosystem subunits can drive expression, their rates are generally lower than those of endogenous genes. This suggests that transcription efficiency for endogenous genes should be at least comparable to, if not higher than, that of exogenous constructs.

Therefore, the 64% figure can reasonably serve as the reference value for P4 in our model. To account for variability between our system and that of Schroda et al., we introduced a margin of ±0.10 (10 percentage points), yielding:

P4∈[0.54,0.74]

Conclusion: For modeling purposes, P4 is defined within the range [0.54, 0.74].

P5. Translation Success Rate — P-translation

The factors influencing the translational stage of gene expression are complex. Given that our design incorporates most of the positive factors that enhance translation (including a strong promoter with optimized introns, the endogenous CPN60C gene, and a validated terminator), we will not elaborate further.

Although it is extremely challenging to obtain an isolated "translation success rate" directly from the literature, we found indirect evidence in previous studies to support our assignment of P-translation. Kong et al. (2014) inferred that differences in SQS protein accumulation levels must be due to differences in transcription rather than translation, and that codons in endogenous genes do not interfere with translation, making it easy to obtain high-expressing transformants. In a 2019 review, Schroda also asserted that transcriptional silencing is the primary cause of transgene silencing in Chlamydomonas (Schroda, 2019). From these studies, we infer that translation is not the primary obstacle to transgene expression in Chlamydomonas reinhardtii. Assuming favorable design and experimental conditions, we therefore assign a relatively high range for translation: [0.6, 0.8].

Conclusion: For modeling purposes, P5 is defined within the range [0.6, 0.8].

P6. Survival after Resistance Selection — P(survival)

For resistance selection, we employed the aphVIII/paromomycin cassette. The aphVIII gene is one of the most widely used selection markers in Chlamydomonas reinhardtii, providing stable resistance (Sizova et al., 2001). This marker has also been described as common and highly effective by Barahimipour, Neupert, and Bock (2016).

Paromomycin concentrations typically used for selection range from 10 to 40 μg/mL (Chlamydomonas Resource Center), with 10 μg/mL being the most widely adopted by researchers (Nievergelt et al., 2023; Sizova et al., 2021). This concentration is also consistent with our experimental practice.

Similar to P-translation, and based on the findings of Schroda et al. (2002) and subsequent studies, it is reasonable to assume that P-survival falls within a relatively high and stable range. Once transcription and translation have been achieved, survival under antibiotic selection does not usually present a significant barrier, provided appropriate conditions and handling are ensured.

Therefore, we assign P-survival a range of:

P6∈[0.6,0.8]

Conclusion: For modeling purposes, P6 is defined within the range [0.6, 0.8].

In summary:

Parameter | Estimated Probability | Notes
P-transformation (P1) | [P1-, P1+] | Estimated P1 * P2: [0.00025, 0.00034]
P-integration (P2) | [P2-, P2+] | Estimated P1 * P2: [0.00025, 0.00034]
P-promoter active (P3) | [0.8, 0.9] |
P-transcription (P4) | [0.54, 0.74] |
P-translation (P5) | [0.6, 0.8] |
P-survival (P6) | [0.6, 0.8] |
P-expression | P1 * P2 * P3 * P4 * P5 * P6 |

Here, P1- and P1+ represent the lower and upper bounds of P1, respectively, and the same notation applies to P2.

Python assigns random values during the Monte Carlo simulations within the assigned range of probability, subject to the constraints described in the notes. As a result, P1 and P2 carry a relatively high degree of randomness. While this randomness does not statistically influence the overall P-expression, it may interfere with sensitivity analyses in which P1 and P2 are treated as independent variables.
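The sampling scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the original program: each factor is drawn from a uniform distribution over its range, P1 and P2 are sampled as one combined factor (labelled P12 here) instead of separately, and the names RANGES and simulate are hypothetical.

```python
import random
from statistics import mean, stdev

N_START = 4_000_000
N_END_THRESHOLD = 100

# Ranges from the summary table; P1 x P2 is sampled as one combined factor.
RANGES = {
    "P12": (2.5e-4, 3.4e-4),
    "P3": (0.8, 0.9),
    "P4": (0.54, 0.74),
    "P5": (0.6, 0.8),
    "P6": (0.6, 0.8),
}


def simulate(trials=10_000, seed=0):
    """Return (P-expression draws, N-end values) for the given trial count."""
    rng = random.Random(seed)
    p_values, n_values = [], []
    for _ in range(trials):
        p_expr = 1.0
        for low, high in RANGES.values():
            p_expr *= rng.uniform(low, high)   # assumption: uniform sampling
        p_values.append(p_expr)
        n_values.append(N_START * p_expr)
    return p_values, n_values


p_values, n_values = simulate()
successes = sum(n >= N_END_THRESHOLD for n in n_values)
success_rate = successes / len(n_values)
print(f"success rate: {success_rate:.4f}")
print(f"mean N-end: {mean(n_values):.1f}, std N-end: {stdev(n_values):.1f}")
```

Sampling the product P1×P2 directly avoids having to choose separate ranges for P1 and P2, at the cost of not being able to study them individually in a sensitivity analysis.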

5. Modeling and Conclusion

With the assistance of ChatGPT 5.0, we developed a simple Python program that implemented the modeling framework and probability definitions described above. Using this program, we performed 10,000 Monte Carlo simulations.

The final results are presented below:

Item | Data
Trials | 10000
Success rate | 1.0
95% CI (Wilson) | [0.999616, 1.0]
Mean P_expression | 0.00007822
Std P_expression | 0.00001391
Mean N_end | 312.89
Std N_end | 55.64
Decision (lower CI ≥ 0.80) | PASS

Namely, in our 10,000 Monte Carlo simulations, every run yielded at least 100 positive transformants. According to the previously defined criteria for “experimental success” and under a 95% confidence interval, the overall probability of success was estimated to be between 99.96% and 100%, which is substantially higher than the predefined threshold of 80%.
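A quick bound check helps interpret this result. Assuming every sampled value stays inside the ranges listed in the summary table, the smallest and largest possible N-end can be computed directly; the variable names below are ours.

```python
from math import prod

N_START = 4_000_000
LOWER_BOUNDS = [2.5e-4, 0.8, 0.54, 0.6, 0.6]   # P1xP2, P3, P4, P5, P6
UPPER_BOUNDS = [3.4e-4, 0.9, 0.74, 0.8, 0.8]

worst_case_n_end = N_START * prod(LOWER_BOUNDS)
best_case_n_end = N_START * prod(UPPER_BOUNDS)
print(f"N-end range: {worst_case_n_end:.1f} to {best_case_n_end:.1f}")
```

Under these ranges, even the worst case gives roughly 155 positive transformants, which is above the threshold of 100. The success rate of 1.0 therefore follows from the chosen parameter ranges themselves and does not depend on the number of iterations.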

Conclusion: The transformation of the Chlamydomonas reinhardtii CC-124 strain with the designed plasmid, pKs-HSP70A-CPN60C-RBCS2t, would yield a sufficient number of colonies with detectable CPN60C expression by PCR.

[Figure: Distribution of P-expression]
[Figure: Distribution of N-end]

From the following scatter plots, it can be seen that P12 (the combined factor P1×P2) has the most obvious slope, indicating that, within the probability values we assigned, P-transformation and P-integration have a very significant impact on experimental success. However, this may also be partly because P1 and P2 were not assigned separately, which results in greater randomness. P4 (transcription) also shows a clear positive correlation.

[Figure: Scatter plots of each parameter versus N-end (crmc_scatter_params_vs_N_end)]

From the sensitivity analysis using standardized regression coefficients (SRC) shown below, P4 (transcription) has the greatest influence among all parameters, followed by P12.

[Figure: Tornado plot of standardized regression coefficients (crmc_tornado_SRC)]

From these two figures, it can be observed that the influence of P3 (promoter active) is relatively small. This might be because, according to the literature, we have already assigned the promoter activity a high and relatively precise value range, leaving limited room for further optimization.
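For readers unfamiliar with standardized regression coefficients, the sketch below shows one way they can be computed using only the Python standard library. It is an illustration under our own assumptions (uniform sampling, P1×P2 treated as a single factor P12, hypothetical function names) and is not the code that produced the figures above.

```python
import random
from math import prod
from statistics import fmean, pstdev

N_START = 4_000_000
RANGES = {
    "P12": (2.5e-4, 3.4e-4),
    "P3": (0.8, 0.9),
    "P4": (0.54, 0.74),
    "P5": (0.6, 0.8),
    "P6": (0.6, 0.8),
}


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    centre, spread = fmean(values), pstdev(values)
    return [(v - centre) / spread for v in values]


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def standardized_regression_coefficients(trials=10_000, seed=0):
    """Regress standardized N-end on standardized inputs; return SRC per input."""
    rng = random.Random(seed)
    names = list(RANGES)
    draws = {name: [rng.uniform(*RANGES[name]) for _ in range(trials)]
             for name in names}
    n_end = [N_START * prod(draws[name][i] for name in names)
             for i in range(trials)]
    z = {name: standardize(draws[name]) for name in names}
    z_out = standardize(n_end)
    # Normal equations on standardized data: corr @ beta = rhs.
    corr = [[fmean(a * b for a, b in zip(z[r], z[c])) for c in names]
            for r in names]
    rhs = [fmean(a * b for a, b in zip(z[r], z_out)) for r in names]
    return dict(zip(names, solve_linear_system(corr, rhs)))


for name, value in standardized_regression_coefficients().items():
    print(f"{name}: {value:.3f}")
```

Because the model is a pure product, each factor's coefficient largely reflects how wide its range is relative to its mean, which is consistent with the narrow P3 range producing the smallest influence.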

6. Limitations and Shortcomings

Despite our careful efforts in constructing the model—including extensive literature review, updating our statistical knowledge, and improving our programming skills—we are fully aware of several limitations and shortcomings:

Computational vs. molecular dynamics modeling: This is a computational model rather than a molecular dynamics model. Most of the data were derived from literature and our interpretation of previous studies, with only a small fraction based on our own experimental data. Inevitably, the model remains somewhat simplistic.

Subjective parameter definitions: Several P values were defined using assumptions informed by literature. While these assumptions are grounded in prior work, they remain subjective. Moreover, differences in experimental conditions, target genes, and measurement methods across the cited studies inevitably introduce deviations.

Gap between modeling and practice: Although the model suggests a near-100% success rate, in practice our laboratory results fall short of consistently maintaining N-end ≥ 100. A key reason is that many influencing factors were simplified or idealized in the model, whereas real-world transgenic processes in C. reinhardtii are affected by a wide range of internal and external variables. Experimental procedures are also rarely flawless, and the cells themselves are fragile, particularly after electroporation and antibiotic selection. Our lab experience—sometimes learned the hard way—has shown how difficult it can be to keep potential transformants alive through to mid-log phase.

Independence assumption of model steps: For simplicity, each step of plasmid transformation was treated as an independent event with its own probability. In reality, however, these steps are often coupled and interdependent, which also explains why reliable step-specific references are difficult to find. Our model does not capture these interactions.

Although far from a fully mature research model, this work represents a serious and diligent attempt to predict outcomes of plasmid transformation. We hope it provides a useful perspective for evaluating designed expression cassettes and offers inspiration—as well as cautionary lessons—for future teams interested in similar modeling approaches.

References

  • Baier, T., Jacobebbinghaus, N., Einhaus, A., Lauersen, K. J., & Kruse, O. (2020). Introns mediate post-transcriptional enhancement of nuclear gene expression in the green microalga Chlamydomonas reinhardtii. PLoS genetics, 16(7), e1008944.
  • Barahimipour, R., Neupert, J., & Bock, R. (2016). Efficient expression of nuclear transgenes in the green alga Chlamydomonas: synthesis of an HIV antigen and development of a new selectable marker. Plant molecular biology, 90(4), 403-418.
  • Cao, M., Fu, Y., Guo, Y., & Pan, J. (2009). Chlamydomonas (chlorophyceae) colony PCR. Protoplasma, 235(1), 107-110.
  • Chlamydomonas Resource Center. (n.d.). pAphVIII (pPH075). Retrieved Month Day, Year, from https://www.chlamycollection.org/product/paphviii-pph075/
  • iGEM Team TU Kaiserslautern. (2019). Contribution. In 2019 iGEM Competition. Retrieved from https://2019.igem.org/Team:TU_Kaiserslautern/Contribution
  • Kong, F., Yamasaki, T., & Ohama, T. (2014). Expression levels of domestic cDNA cassettes integrated in the nuclear genomes of various Chlamydomonas reinhardtii strains. Journal of bioscience and bioengineering, 117(5), 613-616.
  • Mackinder, L. C. (2018). The Chlamydomonas CO₂‐concentrating mechanism and its potential for engineering photosynthesis in plants. New Phytologist, 217(1), 54-61.
  • Schroda, M., Blöcker, D., & Beck, C. F. (2000). The HSP70A promoter as a tool for the improved expression of transgenes in Chlamydomonas. The Plant Journal, 21(2), 121-131.
  • Schroda, M., Beck, C. F., & Vallon, O. (2002). Sequence elements within an HSP70 promoter counteract transcriptional transgene silencing in Chlamydomonas. The Plant Journal, 31(4), 445-455.
  • Schroda, M. (2019). Good news for nuclear transgene expression in Chlamydomonas. Cells, 8(12), 1534.
  • Schroda, M., & Remacle, C. (2022). Molecular advancements establishing Chlamydomonas as a host for biotechnological exploitation. Frontiers in Plant Science, 13, 911483.
  • Shahar, N., Landman, S., Weiner, I., Elman, T., Dafni, E., Feldman, Y., ... & Yacoby, I. (2020). The integration of multiple nuclear-encoded transgenes in the green alga Chlamydomonas reinhardtii results in higher transcription levels. Frontiers in Plant Science, 10, 1784.
  • Sizova, I., Fuhrmann, M., & Hegemann, P. (2001). A Streptomyces rimosus aphVIII gene coding for a new type phosphotransferase provides stable antibiotic resistance to Chlamydomonas reinhardtii. Gene, 277(1-2), 221-229.
  • Sizova, I., Kelterborn, S., Verbenko, V., Kateriya, S., & Hegemann, P. (2021). Chlamydomonas POLQ is necessary for CRISPR/Cas9-mediated gene targeting. G3, 11(7), jkab114.
  • Wan, M., Tan, R., Wang, Z., Wang, W., Yu, T., & Li, Y. (2025). Electroporation optimization for cell wall-deficient and cell-walled Chlamydomonas reinhardtii using response surface methodology. Algal Research, 104167.
  • Wang, L., Yang, L., Wen, X., Chen, Z., Liang, Q., Li, J., & Wang, W. (2019). Rapid and high efficiency transformation of Chlamydomonas reinhardtii by square-wave electroporation. Bioscience reports, 39(1), BSR20181210.
  • Yamano, T., Iguchi, H., & Fukuzawa, H. (2013). Rapid transformation of Chlamydomonas reinhardtii without cell-wall removal. Journal of bioscience and bioengineering, 115(6), 691-694.