
Measurement

On this page, we document how we quantify our experimental results.


Overview ​

Our project, DR.sTraTeGY, aims to dynamically record natural mutations in living cells using engineered biological systems. The core of this strategy is a mutation recorder based on the EMSfp parts, which converts random mutations into quantifiable changes in fluorescence intensity. To validate the recorder's function, we used a three-stage approach: fluorescence microscopy for direct visual confirmation of intensity variation at the single-cell level, flow cytometry for statistical analysis of the population-wide fluorescence distribution, and deep sequencing to confirm that mutations localize to the promoter region rather than the EMSfp region.

Microscopy - Qualitative Observation of Signal Variation ​

To visually record and quantify gene mutations under evolutionary pressure in yeast, we designed the TU Recorders. This tool tracks natural mutations in living yeast cells by converting random genetic mutations into quantifiable changes in fluorescence intensity. It does so by combining a mutation-sensitive promoter with an EMS (ethyl methanesulfonate)-resistant fluorescent protein (EMSfp). To qualitatively assess our 28 TU Recorder combinations and narrow them down to the most promising candidates, we used fluorescence microscopy to observe changes in fluorescence intensity and distribution within individual yeast cells before and after EMS treatment.

Our screening focused on two key aspects:

  • Primary Channel Reactivity

    We looked for a noticeable change in fluorescence intensity and brightness distribution within the primary fluorescent channel (the channel corresponding to the specific EMSfp used in the recorder) after EMS treatment. A good recorder would show a more heterogeneous intensity distribution, potentially with a subset of cells becoming noticeably brighter or dimmer, indicating a mutation-driven change in promoter activity.

  • EMS Resistance

    Simultaneously, we carefully monitored the fluorescence intensity in the other three fluorescent channels. Ideally, these non-primary channels should exhibit minimal to no change in intensity or brightness distribution. This observation would confirm that the EMSfp sequence itself is resistant to mutagenesis, and that the observed changes are specific to the promoter's response to EMS, rather than a general degradation or alteration of the fluorescent protein.



Figure 1. Different promoter-fluorescent protein pairs exhibited distinct fluorescence intensities across channels after EMS treatment.
(A) pOST1-EMSfp499. (B) pRNR2-EMSfp399. (C) pRNR2-EMSfp499. (D) pRNR2-EMSfp569. (E) pRNR2-EMSfp643. (F) pSTM1-EMSfp569. (G) pTDH3-EMSfp569. (H) pSTM1-EMSfp499. Before EMS treatment, pSTM1-EMSfp499 showed higher fluorescence intensity in the green channel than in the other channels. After EMS treatment, the green fluorescence intensity and brightness distribution became more heterogeneous, with a subset of cells appearing noticeably brighter.


Based on these initial fluorescence microscopy observations, we identified several promising combinations that qualitatively met both criteria, showing clear changes in their primary fluorescence channel while remaining stable in the other channels. One of them is BBa_255T0PHY, pSTM1-driven EMSfp499. This qualitative selection gave us the confidence to proceed to quantitative validation using flow cytometry and deep sequencing.

Flow Cytometry - Data Processing and Composite Score Calculation

The flow cytometry data processing pipeline was designed to ensure signal fidelity, correct for autofluorescence, and provide statistically robust metrics for quantifying the effect of EMS induction on fluorescent protein expression. This process is divided into three critical stages: (1) Quality Control and Data Normalization, (2) Fold Change Calculation and Significance Test, and (3) Composite Score Calculation.

Quality Control and Data Normalization ​

Following initial gating to isolate single-cell populations (for experimental details, please refer to our Experiments page), a rigorous, batch-specific quality control (QC) filter was applied to distinguish true positive fluorescence from background noise and to normalize data.

The non-fluorescent control strain, BY4741, was used to establish the noise threshold and to generate corrected fluorescence intensities. Only single-cell events with a fluorescence intensity above the BY4741 median in the designated channel were retained for downstream analysis; all other events were considered non-expressing or indistinguishable from background and were discarded. The effectiveness of this filtration was monitored by calculating the retained event ratio (retained signal count / total event count), which served as the key sample-specific quality control metric (see the supplemental table in our GitLab folder).

Note that, to mitigate batch effects, a BY4741 control was treated in parallel with every batch of experimental samples.
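The background filter can be sketched in a few lines of Python. This is only an illustration of the rule stated above (keep events above the BY4741 median, report retained / total), not our actual pipeline. The function name `qc_filter`, the plain-list inputs and the toy intensities are assumptions made for the example, and the intensity correction step is not shown.

```python
from statistics import median

def qc_filter(sample_events, control_events):
    """Keep events brighter than the BY4741 (control) median in one channel.

    Both arguments are hypothetical lists of per-event fluorescence
    intensities from the same batch. Returns (retained_events, retained_ratio).
    """
    if not sample_events:
        raise ValueError("sample_events is empty")
    threshold = median(control_events)           # batch-specific noise threshold
    retained = [x for x in sample_events if x > threshold]
    ratio = len(retained) / len(sample_events)   # retained signal count / total event count
    return retained, ratio

# Toy data, not real measurements.
control = [90, 100, 110, 120, 130]
sample = [80, 115, 150, 400, 95, 900]
retained, ratio = qc_filter(sample, control)
print(retained, round(ratio, 3))
```

Because the threshold is recomputed from the control of each batch, the same sample intensity can be retained in one batch and discarded in another, which is the intended batch-specific behaviour.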

Fold Change Calculation and Significance Test ​

Because raw cellular fluorescence data are highly skewed, we applied a logarithmic transformation, a widely adopted approach that stabilizes the variance and converts the skewed distribution into an approximately normal one for statistical testing.[1]

While the t-test on log-transformed data establishes significance, the magnitude of the fluorescence change was quantified using the median intensity, rather than the mean intensity, of the corrected data. The median was chosen because it is a non-parametric measure of central tendency that is less sensitive than the mean to extreme outliers and subtle shifts in population shape.[1] This mixed approach, using log-transformed data for statistical confidence (P-value) and the raw median for quantification (fold change, FC), maximizes both the statistical validity and the biological utility of the final metrics.
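The two quantities can be sketched as follows. Several details are assumptions, because this page does not specify them: the log base (log10 here; the t statistic does not depend on it), the use of Welch's unequal-variance form of the t-test, and a normal approximation for the two-sided P-value, which is only reasonable for the large event counts typical of flow cytometry. Function names and toy values are hypothetical.

```python
import math
from statistics import mean, median, variance

def log2_fold_change(treated, untreated):
    """log2 of the ratio of raw medians (treated / untreated)."""
    return math.log2(median(treated) / median(untreated))

def welch_t_on_log(treated, untreated):
    """Welch's t statistic on log10-transformed intensities.

    Intensities must be positive. The P-value is a two-sided
    normal-tail approximation (an assumption, see the text above).
    """
    a = [math.log10(x) for x in treated]
    b = [math.log10(x) for x in untreated]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Toy data, not real measurements.
untreated = [100, 200, 400]
treated = [200, 400, 800]
fc = log2_fold_change(treated, untreated)
t_stat, p_value = welch_t_on_log(treated, untreated)
print(fc, t_stat, p_value)
```

Keeping the two functions separate mirrors the mixed approach: the test runs on transformed values, while the fold change is taken from the untransformed medians.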

Figure 2. Different promoter-fluorescent protein pairs exhibited different patterns of fluorescence intensity change after EMS treatment.

Composite Score Calculation ​

  • Specificity Loss (Sloss)

Specificity Loss is a mean-squared function of the signal changes in the non-primary channels. It severely penalizes any substantial, non-specific signal change, regardless of whether that change is an increase or a decrease, thereby isolating stable reporting systems.

  • Effectiveness (E)

E = |log2(Fold Change)| × (−lg(P-value))

The |log2(Fold Change)| term measures the magnitude of the expression change in the primary channel, that is, the macro-level effect of the promoter mutation. The statistical significance term, −lg(P-value), ensures that only changes that are highly unlikely to be due to random noise are rewarded. This filters out unreliable or unstable expression changes.

  • Composite Score (S)

The Composite Score (S) synthesizes these two orthogonal performance dimensions (E and Sloss) into a single weighted objective function:

S = WE × E − WS × Sloss

We set a high weight on Effectiveness (WE = 10.0) and a lower weight on Specificity Loss (WS = 1.0), so that the model explicitly prioritizes successful mutational outcomes (Effectiveness) while still enforcing a penalty for any system instability (Specificity Loss).
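The scoring in this section can be sketched in Python. The forms E = |log2 FC| × (−lg P) and S = WE × E − WS × Sloss reproduce the values in Tables 1 to 3 to within rounding. The exact form of Sloss is an assumption (mean of the squared log2 fold changes in the non-primary channels), as is the P-value floor, which is needed only to avoid taking the logarithm of zero. All function names are hypothetical.

```python
import math

W_E = 10.0  # weight on Effectiveness (from the text)
W_S = 1.0   # weight on Specificity Loss (from the text)

def effectiveness(log2_fc, p_value, p_floor=1e-300):
    """E = |log2 FC| * (-log10 P). The floor on P is an assumption."""
    return abs(log2_fc) * -math.log10(max(p_value, p_floor))

def specificity_loss(off_target_log2_fcs):
    """Assumed form: mean of squared log2 fold changes in non-primary channels."""
    return sum(x * x for x in off_target_log2_fcs) / len(off_target_log2_fcs)

def composite_score(e, s_loss, w_e=W_E, w_s=W_S):
    """S = W_E * E - W_S * S_loss."""
    return w_e * e - w_s * s_loss

# Check against the pRNR2-EMSfp383 row of Table 3 (E = 228.9991, S_loss = 0.2177).
score = composite_score(228.9991, 0.2177)
print(round(score, 3))  # the table reports 2289.7735
```

With WE ten times larger than WS, a recorder must show a very large off-target shift before the penalty outweighs a strong primary-channel response, as in the pTDH3-EMSfp642 row of Table 3.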

Following a comprehensive performance analysis of all promoter and fluorescent protein combinations (raw data available at DOI: 10.5281/zenodo.17293146), we selected the three best-performing pairs: BBa_25FQWVZE (pRNR2-driven EMSfp383), BBa_255T0PHY (pSTM1-driven EMSfp499), and BBa_25PHHOV9 (pTDH3-driven EMSfp383).

Table 1. Performance of Individual Promoter

| Promoter | Avg Composite Score (S) | Avg Effectiveness (E) | Avg Specificity Loss (S_loss) | Avg log2(FC) |
| --- | --- | --- | --- | --- |
| pSTM1 | 1036.3509 | 103.7323 | 0.9717 | 0.4013 |
| pOST1 | 925.2462 | 92.5575 | 0.3292 | 0.5683 |
| pRNR2 | 833.1798 | 83.3489 | 0.3097 | 0.4926 |
| pTDH3 | 643.8616 | 64.5087 | 1.2255 | 0.0766 |

Table 2. Performance of Individual Fluorescent Protein

| Fluorescent Protein | Avg Composite Score (S) | Avg Effectiveness (E) | Avg Specificity Loss (S_loss) | Avg log2(FC) |
| --- | --- | --- | --- | --- |
| EMSfp383 | 2052.6009 | 205.3097 | 0.4958 | 0.6844 |
| EMSfp399 | 1294.4916 | 129.5028 | 0.5363 | 0.4317 |
| EMSfp642 | 867.7118 | 86.9942 | 2.2302 | 0.2677 |
| EMSfp499 | 708.2665 | 70.8591 | 0.3248 | 0.327 |
| EMSfp643 | 632.2894 | 63.2449 | 0.1596 | 0.5662 |
| EMSfp569 | 316.8661 | 31.776 | 0.8941 | 0.1562 |
| EMSfp506 | 19.8933 | 2.0086 | 0.1927 | -0.1271 |

Table 3. Performance of Combination of Different Promoter and Fluorescent Protein

| Promoter | Fluorescent Protein | Composite Score (S) | Effectiveness (E) | Specificity Loss (S_loss) | log2(FC) |
| --- | --- | --- | --- | --- | --- |
| pRNR2 | EMSfp383 | 2289.7735 | 228.9991 | 0.2177 | 0.7633 |
| pSTM1 | EMSfp499 | 1847.6624 | 184.804 | 0.378 | 0.616 |
| pTDH3 | EMSfp383 | 1815.4284 | 181.6202 | 0.7739 | 0.6054 |
| pOST1 | EMSfp399 | 1712.5224 | 171.2602 | 0.0795 | 0.5709 |
| pOST1 | EMSfp642 | 1644.489 | 164.5644 | 1.1547 | 0.6371 |
| pSTM1 | EMSfp643 | 1155.6685 | 115.574 | 0.0718 | 0.4703 |
| pRNR2 | EMSfp399 | 1144.0798 | 114.4309 | 0.2291 | 0.3814 |
| pRNR2 | EMSfp569 | 1122.074 | 112.2424 | 0.3501 | 0.3741 |
| pTDH3 | EMSfp399 | 1026.8727 | 102.8173 | 1.3002 | 0.3427 |
| pTDH3 | EMSfp642 | 800.9384 | 80.6409 | 5.471 | -0.2755 |
| pOST1 | EMSfp643 | 788.7509 | 78.8851 | 0.1005 | 1.0745 |
| pTDH3 | EMSfp499 | 534.4092 | 53.4454 | 0.0443 | 0.1782 |
| pOST1 | EMSfp499 | 440.2585 | 44.0377 | 0.1186 | 0.4455 |
| pTDH3 | EMSfp643 | 310.0307 | 31.0259 | 0.2284 | -0.2071 |
| pRNR2 | EMSfp643 | 274.7075 | 27.4945 | 0.2378 | 0.9272 |
| pRNR2 | EMSfp642 | 157.7081 | 15.7773 | 0.0649 | 0.4414 |
| pSTM1 | EMSfp569 | 105.722 | 10.8187 | 2.4654 | 0.1177 |
| pOST1 | EMSfp569 | 40.2101 | 4.0403 | 0.1926 | 0.1135 |
| pTDH3 | EMSfp506 | 19.8933 | 2.0086 | 0.1927 | -0.1271 |
| pRNR2 | EMSfp499 | 10.7358 | 1.1494 | 0.7583 | 0.0683 |
| pTDH3 | EMSfp569 | -0.5416 | 0.0027 | 0.5683 | 0.0194 |

Growth Curve - Quantitative Assessment of Metabolic Burden ​

To evaluate the metabolic burden imposed by the top three fluorescent reporters, we quantified and compared their growth rates by recording hourly growth curves via optical density (OD) measurements. Although yeast cells are about 5-10 μm in size, the only calibration particles available to us were NanoCym950 nanoparticles with a diameter of 950 nm. We estimated that 1 OD600 corresponds to 10^8 nanoparticles per mL and used this value to convert OD readings into yeast counts. For experimental details, please refer to our protocol.

To quantify key kinetic parameters, including the maximum population density and the specific growth rate, and to allow a quantitative comparison of strain performance, we fitted the raw growth data to the Self-Starting Logistic Model (SSlogis) using the nls function in R.

Logistic Model: y = Asym / (1 + exp((xmid − Time) / scal))

  • Asym: asymptote, representing the upper horizontal limit that the curve approaches as the independent variable (Time) increases towards infinity. For growth curves, it is the maximum cell density that the culture reaches.
  • xmid: inflection point time, representing the value of the independent variable (Time) at which the curve reaches its midpoint. At this point, the value of y is Asym/2. For growth curves, it is the time point when the growth rate is maximal.
  • scal: scale parameter, defining the spread or slope of the curve. For growth curves, it is inversely related to the growth rate (r). A smaller scal value means a steeper slope and a faster growth rate.
Figure 3. Self-Starting Logistic Model Fitted Parameters

Table 4. Self-Starting Logistic Model Fitted Parameters

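| Group | Asym (x 10^8 / mL) | xmid (Time of Inflection) | scal (inversely related to Growth Rate) | R-squared |
| --- | --- | --- | --- | --- |
| BY4741 | 3.67 | 5.76 | 1.3649 | 0.9953 |
| pSTM1-EMSfp499 | 3.63 | 6.25 | 1.4536 | 0.9947 |
| pTDH3-EMSfp383 | 4.07 | 8.82 | 2.0227 | 0.9924 |
| pRNR2-EMSfp383 | 5.74 | 16.23 | 3.0789 | 0.9874 |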
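To make the fitted parameters easier to compare, the sketch below evaluates the logistic curve in the SSlogis parameterization, Asym / (1 + exp((xmid − Time) / scal)), with the values from Table 4. It does not repeat the nls fit. The helper names are hypothetical, and the derived quantities (r = 1 / scal and the slope at the inflection point, Asym / (4 × scal)) follow from the model form rather than from anything reported on this page.

```python
import math

def sslogis(t, asym, xmid, scal):
    """Logistic curve in the SSlogis parameterization."""
    return asym / (1.0 + math.exp((xmid - t) / scal))

def max_growth_rate(asym, scal):
    """Slope of the curve at the inflection point t = xmid."""
    return asym / (4.0 * scal)

# Fitted parameters copied from Table 4: (Asym in 10^8 / mL, xmid, scal).
fits = {
    "BY4741": (3.67, 5.76, 1.3649),
    "pSTM1-EMSfp499": (3.63, 6.25, 1.4536),
    "pTDH3-EMSfp383": (4.07, 8.82, 2.0227),
    "pRNR2-EMSfp383": (5.74, 16.23, 3.0789),
}

for name, (asym, xmid, scal) in fits.items():
    half = sslogis(xmid, asym, xmid, scal)  # equals Asym / 2 by construction
    rate = 1.0 / scal                       # intrinsic rate r = 1 / scal
    slope = max_growth_rate(asym, scal)
    print(f"{name}: y(xmid)={half:.3f}, r={rate:.3f}, max slope={slope:.3f}")
```

Ranking the strains by r = 1 / scal gives the same order as ranking them by scal in reverse, so BY4741 and pSTM1-EMSfp499 come out fastest and pRNR2-EMSfp383 slowest.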

According to this analysis, pSTM1-EMSfp499 showed the growth pattern most similar to that of the wild-type BY4741 strain, followed by pTDH3-EMSfp383 (Figure 3 & Table 4). Although the pRNR2-EMSfp383 combination achieved the highest Composite Score (S) in flow cytometry, it imposed a significant metabolic burden on the yeast, making it unsuitable as a fluorescent reporter. By combining the fluorescence change patterns with the metabolic burden profiles, we concluded that BBa_255T0PHY, pSTM1-driven EMSfp499, is the optimal reporter combination for our Recorder module.

Deep Sequencing - Molecular Validation of the Mechanism ​

To further validate that the fluorescent protein optimized by the EMS Sequence Optimizer is highly resistant to EMS mutagenesis, we performed deep sequencing (third-generation Nanopore sequencing) on selected gene sequences.

Using the pre-EMS-induction sequence as the reference, we employed the NanoPlot tool to align the Nanopore reads to the reference/target sequence. We then generated a pileup output to calculate the base counts and percentages at each position. Supplemental data are available in our GitLab folder.

A site was designated as a genuine mutation, rather than a sequencing error, if its matching rate to the reference base fell below 95%. This threshold was based on the reported ∼5% error rate of Nanopore sequencing itself. The potential contribution of errors introduced by the high-fidelity Phanta PCR amplification was deemed negligible, as its error rate (∼10^-5 divided by 128 for Phanta Max fidelity, i.e. roughly 8 × 10^-8) is several orders of magnitude lower than the Nanopore error rate.
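The site-calling rule can be sketched as follows. The input layout (one dictionary of base counts per reference position) and the function names are assumptions made for illustration and do not reflect the actual pileup file format. Positions with no coverage are treated as matching, which is also an assumption.

```python
def matching_rate(base_counts, ref_base):
    """Fraction of reads at one position that agree with the reference base."""
    total = sum(base_counts.values())
    if total == 0:
        return 1.0  # no coverage: do not call a mutation (assumption)
    return base_counts.get(ref_base, 0) / total

def call_mutated_sites(pileup, reference, threshold=0.95):
    """Return 0-based positions whose matching rate falls below the threshold.

    `pileup` is a hypothetical list of {base: count} dicts, one per position.
    """
    return [
        pos
        for pos, counts in enumerate(pileup)
        if matching_rate(counts, reference[pos]) < threshold
    ]

# Toy example, not real sequencing data.
reference = "GCAT"
pileup = [
    {"G": 90, "A": 10},  # 90% match: called
    {"C": 97, "T": 3},   # 97% match: within sequencing error
    {"A": 100},          # 100% match
    {"T": 95, "C": 5},   # exactly 95%: not below the threshold
]
called = call_mutated_sites(pileup, reference)
print(called)
```

The comparison is strict, so a site sitting exactly at 95% is attributed to sequencing error rather than to EMS.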

Figure 4. EMS induced mutation rate in different regions

By separately quantifying the putative EMS-induced mutations (G/C ↔ A/T) within the promoter, coding sequence (CDS, i.e. EMSfp), and terminator regions, we calculated the mutation rate of each region. The EMS mutation rate in the promoter region was significantly higher than that in the CDS/EMSfp region (one-way ANOVA followed by Tukey's multiple comparisons test, p < 0.001). This result supports the conclusion that the EMSfp sequence confers resistance to EMS-induced mutagenesis.
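A per-region rate of this kind can be sketched as below. The definition used here (EMS-type mutated sites divided by region length), the reading of "G/C ↔ A/T" as transitions in either direction, the region coordinates and the function name are all assumptions for illustration. The statistical comparison between regions is not shown.

```python
def region_mutation_rates(mutations, regions, allowed):
    """Mutated sites per base for each region.

    mutations: list of (position, ref_base, alt_base), 0-based positions.
    regions:   {name: (start, end)} with half-open intervals.
    allowed:   set of (ref_base, alt_base) pairs counted as EMS-type.
    """
    rates = {}
    for name, (start, end) in regions.items():
        hits = sum(
            1
            for pos, ref, alt in mutations
            if start <= pos < end and (ref, alt) in allowed
        )
        rates[name] = hits / (end - start)
    return rates

# One reading of "G/C <-> A/T": transitions in either direction (assumption).
EMS_TYPE = {("G", "A"), ("C", "T"), ("A", "G"), ("T", "C")}

# Toy coordinates and calls, not the real construct.
regions = {"promoter": (0, 100), "CDS": (100, 300), "terminator": (300, 350)}
mutations = [(5, "G", "A"), (40, "C", "T"), (77, "G", "T"), (150, "C", "T")]
rates = region_mutation_rates(mutations, regions, EMS_TYPE)
print(rates)
```

Dividing by region length matters because the promoter, CDS and terminator differ in size, so raw mutation counts alone would not be comparable.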

Conclusion ​

The integrated results demonstrate that EMS-induced mutations specifically accumulate in the promoter region rather than the coding sequence, directly linking observed fluorescence changes to targeted genetic alterations. Through this systematic validation spanning cellular, population, and molecular levels, we have established BBa_255T0PHY pSTM1 driven EMSfp499 in our TU Recorders collection as a reliable standardized biological part that effectively records mutation events, thereby enabling dynamic tracking by our DR.sTraTeGY.

Reference ​


  1. Hodgins-Davis, A., Duveau, F., Walker, E. A., & Wittkopp, P. J. (2019). Empirical measures of mutational effects define neutral models of regulatory evolution in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America, 116(42), 21085–21093. DOI: 10.1073/pnas.1902823116