OVERVIEW
We established two models: one developed with the Wet Lab team to simulate experimental processes, and another with the Human Practices team to quantitatively assess ethical risks in our project. For experimental simulation, we proposed the DISCERN model (Dynamic Integration of Simulation for Cancer Evaluation and Recognition), which integrates RADAR system design and physical simulation. Using machine learning, we identified key gene combinations and designed ADAR-based programmable RNA sensors, further optimized by structural prediction algorithms. Molecular dynamics simulations were then applied to capture cell behavior in realistic micro-environments, laying the groundwork for future in vivo applications. For ethical evaluation, we proposed the BEAM model (Bio-Ethical Assessment Model). It transforms qualitative ethical concerns into a quantitative, data-driven framework for synthetic biology. It establishes evaluation dimensions, applies Structural Equation Modeling (SEM) to assign weights, and converts them into Bayesian probabilistic variables for predictive risk assessment. This integration bridges theoretical reasoning with computational analysis, offering a systematic and innovative method to quantify ethical risks. Paired with its companion software BEAMer, the model provides a transparent, rigorous, and practical tool for ethical evaluation.
DISCERN Dynamic Integration of Simulation for Cancer Evaluation and Recognition Model
BEAM Bio-Ethical Assessment Model
Abstract of DISCERN This project aims to build a live cell monitoring system based on a modular, programmable system for live RNA sensing using adenosine deaminases acting on RNA (RADAR) for early identification of breast cancer. We used adipocytes as host cells and engineered them through synthetic biology methods to make them able to respond to molecular signals in the breast cancer microenvironment.The RADAR system is a sensor system that initiates downstream reporter gene expression by designing RNA sequences to form specific structures,which recruits ADAR proteins to trigger RNA editing events. To build an ideal RADAR system for surveillance of breast cancer cells, our model addresses the following tasks:
Identifying target genes that distinguish cancer-associated adipocytes(CAA) from normal adipocytes.
Designing sensor-RNA sequence of the target RNA.
Simulating reactions of the RADAR system.
Simulating the filtering function of the glomerular filtration barrier in kidney.
To address task 1, we first selected four candidate genes associated with the development of breast cancer through a literature review: PLOD2, FAM3C, LIF, and IL6. These genes have been reported to be closely related to tumor microenvironment regulation, immune reaction, and cell migration in several studies, and have the potential to be markers of CAA. Subsequently, we constructed a classification model based on MLP architecture and improved by the Attention mechanism, modeled to evaluate the different combinations of these four genes, and selected the gene combination with the best prediction ability based on multiple classification performance indicators (such as AUC and F1-score), providing a theoretical basis for the design of downstream sensors. To address task 2, we designed RNA sensors targeting the mRNA transcripted from these above genes based on the RADAR system. By constructing RNA elements with specific functional structures, combined with secondary structure prediction, GC content analysis, and base exposure evaluation, we optimized the structural stability and editing efficiency of the sensor, improving its ability to target mRNA and ADAR-mediated editing response. Our designed RNA sequence has good structural characteristics and target specificity, making it suitable for real-time sensing of breast cancer signals in breast adipocytes. To address task 3, we performed dynamic modeling to simulate reporter gene expression reactions in RADAR system, gaining the predictive time and concentration curves of the reactions. In addition, we used 3D curves to simulate the reactions at different concentrations of PLOD2 and LIF expression products and estimated the appropriate concentrations for the RADAR system required to detect different concentrations of target mRNA. To address task 4, we performed multi-scale modeling to simulate the selective filtration process of protein through glomerular filtration barrier (GFB) to obtain the predicted proportion and time curve of protein transmembrane passage. In addition, we used this model to analyze the dynamic behavior of Gaussia luciferase (Gluc, 19.9 kDa) in the four-layer filtration system, and estimated its sieving coefficient (SC) and the time window to reach the urine detection threshold (Detection Latency) under physiological conditions. Monitor Gene Combination Selection This project aims to construct a engineered adipocyte system that can actively monitor the occurrence of breast cancer. To achieve this goal, we need to identify a set of characteristic gene combinations that can specifically identify cancer-associated adipocytes(CAA). The combination should not only have high classification and discrimination ability, but also have good explainability, for use in constructing perception modules. By integrating big data analysis and deep learning algorithm (Attention mechanism), we systematically explored the differences of gene expression between cancer-associated adipocytes and normal adipocytes and constructed a high-performance evaluation framework to provide bioinformatics support for the project. Methodology Overview Our modeling philosophy is rooted in a core principle: a robust biological model must not only achieve outstanding predictive accuracy but also possess profound biological interpretability. This is because our ultimate goal is to construct a perception module that is understandable, reliable, and, ultimately, safe for in-vivo applications. Models that rely solely on complex "black-box" algorithms, while potentially yielding high performance on specific datasets, present a challenge due to their opaque decision-making processes, making them difficult to translate into reliable biological designs. Therefore, the models and methodologies we have chosen seek an optimal balance between performance and interpretability. This guiding philosophy underpins the entire strategy we outline below. In order to accurately identify breast cancer-induced up-regulated genes in adipocytes, we first selected genes that have clear evidence of being directly regulated by factors secreted from breast cancer cells through systematic literature research, and then evaluated their discriminatory performances as diagnostic markers for CAA through the prediction model Attention-MLP based on the database, and filter out subsets with the most predictive value. This strategy generally consists of the following stages:
1. Identifying the marker genes up-regulated by factors secreted from breast cancer cells in adipocytes through literature search.
Attention-MLP modeling.
Selecting the optimal combination through the Attention-MLP model.
The data used in this research mainly comes from three parts:
TCGA (The Cancer Genome Atlas Program) It collects gene expression sequencing data from the marginal tissue of the primary tumor tissue from multiple breast cancer patients.
GTEx (Genotype-Tissue Expression Project) It provides gene expression sequencing data from subcutaneous adipose tissue of healthy individuals, which can be used as normal controls.
UCI (UC Irvine Machine Learning Repository) It provides multiple datasets for the evaluation of model performance.
Selecting Genes to Be Used Before embarking on the gene selection process, we first grappled with a fundamental question: How could we ensure that our chosen biomarkers were not only statistically significant but also genuinely reflected the specific pathology of breast cancer, rather than a general inflammatory response? This critical consideration stemmed from our insightful discussion with Sir Walter Bodmer, a Fellow of the Royal Society. He cautioned us that many "highly expressed" genes observed in cancer-associated adipocytes (CAA) might merely indicate a common inflammatory state rather than being specific to breast cancer. This profound insight prompted a pivotal shift in our screening strategy. We decided to move away from relying solely on large-scale screening of a single database. Instead, we established a "validation-first, screening-second" principle. This meant we would first conduct a rigorous and extensive literature review to verify the specific association of any candidate genes with breast cancer. Only after confirming this link through solid evidence would we proceed to the database screening and validation phase. This foundational principle ensured the biological basis of our model was both robust and rigorously vetted. To identify candidate genes, we first conducted literature mining in databases such as PubMed and GEO. We conducted a literature aggregation analysis and selected studies published in high impact factor journals (IF>5) in the past ten years to extract adipocyte marker genes that are closely related to the occurrence and development of breast cancer[1-6], as is shown in Table 1.
Target genes in selected literatures
Tumor Secretes Cytokines Action Pathway Target Gene References
PAI-1 PI3K-AKT-FOXP1 PLOD2 Tumor-secreted PAI-1 promotes breast cancer metastasis via the induction of adipocyte-derived collagen remodeling.
CXCL3 STAT3/NF-kB LIF Cancer-associated adipocytes promote the invasion and metastasis in breast cancer through LIF/CXCLs positive feedback loop.
TGF-\(\beta\) Smad2 FAM3C FAM3C in Cancer-Associated Adipocytes Promotes Breast Cancer Cell Survival and Metastasis.
IL1-\(\beta\) MAPK/NF-kB IL6 Hofwimmer K, de Paula Souza J, Subramanian N, Vujičić M, Rachid L, Méreau H, Zhao C, Dror E, Barreby E, Björkström NK, Wernstedt Asterholm I, Böni-Schnetzler M, Meier DT, Donath MY, Laurencikiene J. IL-1\(\beta\) promotes adipogenesis by directly targeting adipocyte precursors. Nat Commun. 2024 Aug 21; 15: 7957.

Flower L, Gray R, Pinkney J, Mohamed-Ali V. Stimulation of interleukin-6 release by interleukin-1\(\beta\) from isolated human adipocytes. Cytokine. 2003 Jan; 21(1): 32–37.

Kim HS, Jung M, Choi SK, Woo J, Piao YJ, Hwang EH, Kim H, Kim SJ, Moon WK. IL-6-mediated cross-talk between human preadipocytes and ductal carcinoma in situ in breast cancer progression. J Exp Clin Cancer Res. 2018 Sep 17; 37: 200.
Attention-MLP Multi-Layer Perceptron (MLP) is a classic type of artificial neural network structure that consists of multiple fully connected layers stacked in sequence. In biomedical data analysis, MLP have the advantages of simple structure and strong generalization ability. However,in the procession of high-dimensional gene expression data, it is difficult for MLP to show which genes are more important in decision-making[7]. Originally proposed in the field of natural language processing, the Attention Mechanism aims to allow models to automatically focus on the most relevant parts when processing information[8]. In the analysis of genomics, the introduction of the Attention Mechanism means that models can automatically assign weights to different genes, highlighting genes that are more important for prediction results and reducing interference with redundant features[9]. Through literature review, we found that multi-layer perceptron(MLP) has been widely used in classification and prediction tasks[10-13] and has shown high maturity and stability in the field of medical research [14-17]. However, MLP still has certain limitations in processing medical data, such as the lack of explicit modeling capabilities for the importance of different features, making it difficult to fully capture the differentiated contributions between features[7]. To overcome this shortcoming, we introduced an Attention mechanism into the model, which can dynamically assign weights according to the role of features in prediction, so as to improve the model's ability to capture key information and prediction performance while maintaining the advantages of MLP nonlinear modeling. Based on this idea, we constructed an MLP model with Attention Mechanism (Attention-MLP) to search for gene combination with the optimal monitoring performance in the transcriptome data of paracancerous tissue and normal samples. We first train and validate the model on gene subsets at different scales to evaluate the performance of different combinations in classification tasks, and add attention scores to the calculation process of the model to finally obtain a set of outstanding candidate gene combinations. Model Structure Our model is composed of these following parts:
Input Layer Inputting the polygene expression vector from the candidate combination.
Attention Module Introducing attention scores to dynamically weight the characteristics of different genes in the current combination.
MLP Carrying out nonlinear modeling and classification learning on the weighted expression vectors.
Output Layer Using sigmoid activation function to output the probability value.
The model structure is shown in Figure 1:
Attention-MLP Model Structure
The input data is a gene expression matrix, with each row being a sample and each column being the expression level of one gene, which is standardized before being fed into the model. Each sample corresponds to a label 0/1, 0 stands for normal sample, and 1 stands for paracancerous sample. The original gene expression data matrix \(x\) may contain thousands of gene dimensions, which is computationally intensive and easy to overfit if used directly, so we used an encoder to map it to a more compact hidden feature space: \[h=\mathrm{ReLU}\left(W_ex+b_e\right).\] Among them, model parameters \(W_e\) and \(b_e\) are weight matrix and bias, respectively, and \(\mathrm{ReLU}\) is the activation function, which sets the values less than 0 in the input to 0 and the values greater than 0 remain unchanged, increasing the nonlinearity of the result. The final \(h\) is the feature matrix of the hidden layer. Next, we first performed a linear transformation to the hidden layer feature matrix \(h\), and mapped the input to the \(\left[−1,1\right]\) range through the \(\tanh\) activation function, which is central symmetrical to 0 to ensure the numerical stability and gradient smoothing. The processed feature matrix \(u\) is obtained. The definition of Tanh activation function is as follows: \[\tanh\left(x\right)=\dfrac{e^x-e^{-x}}{e^x+e^{-x}}.\] We then mapped the processed feature matrix \(u\) back to the same dimension as the original input to obtain the unnormalized score \(z\) for each gene. After we got the score vector \(z\), we used the \(\mathrm{Softmax}\) function to convert it into an attention score \(\alpha\). Softmax turns all scores into values between 0 and 1 and guarantees that their sum is 1: \[\alpha_i=\dfrac{e^{z_i}}{\displaystyle\sum_{k=1}^de^{z_k}}.\] With an attention score of \(\alpha\), we can weight the original input \(x\), highlighting important genes and weakening unimportant genes: \[x_{\text{attended}}=\alpha\odot x,\] \(\odot\) stands for the Hadamard product, which means multiplying corresponding elements. The \(x_{\text{attended}}\) obtained is the attention-weighted gene expression vector. Then we fed this weighted vector into the classifier and output a score \(y\). To get the final prediction probability, we use the Sigmoid function to compress the score between 0 and 1: \[\hat{y}=\dfrac{1}{1+e^{-y}},\] \(\hat{y}\) is the final output of the model, which can be directly interpreted as the "probability of canceration", with values ranging from 0 to 1. In order to translate this continuous probability into the classification results "cancerous" or "healthy", we needed to set a classification threshold. When the probability is higher than this threshold, the sample is judged to be cancerous; otherwise, it is judged to be healthy. The commonly used index for determining thresholds is the Youden index[18], which is defined as: \[J=\arg\displaystyle\max_{t}\left\{\mathrm{Sensitivity}\left(t\right)+\mathrm{Specificity}\left(t\right)-1\right\}.\] Among them, \(t\) stands for "classification threshold", which is a probability boundary between 0 and 1. Sensitivity stands for the probability of a true positive, and Specificity stands for the probability of a true negative. The critical threshold \(t^*\) when the maximum value is reached is called the "optimal threshold". If \(\hat{y}\) is less than or equal to (greater than) \(t^*\), it is considered healthy (cancerous). \(J\) and \(t^*\) are also frequently found in the applied biomedical literature[19-22]. The quality of the model is measured by the loss function. Here we used binary cross-entropy (BCE) loss: \[L=-\dfrac{1}{N}\displaystyle\sum_{n=1}^N\left[y^{\left(n\right)}\log\hat{y}^{\left(n\right)}+\left(1-y^{\left(n\right)}\right)\log\left(1-\hat{y}^{\left(n\right)}\right)\right].\] Among them, \(y^{\left(n\right)}\) is the true label and \(\hat{y}^{\left(n\right)}\) is the prediction probability, and the closer the prediction is to the true value, the smaller the loss. During the training process, we used backpropagation to adjust all parameters \(\theta\) (\(W\), \(b\), etc.): \[\theta\leftarrow\theta-\eta\cdot\nabla_\theta L.\] Among them, \(\eta\) is the learning rate, and \(\nabla_\theta L\) is the gradient of the loss function to the parameters, and through the continuous cycle of "forward calculation - loss calculation - backpropagation - parameters update", the model will gradually learn how to assign attention scores, so as to make more accurate predictions based on the original gene expression. Performance Evaluation In order to verify the prediction performance of the Attention-MLP model, we selected the classic logistic regression prediction model as the benchmark, compared it with several good prediction models (SVM, GBDT, etc.), and conducted systematic comparison experiments on several typical UCI public datasets (MAGIC gamma telescope data 2004, Spambase, Gisette, respectively have dozens, hundreds, and thousands of feature numbers).
Comparison of F1 scores between Attention-MLP and traditional models across different datasets
Dataset Model Spambase Gisette MAGIC gamma telescope data 2004
GBDT 0.934 0.967 0.902
SVM 0.510 0.976 0.818
MLP 0.915 0.966 0.871
Attention_MLP 0.938 0.978 0.888
Logistic Regression 0.902 0.975 0.848
The results show that our model achieves high F1-score in all datasets, and significantly outperforms the logistic regression prediction model as the baseline model, especially in datasets with high dimensions and complex feature relationships, like Gisette.
Model Performance on the Gisette Dataset
Model AUC ACC R F1
SVM 0.995 0.976 0.976 0.976
Logistic Regression 0.995 0.974 0.973 0.975
GBDT 0.993 0.966 0.966 0.967
Attention_MLP 0.996 0.976 0.978 0.978
MLP 0.986 0.965 0.969 0.966
It is worth noting that compared with traditional MLP algorithm, Attention-MLP has significant improvement in the ability of identification (as is shown in Table 3). This further verifies the importance of the Attention Mechanism in improving the model's ability of key feature dependency modeling. By introducing Attention mechanism, our model has the following advantages:
Adaptive Learning of Feature Importance The attention module can dynamically give higher weights to high-impact genes, improving the prediction performance of the model.
Enhanced Generalization Ability Redundant information is suppressed by weighting to reduce the risk of overfitting caused by high-dimensional feature space and improve the robustness of the model under small samples.
Strong interpretability Each input gene combination corresponds to a quantifiable Attention score, which provides the possibility to traceback their biological significance, and is beneficial for subsequent pathway analysis and plasmid design.
Determine the Optimal Combination Through systematic literature search, we found that FAM3C, PLOD2, LIF, and IL6 are four prominent candidate marker genes for identifying adipocytes affected by cancer cells[1-6]. We obtained the expression data of breast cancer paracancerous tissue samples from the TCGA database and data of normal breast tissue samples from the GTEx database, standardized the data after unified preprocessing, and divided the dataset into training set, validation set and test set according to a ratio of 7:1:2. We trained our model using the partitioned dataset, obtained the optimal model parameters through the validation set, and evaluated the prediction performances of all combinations of FAM3C, PLOD2, LIF and IL6 on the test set, using AUC as the evaluation index. The assessment results are shown in Table 4:
Assessment results of each gene combination
Combination AUC Combination AUC Combination AUC
PLOD2+LIF 0.894653465 FAM3C+PLOD2+LIF 0.887920792 FAM3C+LIF+IL6 0.798811881
FAM3C+PLOD2+LIF+IL6 0.797623762 FAM3C+PLOD2+IL6 0.771089109 FAM3C+LIF 0.764356436
PLOD2+IL6 0.733069307 FAM3C+PLOD2 0.607524752 FAM3C+IL6 0.543762376
IL6 0.470693069 PLOD2+LIF+IL6 0.429306931 FAM3C 0.425346535
LIF+IL6 0.339009901 PLOD2 0.321188119 LIF 0.236237624
The ROC curve is shown in Figure 2:
ROC Curve of the Top 5 Gene Combinations
The assessment results show that result of the combined ROC curve of PLOD2 and LIF (AUC=0.89) is the best among all combinations, and outperforms the ROC curves of LIF alone (AUC=0.24) and PLOD2 alone (AUC=0.68). This suggests that using PLOD2 and LIF as monitor genes together rather than using them alone allows for both sensitivity and specificity of prediction. Therefore, we considered PLOD2+LIF to be the best combination. The confusion matrix shows good classification stability, low false positive rate and low false negative rate, so the combination can be used as target genes for identifications of CAA. Designing Sensor Sequences We used RADAR system that regulates translation of reporter genes through ADAR editing to achieve responses to the expression of target genes[23]. Therefore, we needed to construct a synthetic RNA that can detect LIF and PLOD2 expression and achieve signal activation through RNA editing as the core detection module of our adipocyte engineering system. As is shown Figure 3, the sensor mRNA forms a chain structure with the core functional region being the sensor region. In the absence of interaction with the target RNA, the sensor region contains UAG stop codon, preventing them from initiating the translation of downstream reporter genes. We set the sensor region as two individual modules(sensor1 and sensor2), targeting the mRNA of LIF and PLOD2 respectively. Each sensor region contains one special UAG site. The binding of the target RNA to the sensor mRNA forms a double-stranded RNA (dsRNA),which recruits ADAR enzymes to edit the A nucleotide to G-like I nucleotide in the UAG site, resulting in change of UAG stop codon and initiations of the expression of downstream reporter genes.
RADAR system principles
That is, in the state of target RNA missing or low expression, the UAG in the sensor region blocks the initiation of translation; When the target RNA is expressed and binds to the sensor to form dsRNA, the ADAR edits UAG → UIG (which is identified as UGG), and thus enables translation. To obtain sensor mRNA with the best function, we searched the rnasensing.bio database for candidate sensor sequences that matches LIF and PLOD2 respectively. We filtered candidate sensors using the following criteria:
Containing legal editing sites (UAG)
Good pairing ability and predictive structure stability
High database scores
Finally, we selected candidate sensor sequences for each target gene and respectively named them as:
L1, L2, L3 candidate sensors for LIF
P1, P2 candidate sensors for PLOD2
By full combination arrangement, we constructed 12 RNA sequences(recorded as Seq A to Seq I), with each sequence consistent with the upstream regulatory regions and the downstream translation enzymes, differing only in the sensor regions. The combination is shown as follows:
sensor region combination and corresponding sequence number
Sequence Sensor region Sequence Sensor region Sequence Sensor region
Seq A P1+L1 Seq B P1+L2 Seq C P1+L3
Seq D P2+L1 Seq E P2+L2 Seq F P2+L3
Seq G L1+P1 Seq H L1+P2 Seq I L2+P1
Seq J L2+P2 Seq K L3+P1 Seq L L3+P2
Prediction of Secondary Structure of Long Stranded RNA The function of synthetic RNA not only depends on its sequence information but also highly depends on its secondary structure after folding. In the RADAR system, the secondary structure of RNA will directly influence the following key processes:
Whether the sensor region is exposed and whether it can be recognized by the target RNA and form dsRNA.
Whether the UAG editing site is in an accessible region and whether it can be recognized and successfully edited by the ADAR enzyme.
Limitations of Traditional Methods Traditional RNA secondary structure prediction algorithms (such as RNAfold, Mfold, NUPACK) perform well when processing short sequences or nested structures. However, due to high memory and calculative complexity, the prediction efficiency of long RNA (>1000 nt) is low, making it difficult to be applied to constructing complete sequences[24-26]. In this project, we need to evaluate a completely designed RNA containing sensors and upstream and downstream enzyme elements, which are usually much longer than conventional RNA fragments and may contain complex pseudoknots and multiple functional regions. Therefore, we need an algorithm that is more suitable for long chains and meets the requirements of functional structures. iPKnot++ For this purpose, we chose iPKnot++ as the prediction tool. This method combines the machine learning statistical model provided by CONTRAfold with the thermodynamic optimization strategy of ViennaFold to perform linear time calculation by using the LinearPartition model, and automatically select the optimal threshold parameters according to the pseudo-expected accuracy, and can identify complex pseudoknot structures, which is especially suitable for modeling and analyzing synthetic RNAs with long lengths and complex structures[27]. The model structure design of iPKnot++ is based on the core assumption that RNA secondary structures have strong local features, that is, most base pairs can be captured within a certain range, and although the pseudoknot structure spans a longer interval, it is mostly concentrated among a few cross-regional substructures. Therefore, the model no longer pursues one-time modeling within the full sequence, but adopts the idea of segmented sliding window modeling, first local prediction and then global integration, so as to reduce the computational complexity while retaining the pseudoknot information. Specifically, iPKnot++ first divides the input RNA sequence \(x=\left(x_1,x_2,...,x_n\right)\) into multiple overlapping local windows \(\left[x_i,x_{i+\omega}\right]\), the length of each window is \(\omega\), with fixed steps \(S\) between adjacent windows. The interior of each window is regarded as an independent modeling unit, and the model calculates the pairing probability \(P\left(i,j\right)\) for all possible base pairs \(\left(i,j\right)\) in it, which is derived from the comprehensive scoring of two models: ViennaRNA provides a thermodynamic model based on the minimum free energy, and CONTRAfold provides a statistical learning model based on conditional probability. It is assumed here that the pairing probability meets the independence simplification assumption so that the dynamic programming/graph decoding process can proceed within an acceptable time complexity. The core of pseudoknot modeling is the adoption of an approximate Maximum Expected Accuracy (MEA) optimization strategy. Specifically, the goal of the model is to find a set of base pairs \(S^*\) to maximize the overall expected accuracy, that is \[S^*=\arg\displaystyle\max_S\displaystyle\sum_{\left(i,j\right)\in S}P\left(i,j\right)\cdot\delta_{\left(i,j\right)}.\] Here, \(\delta_{\left(i,j\right)}\) indicates whether the base pairs \(\left(i,j\right)\) are included in the final structure. The target retains all the candidate optimal local structures after solving independently in each window. In the structural integration stage, the model maps the local predictions back to the original full-length sequence coordinates. For structural conflicts in predictions of different window (such as bases being paired to different objects by multiple windows), the model designs a conflict resolution mechanism based on reliability weighting, which retains structure combinations with higher probability or lower free energy. For the pseudoknot structures formed across windows, the model uses a bridging mechanism to merge the segmented predictions into a continuous structure to ensure the consistency and completeness of the overall prediction. Finally, the model output includes information such as RNA secondary structures in standard dot-bracket form of expression, consistent base pair lists, visualization maps, and pairing probability matrices, supporting subsequent tasks such as structural function analysis, target design, or stability assessment. The structure of the model greatly improves the structure prediction efficiency and scale adaptability of long stranded RNA without sacrificing the ability of pseudoknot modeling. Evaluation of the Functional Structures of RNA We used the iPKnot++ algorithm to predict the secondary structure of the 9 RNA sequences we constructed, and each sequence output two results, using the LinearPartition model with CONTRAfold parameter[28] (LinearPartition-C) and the LinearPartition model with ViennaRNA parameter[24] (LinearPartition-V) as the calculation model for base pairing probability. The output results are the maximum probability structure (MPS) based on machine learning and the minimum free energy (MFE) based on thermodynamic optimization. The prediction results for each sequence are shown in table 6.
Secondary structure prediction results
(Click the image to enlarge, use mouse wheel to zoom, and drag to move.)
Sequence MPS MFE
Seq A
Seq B
Seq C
Seq D
Seq E
Seq F
Seq G
Seq H
Seq I
Seq J
Seq K
Seq L
Relevant literature points out that external interactions mainly occur between the unpaired regions of two RNA structures[29], and the probability that the sequence is not paired can be used as an important factor for the accessibility of local RNA target[30], so we believe that the binding efficiency of the sensor region and the target RNA under the same conditions is positively correlated with the degree of exposure of the sensor region. To ensure the functionality of the editing site, we need to ensure that the editing site "UAG" is fully accessible. In order to select the optimal scheme, we quantitatively evaluated the structure prediction results from the following two dimensions:
Indicator of the exposure of the sensor area: unpaired rate (UPR) To evaluate the folding of the sensor region, we introduced a structural metric called "unpaired probability", which is defined as the ratio of the number of unpaired bases in the region to the total number of bases in the region. A higher UPR indicates that more bases in the region are unpaired, which is more conducive to dsRNA formation and ADAR editing. \[\mathrm{UPR}=\dfrac{\mathrm{sensor}_{\text{unpaired}}}{\mathrm{sensor}_{\text{total}}}\]
Indicator of the accessibility of the editing site: the length of the exposed areas on both sides of UAG (\(L_{\text{expose}}\)) To further evaluate the spatial accessibility of the target "UAG" editing site, we proposed the "editing site exposed length" metric, which is defined as the maximum sum of the left and right consecutive unpaired bases of the UAG. \[L_{\text{expose}}=\displaystyle\max\left(l_{\text{unpaired}}^{\text{left}}+l_{\text{unpaired}}^{\text{right}}\right)\] Among them, \(l_{\text{unpaired}}^{\text{left}}\) and \(l_{\text{unpaired}}^{\text{right}}\) respectively represent the lengths of consecutive unpaired bases to the left and right of the editing site, counted only from the editing site outward until the first paired base is encountered or the sequence terminates. We believe that the higher this index, the more open the space around the editing site and the higher the editing efficiency of the ADAR enzyme.
Based on the metrics above, we systematically evaluated the structures of Seq A–L, and the results are shown in the Table 7 (UAG exposed length of sensor1 and sensor2 is calculated separately):
Results of selected sequence functional structure assessment
Sequence Calculation model Unpaired Total base number UPR \(l_{\text{unpaired}}^{\text{left}}\) \(l_{\text{unpaired}}^{\text{right}}\) \(L_{\text{expose}}\)
Seq A LinearPartition-C 96 186 96/186 24, 0 10, 0 34, 0
LinearPartition-V 80 186 80/186 12, 0 0, 0 12, 0
Seq B LinearPartition-C 100 186 100/186 14, 0 0, 0 14, 0
LinearPartition-V 67 186 67/186 0, 3 5, 3 5, 6
Seq C LinearPartition-C 116 186 116/186 34, 2 20, 13 54, 15
LinearPartition-V 84 186 84/186 12, 1 3, 13 15, 14
Seq D LinearPartition-C 102 186 102/186 0, 14 28, 4 28, 18
LinearPartition-V 80 186 80/186 0, 0 2, 0 2, 0
Seq E LinearPartition-C 80 186 80/186 0, 0 2, 0 2, 0
LinearPartition-V 79 186 79/186 0, 3 4, 3 4, 6
Seq F LinearPartition-C 84 186 84/186 0, 7 2, 13 2, 20
LinearPartition-V 81 186 81/186 1, 1 5, 13 6, 14
Seq G LinearPartition-C 78 186 78/186 0, 6 10, 3 10, 9
LinearPartition-V 88 186 88/186 9, 12 2, 0 11, 12
Seq H LinearPartition-C 85 186 85/186 0, 8 0, 3 0, 11
LinearPartition-V 76 186 76/186 2, 3 6, 0 8, 3
Seq I LinearPartition-C 127 186 127/186 3, 0 3, 0 6, 0
LinearPartition-V 61 186 61/186 3, 3 3, 0 6, 3
Seq J LinearPartition-C 125 186 125/186 3, 10 3, 21 6, 31
LinearPartition-V 83 186 83/186 3, 4 3, 3 6, 7
Seq K LinearPartition-C 109 186 109/186 0, 43 6, 4 6, 47
LinearPartition-V 96 186 96/186 0, 0 19, 3 19, 3
Seq L LinearPartition-C 75 186 75/186 3, 0 3, 11 6, 11
LinearPartition-V 73 186 73/186 1, 0 13, 2 14, 2
From the evaluation results, it is evident that Seq C has a high unpaired probability in both models, indicating that the sensor region has a more loose overall structure, making it easier for target RNA to bind. Additionally, the exposed length around the editing site in Seq C is significantly higher than in other constructs (e.g., 54 and 15 for LinearPartition-C), suggesting that the editing site has good accessibility. In contrast, sequences such as Seq A and Seq J exhibit overly tight structures or limited editing sites, which are not conducive to functional performance. Considering the structural stability and functional exposure, Seq C is the optimal design scheme, which will be used for subsequent plasmid construction and functional validation. Conclusion By building the RADAR system and incorporating the ADAR editing mechanism, we successfully designed and evaluated a series of RNA constructs with LIF and PLOD2 targeting capabilities. The iPKnot++ structural prediction and dual exposure metrics provided us with a quantitative reference, allowing us to rationally select RNA sequences with the best matching degree of structural function. In the next step, we will construct complete expression plasmids based on Seq A and perform expression detection in breast adipocytes to evaluate its actual responsiveness, editing efficiency, and downstream signal triggering ability. RADAR System Dynamics Modeling Analysis of RADAR System The heart of our project is using the RADAR system to detect LIF and PLOD2, so a comprehensive and detailed understanding of RADAR system is critical to the success of the project. Through the simulation and dynamics analysis of the reaction process of the RADAR system, we can obtain key information such as the trends of substrate change and the time when the reporter gene expression peaks[31], which will provide important references for the design and optimization of our project. We simulated the reactions of single and dual input RADAR systems, and the simulation process includes the following steps[32]:
Constructing the dynamics equations of the RADAR system;
Converting dynamics equations to ordinary differential equations;
Collecting parameter values;
Solving equations using Python;
Analyzing the results.
Simulation and Analysis of the Intracellular RADAR System's Reactions The interaction of sensor RNA in the RADAR system with its target trigger RNA causes it to respond and initiate the translation of downstream reporter genes. Since the reaction process and parameters of single input sensor1 and trigger1 are the same as those of single input sensor2 and trigger2, we assume that the two reactions have the same concentration-time curve of reporter gene expression and peak time. Without loss of generality, the following is a dynamic simulation using trigger1 as an example. Simulation and Analysis of Single Input RADAR System In the RADAR system, the trigger and sensor are paired within a certain base sequence, and the stop codon in it is converted from UAG to UIG under the effect of ADAR enzyme, removing the restriction to downstream protein translation.
The reaction process of single input RADAR system
As is shown in figure 4, the reaction process of single input RADAR system mainly consists of the following steps:
PLM_s (sensor plasmid) is transcribed to generate PSRS (precatalytic single input RADAR system sensor).
PLM_t1 (trigger plasmid) is transcribed to generate trigger1 (trigger RNA).
PSRS binds to trigger1 to form PSRS_1 complex.
PLM_ADAR is transcribed to generate ADARmRNA, which is further translated into ADAR catalytic proteins.
Under the catalysis of ADAR, the stop codon of PSRS_1 is converted from UAG to UIG, and PSRS_1 is then converted into CSRS (catalyzed RNA complex).
CSRS is translated to generate the reporter protein Gluc.
The Reaction Equation of Single Input RADAR System The reaction process of single input RADAR system can be expressed by the following dynamics equations:
Transcription and basal degradation \[[\mathrm{PLM}_s]\xrightarrow{k_{\mathrm{tr}_s}}[\mathrm{PSRS}]\xrightarrow{k_{\mathrm{dePSRS}}}\varnothing\] \[[\mathrm{PLM}_{t1}]\xrightarrow{k_{\mathrm{tr}_{t1}}} [\mathrm{trigger1}]\xrightarrow{k_{\mathrm{de}_{t1}}} \varnothing\] \[[\mathrm{PLM}_{\mathrm{ADAR}}]\xrightarrow{k_{\mathrm{tr}_{\mathrm{ADAR}}}}[\mathrm{ADARmRNA}]\xrightarrow{k_{\mathrm{deAm}}}\varnothing\]
ADAR protein translation and degradation \[[\mathrm{ADARmRNA}]\xrightarrow{k_{\mathrm{tlAm}}}[\mathrm{ADAR}]\xrightarrow{k_{\mathrm{deADAR}}}\varnothing\]
Complex formation and catalysis \[[\mathrm{PSRS}]+[\mathrm{trigger1}]\xrightarrow{k_{\mathrm{cb}_{t1}}}[\mathrm{PSRS}_1]\xrightarrow{k_{\mathrm{dePSRS}_1}}\varnothing\] \[[\mathrm{PSRS}_1]+[\mathrm{ADAR}]\xrightarrow{k_{\mathrm{catPSRS}_1}}[\mathrm{CSRS}]\xrightarrow{k_{\mathrm{deCSRS}}}\varnothing\]
Reporter protein expression \[[\mathrm{CSRS}]\xrightarrow{k_{\mathrm{tlCSRS}}}[\mathrm{Gluc}]\xrightarrow{k_{\mathrm{seGluc}}}\varnothing^*\]
The Ordinary Differential Equations of Single Input RADAR System Based on the chemical reactions above, we establish the following ordinary differential equations to describe the dynamic behavior of the system:
Plasmid template (PLM) dynamics \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_s]&=-k_{\mathrm{de}_s}[\mathrm{PLM}_s]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{t1}]&=-k_{\mathrm{de}_{t1}}[\mathrm{PLM}_{t1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{\mathrm{ADAR}}]&=-k_{\mathrm{de}_{\mathrm{ADAR}}}[\mathrm{PLM}_{\mathrm{ADAR}}]\end{align}\]
RNA and protein production/degradation \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PSRS}]&=k_{\mathrm{tr}_{s}}[\mathrm{PLM}_s]-k_{\mathrm{dePSRS}}[\mathrm{PSRS}]-k_{\mathrm{cb}_{t1}}[\mathrm{PSRS}][\mathrm{trigger1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{trigger1}]&=k_{\mathrm{tr}_{t1}}[\mathrm{PLM}_{t1}]-k_{\mathrm{de}_{t1}}[\mathrm{trigger1}]-k_{\mathrm{cb}_{t1}}[\mathrm{PSRS}][\mathrm{trigger1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{ADARmRNA}]&=k_{\mathrm{tr}_{\mathrm{ADAR}}}[\mathrm{PLM}_{\mathrm{ADAR}}]-k_{\mathrm{deAm}}[\mathrm{ADARmRNA}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{ADAR}]&=k_{\mathrm{tlAm}}[\mathrm{ADARmRNA}]-k_{\mathrm{deADAR}}[\mathrm{ADAR}]\end{align}\]
Complex formation and catalysis \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PSRS}_1]&=k_{\mathrm{cb}_{t1}}[\mathrm{PSRS}][\mathrm{trigger1}]-k_{\mathrm{dePSRS}_1}[\mathrm{PSRS}_1]-k_{\mathrm{catPSRS}_1}[\mathrm{PSRS}_1][\mathrm{ADAR}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{CSRS}]&=k_{\mathrm{catPSRS}_1}[\mathrm{PSRS}_1][\mathrm{ADAR}]-k_{\mathrm{deCSRS}}[\mathrm{CSRS}]\end{align}\]
Reporter protein expression \[\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{Gluc}]=k_{\mathrm{tlCSRS}}[\mathrm{CSRS}]-k_{\mathrm{seGluc}}[\mathrm{Gluc}]\]
The parameters are described in the following table:
Description of single input RADAR system parameters
Abbreviation Description Abbreviation Description
PLMs sensor plasmid of PSRS Ktr_s sensor plasmid transcription rate constant
PSRS pre-catalytic single-input RADAR sensor KdePSRS PSRS degradation rate constant
PLMt1 trigger plasmid of trigger1 Ktr_t1 trigger plasmid transcription rate constant
Kde_t1 trigger1 degradation rate constant PLMADAR plasmid of ADAR
KtrADAR ADAR plasmid transcription rate constant ADARmRNA The mRNA of ADAR
kdeAm ADARmRNA degradation rate constant ktlAm ADARmRNA translation rate constant
ADAR adenosine deaminases acting on RNA kdeADAR ADAR degradation rate constant
Kcb_t1 combine rate constant of trigger1 and PSRS PSRS_1 PSRS & trigger1 complex
KdePSRS_1 PSRS_1 degradation rate constant KcatPSRS_1 PSRS_1 catalytic rate constant
CSRS post-catalytic single-input RADAR sensor KdeCSRS CSRS degradation rate constant
KtlCSRS CSRS translation rate constant Gluc Gaussia Luciferase
KseGluc Gluc secretion rate constant degradated substance
* Gluc secretion rate constant
Simulation of parameter values To simulate the single input RADAR system, we used the following parameter values:
Value and references of single input RADAR system parameters
Abbreviation Description Value (Unit) Reference
ktr_s sensor plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.[33]
kdePSRS PSRS degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
ktr_t1 trigger1 plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.
kde_t1 trigger1 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
ktrADAR ADAR plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.
ktlAm ADARmRNA translation rate constant 0.0022 Gayet RV et al., 2023.[34]
kdeAm ADARmRNA degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
kdeADAR ADAR degradation rate constant 6.418×10-7(s-1) Bax BE et al., 2001.[35]
kcb_t1 combine rate constant of trigger1 and PSRS 1×105(M-1·s-1) Baabu et al., 2022.
kdePSRS_1 PSRS1 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
kcatPSRS_1 PSRS1 catalytic rate constant 0.0012(s-1) Véliz EA et al., 2003.[36]
kdeCSRS CSRS degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
ktlCSRS CSRS translation rate constant 0.0336(s-1) Verhaegent M et al., 2002.[37]
kseGluc Gluc degradation rate constant 0.20(s-1) Song Y et al., 2012.[38]
Analysis of Simulation Results In the simulation, we set the initial value of PLM_s, PLM_t1, and PLM_ADAR as 400000, 500000, and 75000 molecules respectively, and the other components as 0. We used Python's solve_ivp function and LSODA method for numerical solution and simulated the dynamic behavior of the system within 72 hours.
The change of single input RADAR system components' concentration over time (logarithmic scale)
As can be seen from figure 5, the dynamic behavior of the system can be divided into several stages:
Early stage (0-5 hours) PSRS and trigger1 are rapidly generated and quickly combine to form PSRS_1 complex. The ADAR protein starts to accumulate.
Mid stage (5-40 hours) The PSRS_1 complex is transformed into CSRS under the catalysis of ADAR. Although the concentration of CSRS is relatively low, it is sufficient to initiate the translation of Gluc. The concentration of ADAR continues to rise, further accelerating the catalytic reaction.
Late stage (40-72 hours) Due to the gradual depletion of plasmids, the system tends to reach equilibrium, and the expression of Gluc begins to decline.
Figure 6 shows the changes in Gluc expression level over time and the visual threshold lines:
Gluc expression curves in single input RADAR system
As is shown in figure 6, the Gluc concentration reaches a maximum of \(1.93\times 10^5\) molecular numbers (equivalent to 320.17 nM) at 35.89 hours and reaches the visualization threshold (150 nM) at 14.05 hours. Gluc reaches half of its maximum value at 14.70 hours. This indicates that the single input RADAR system needs approximately 14 hours to produce visible results when detecting target RNA. The concentration of ADAR protein continues to rise, reaching \(4.99\times 10^7\) molecular numbers (82783.37 nM) at 72 hours. Due to the very low degradation rate of ADAR[39], it can accumulate and continuously catalyze the conversion of \(PSRS_1\) into CSRS in the system. It can be seen from the dynamic simulation result that although the maximum concentration of CSRS is only 0.24 nM, which is much lower than that of other components, it is able to generate sufficient Gluc signals due to its efficient translation ability. This signal amplification effect is an important feature of the RADAR system, making it able to sensitively detect target RNA with low concentrations. Overall, single input RADAR system shows the following characteristics:
The system has good response to trigger RNA and is capable of generating a detectable Gluc signal within 14.05 hours.
Signal amplification is obvious, and a small amount of CSRS can produce a significant level of Gluc.
Gluc expression exhibits a typical pattern of first growth and then decrease, which is consistent with the protein expression pattern in actual biological systems.
These simulation results provide important references for us to optimize the design of the RADAR system. By adjusting key parameters such as the expression level of ADAR and the translation efficiency of CSRS, we can further improve the sensitivity and response speed of the system. Simulation and Analysis of Dual Input RADAR System The dual input RADAR system requires two triggers to interact with the sensor simultaneously to convert the upstream stop codon of the reporter gene Gluc from UAG to UIG, and then initiate its translation. Schematic Diagram of the Reaction Process of Dual Input RADAR System
The reaction process of dual input RADAR system
As is shown in figure 7, the reaction process of dual input RADAR system mainly consists of the following steps:
PLM_as (AND sensor plasmid) is transcribed to generate PTRS (precatalytic dual-input RADAR system sensor).
PLM_t1 and PLM_t2 are transcribed to generate trigger1 and trigger2 respectively.
PTRS first binds to trigger1 to form PTRS_1 complex.
PTRS_1 further bind to trigger2 to form PTRS_1_2 complex.
PLM_ADAR is transcribed to generate ADARmRNA, which is further translated into ADAR catalytic proteins.
Under the catalysis of ADAR, two stop codons of PTRS_1_2 are converted from UAGs to UIGs, and PTRS_1_2 are then converted into CTRS (catalyzed RNA complex).
CTRS is translated to generate the reporter protein Gluc.
The Reaction Equation of Dual Input RADAR System The reaction process of the dual-input RADAR system can be expressed by the following chemical equations:
Transcription and basal degradation \[[\mathrm{PLM}_{as}]\xrightarrow{k_{\mathrm{tr}_{as}}}[\mathrm{PTRS}]\xrightarrow{k_{\mathrm{dePTRS}}}\varnothing\] \[[\mathrm{PLM}_{t1}]\xrightarrow{k_{\mathrm{tr}_{t1}}}[\mathrm{trigger1}]\xrightarrow{k_{\mathrm{de}_{t1}}}\varnothing\] \[[\mathrm{PLM}_{t2}]\xrightarrow{k_{\mathrm{tr}_{t2}}}[\mathrm{trigger2}]\xrightarrow{k_{\mathrm{de}_{t2}}}\varnothing\] \[[\mathrm{PLM}_{\mathrm{ADAR}}]\xrightarrow{k_{\mathrm{tr}_{\mathrm{ADAR}}}}[\mathrm{ADARmRNA}]\xrightarrow{k_{\mathrm{deAm}}}\varnothing\]
Translation and degradation \[[\mathrm{ADARmRNA}]\xrightarrow{k_{\mathrm{tlAm}}}[\mathrm{ADAR}]\xrightarrow{k_{\mathrm{deADAR}}}\varnothing\]
Complex formation \[[\mathrm{PTRS}]+[\mathrm{trigger1}]\xrightarrow{k_{\mathrm{cb1}}}[\mathrm{PTRS}_{1}]\xrightarrow{k_{\mathrm{dePTRS}_{1}}}\varnothing\] \[[\mathrm{PTRS}_{1}]+[\mathrm{trigger2}]\xrightarrow{k_{\mathrm{cb2}}}[\mathrm{PTRS}_{1\_2}]\xrightarrow{k_{\mathrm{dePTRS}_{1\_2}}}\varnothing\]
Catalytic and reporter proteins \[[\mathrm{PTRS}_{1\_2}]+[\mathrm{ADAR}]\xrightarrow{k_{\mathrm{catPTRS}_{1\_2}}}[\mathrm{CTRS}]\xrightarrow{k_{\mathrm{deCTRS}}}\varnothing\] \[[\mathrm{CTRS}]\xrightarrow{k_{\mathrm{tlCTRS}}}[\mathrm{Gluc}]\xrightarrow{k_{\mathrm{seGluc}}}\varnothing^*\]
Based on the chemical reactions above, we established the following ordinary differential equations to describe the dynamic behavior of the system:
Plasmid template dynamics \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{as}]&=-k_{\mathrm{de}_{as}}[\mathrm{PLM}_{as}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{t1}]&=-k_{\mathrm{de}_{t1}}[\mathrm{PLM}_{t1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{t2}]&=-k_{\mathrm{de}_{t2}}[\mathrm{PLM}_{t2}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PLM}_{\mathrm{ADAR}}]&=-k_{\mathrm{de}_{\mathrm{ADAR}}}[\mathrm{PLM}_{\mathrm{ADAR}}]\end{align}\]
RNA and protein production/degradation \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PTRS}]&=k_{\mathrm{tr}_{as}}[\mathrm{PLM}_{as}]-k_{\mathrm{dePTRS}}[\mathrm{PTRS}]-k_{\mathrm{cb1}}[\mathrm{PTRS}][\mathrm{trigger1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{trigger1}]&=k_{\mathrm{tr}_{t1}}[\mathrm{PLM}_{t1}]-k_{\mathrm{de}_{t1}}[\mathrm{trigger1}]-k_{\mathrm{cb1}}[\mathrm{PTRS}][\mathrm{trigger1}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{trigger2}]&=k_{\mathrm{tr}_{t2}}[\mathrm{PLM}_{t2}]-k_{\mathrm{de}_{t2}}[\mathrm{trigger2}]-k_{\mathrm{cb2}}[\mathrm{PTRS}_{1}][\mathrm{trigger2}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{ADARmRNA}]&=k_{\mathrm{trADAR}}[\mathrm{PLM}_{\mathrm{ADAR}}]-k_{\mathrm{deAm}}[\mathrm{ADARmRNA}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{ADAR}]&=k_{\mathrm{tlAm}}[\mathrm{ADARmRNA}]-k_{\mathrm{deADAR}}[\mathrm{ADAR}]\end{align}\]
Complex formation and catalysis \[\begin{align}\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PTRS}_{1}]&=k_{\mathrm{cb1}}[\mathrm{PTRS}][\mathrm{trigger1}]-k_{\mathrm{dePTRS}_{1}}[\mathrm{PTRS}_{1}]-k_{\mathrm{cb2}}[\mathrm{PTRS}_{1}][\mathrm{trigger2}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{PTRS}_{1\_2}]&=k_{\mathrm{cb2}}[\mathrm{PTRS}_{1}][\mathrm{trigger2}]-k_{\mathrm{dePTRS}_{1\_2}}[\mathrm{PTRS}_{1\_2}]-k_{\mathrm{catPTRS}_{1\_2}}[\mathrm{PTRS}_{1\_2}][\mathrm{ADAR}]\\\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{CTRS}]&=k_{\mathrm{catPTRS}_{1\_2}}[\mathrm{PTRS}_{1\_2}][\mathrm{ADAR}]-k_{\mathrm{deCTRS}}[\mathrm{CTRS}]\end{align}\]
Reporter protein expression \[\frac{\mathrm{d}}{\mathrm{d}t}[\mathrm{Gluc}]=k_{\mathrm{tlCTRS}}[\mathrm{CTRS}]-k_{\mathrm{seGluc}}[\mathrm{Gluc}]\]
The parameters are described in the following table:
Description of dual input RADAR system parameters
Abbreviation Description Abbreviation Description
PLMas AND sensor plasmid of PSRS Ktr_as AND sensor plasmid transcription rate constant
PTRS pre-catalytic two-input RADAR sensor KdePTRS PTRS degradation rate constant
PLMt1 plasmid of trigger1 Ktr_t1 trigger1 plasmid transcription rate constant
Kde_t1 trigger1 degradation rate constant PLMt2 plasmid of trigger2
Ktr_t2 trigger2 plasmid transcription rate constant Kde_t2 trigger2 degradation rate constant
PLMADAR plasmid of ADAR KtrADAR ADAR plasmid transcription rate constant
ADARmRNA The mRNA of ADAR KdeAm ADARmRNA degradation rate constant
KtlAm ADARmRNA translation rate constant ADAR adenosine deaminases acting on RNA
KdeADAR ADAR degradation rate constant Kcb1 combine rate constant of trigger1 and PTRS
PTRS_1 PTRS & trigger1 complex KdePTRS_1 PTRS1 degradation rate constant
Kcb2 combine rate constant of trigger2 and PTRS1 PTRS_1_2 PTRS1 & trigger2 complex
KdePTRS_1_2 PTRS_1_2 degradation rate constant KcatPTRS_1_2 PTRS_1_2 catalytic rate constant
CTRS post-catalytic two-input RADAR sensor KdeCTRS CTRS degradation rate constant
KtlCTRS CTRS translation rate constant Gluc Gaussia Luciferase
KseGluc Gluc secretion rate constant degradated substance
* secreted substance
Simulation of parameter values To simulate the dual input RADAR system, we used the following parameter values:
Value and references of dual input RADAR system parameters
Abbreviation Description Value (Unit) Reference
Ktr_as sensor plasmid transcription rate constant 1.1×10-3 (s-1) Baabu et al., 2022.
KdePTRS PTRS degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
Ktr_t1 trigger1 input plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.
Kde_t1 trigger1 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
Ktr_t2 trigger2 input plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.
Kde_t2 trigger2 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
KtrADAR ADAR plasmid transcription rate constant 1.1×10-3(s-1) Baabu et al., 2022.
KtlAm ADARmRNA translation rate constant 0.0022 Gayet RV et al., 2023.
KdeAm ADARmRNA degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
KdeADAR ADAR degradation rate constant 6.418×10-7(s-1) Bax BE et al., 2001.
Kcb_1 combine rate constant of trigger1 and PTRS 1×105(M-1·s-1) Baabu et al., 2022.
KdePTRS_1 PTRS1 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
Kcb_2 combine rate constant of trigger2 and PTRS_1 1×105(M-1·s-1) Baabu et al., 2022.
KdePTRS_1_2 PTRS_1_2 degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
KcatPTRS_1_2 PTRS_1_2 catalytic rate constant 0.0012(s-1) Véliz EA et al., 2003.
KdeCTRS CTRS degradation rate constant 3×10-4(s-1) Baabu et al., 2022.
KtlCTRS CTRS translation rate constant 0.0432(s-1) Verhaegent M et al., 2002.
KseGluc Gluc secretion rate constant 0.20(s-1) Song Y et al., 2012.
Analysis of Simulation Results In the simulation, we set the initial value of PLM_s, PLM_t1, PLM_t2, and PLM_ADAR as 400000, 500000, 500000, and 75000 molecules respectively, and the other components as 0. We used Python's solve_ivp function and LSODA method for numerical solution and simulated the dynamic behavior of the system within 72 hours. Figure 8 shows the change in the concentration of each component in the system over time.
The change of dual input RADAR system components' concentration over time (logarithmic scale)
As can be seen from figure 8, the dynamic behavior of the dual-input system exhibits multi-stage characteristics:
Early stage: PSRS, trigger1, and trigger2 are rapidly generated. PTRS combine with trigger1 to form PTRS_1, and subsequently PTRS_1 combine with trigger2 to form PTRS_1_2 complex.
Mid stage: The PSRS_1_2 complex is transformed into CTRS under the catalysis of ADAR. It is worth noting that the CTRS concentration, while relatively low (up to about 100 molecule counts), is sufficient to initiate Gluc translation.
Late stage: Over time, the number of plasmids in the system gradually decreases, and the concentration of each component tends to stabilize or decrease. ADAR protein continues to accumulate with the concentration close to 10^8 molecular numbers, which is consistent with the low degradation rate of ADAR.
Figure 9 shows the changes in Gluc expression level over time in dual input system:
Gluc expression curves in dual input RADAR system
As is shown in figure 9, the Gluc concentration reaches a maximum of \(1.91\times 10^5\) molecular numbers (equivalent to 316.76 nM) at 35.75 hours and reaches the visualization threshold (150 nM) at 14.31 hours. Gluc reaches half of its maximum value at 14.63 hours. This indicates that the dual input RADAR system needs approximately 14 hours to produce visible results when detecting target RNA. The concentration of ADAR protein continues to rise, reaching \(10^8\) molecular numbers. Due to the very low degradation rate of ADAR, it can accumulate and continuously catalyze the conversion of PSRS_1_2 into CTRS in the system. It can be seen from the dynamic simulation result that although the maximum concentration of CTRS is only 100 molecular numbers (about 0.17 nM), which is much lower than that of other components, it is able to generate sufficient Gluc signals due to its efficient translation ability. This signal amplification effect is an important feature of the RADAR system, making it able to sensitively detect target RNA with low concentrations. Overall, dual input RADAR system shows the following characteristics:
The system has good synergistic response to the two trigger RNAs and can generate detectable Gluc signal within 14.13 hours.
Signal amplification is obvious, and a small amount of CTRS can produce a significant level of Gluc (up to 316.76 nM).
Gluc expression exhibits a typical pattern of first growth and then decrease, which is consistent with the single input system.
Although the reaction path is longer and requires two triggers to be combined sequentially, the response efficiency of the system is not significantly reduced.
These simulation results provide important references for us to optimize the design of the dual input RADAR system. By adjusting key parameters such as the expression level of ADAR, the binding order and rate of the two triggers, and the translation efficiency of CTRS, we can further improve the sensitivity and response speed of the system. Conclusion Through dynamic modeling and simulation of the dual input RADAR system, we successfully revealed its potential function as a highly specific biological "AND" logic gate. This system is a powerful and reliable molecular diagnostic tool, especially suitable for scenarios where multiple RNA biomarkers need to be detected simultaneously to improve the accuracy of diagnosis, laying a solid theoretical foundation for the development of more accurate early screening and diagnosis methods for cancer. Glomerular Filtration Barrier Background and Objectives The Glomerular Filtration Barrier (GFB) is a highly selective multilayer biological filtration system that allows water, small molecules and some small and medium-sized protein molecules to enter the primary urine across the membrane while effectively preventing the leakage of large proteins (such as albumin). An in-depth understanding of the physical-chemical filtration mechanisms of GFB is critical for predicting the emergence time, detectability, and signal amplitude of reporter proteins (e.g., Gaussia luciferase, Gluc, 19.9 kDa, in this project[40]) in urine, so that the risk of breast cancer occurrence or relapse could be conveniently monitored in patients using our RADAR system. In this study, a macrokinetics coupling model combined with multi-scale agent-based was constructed to quantitatively analyze the microkinetic behavior of Gluc molecules passing through the four functional layers of GFB, and extrapolated to the organ scale to answer two core questions:
What percentage (Sieving Coefficient, SC) of Gluc protein in the blood can filter through GFB?
What is the time window for filtered Gluc to reach a preset detection threshold in urine (Detection Latency)?
Simulation of the Membrane Structure In tradition, GFB is often conceptualized as three layers: the endothelial layer with fenestrations, the glomerular basement membrane (GBM), and the podocyte slit diaphragm. To capture fine-grained differences in size selecting, surface charge, and geometric labyrinth effects, we refine its functionality into four layers[41]:
Details on four layers of membrane structure
Layer Position(nm) Diameter size (nm) Surface charge (mV) Function
Layer 1: Endothelial Cell Layer 0-100 60 0 Initial barrier, with large-pore windows, allowing most small molecules and medium-sized molecules to pass while blocking blood cells[42]
Layer 2: Basement Membrane 100-463.6 3.6 -30 Main filtration barrier, combining size and charge selectivity[43]
Layer 3: Podocyte Layer 463.6-613.6 40 0 Structure framework for forming the filtration canal
Layer 4: Split Membrane 613.6-726.1 12 -30 The final selective barrier, a filter mesh formed by specialized proteins[44]
Structure of glomerular filtration barrier
It shows four functional layers and a Gluc protein molecule (blue) across the membrane. The basement membrane represents the main filtration barrier because its pore size (3.6 nm) is similar to that of Gluc (3.5 nm). Each layer makes nonlinear contribute to the overall filtration flux through a specific pore size distribution, surface charge, and effective diffusion resistance. Figure 1 shows a four-layer structure and a representative migration path of Gluc molecule; GBM is the main limiting step because its pore size is similar to the hydrodynamic diameter of Gluc. Basic Assumptions In order to maintain physical credibility and computational operability, we made the following assumptions:
Molecular Characterization Gluc is seen as a rigid, negatively charged spherical particle with a hydrodynamic radius of 1.75 nm and a net charge of −3.2 e (pH 7.4 condition).
Membrane Steady-state The four-layer structure maintains its morphology and charge distribution unchanged over the simulated timescale.
Pore Size Statistics GBM pore sizes adopts log-normal distribution, other layers use uniform or restricted uniform distribution; The pore sizes of the slit diaphragm is slightly larger than that of the GBM, allowing for partial compensation.
Molecular Flexibility Introducing the probability of "flexible passage", so that particles can be squeezed through with a certain probability, statistically improving effective penetration[45].
Intermolecular Interactions Ignoring the direct interactions between Gluc–Gluc and focusing on the interactions between molecule-membrane and the superposition of random/directed migration.
Nephron Extrapolation The results of a single nephron are linearly extrapolated on nephron scale to predict the overall filtration flux of the kidney.
Model Construction Dynamics of Molecular Motion The movement of Gluc molecules through the filtration barrier is described by the Langevin equation, which combines Brownian motion, drift forces, and electrostatic interactions[46]: \[\dfrac{\mathrm{d}\mathbf{r}\left(t\right)}{\mathrm{d}t}=\mathbf{v}_{\text{drift}}+\dfrac{1}{\gamma}\mathbf{F}_{\text{electric}}+\sqrt{\dfrac{2k_BT}{\gamma}}\mathbf{\xi}\left(t\right).\] Among them, \(\mathbf{r}\left(t\right)\) is the position vector of the molecule, \(\mathbf{v}_{\text{drift}}\) is the vector of drift velocity caused by blood pressure, \(\mathbf{F}_{\text{electric}}\) is the electrostatic force, \(\gamma=6\pi\eta r\), is the drag coefficient (\(\eta\)= viscosity, and \(r\) = molecular radius), \(k_B\) is the Boltzmann constant, \(T\) is the temperature, and \(\xi\left(t\right)\) is the white noise that indicates Brownian motion. For numerical implementation, we use the Euler-Maruyama method to discretize this equation: \[\mathbf{r}\left(t+\Delta t\right)=\mathbf{r}\left(t\right)+\mathbf{r}_{\text{drift}}\Delta t+\dfrac{\mathbf{F}_{\text{electric}}}{\gamma}\Delta t+\sqrt{2D\Delta t}\mathbf{N}\left(0,1\right).\] Among them, \(D=\dfrac{k_RT}{6\pi\eta r}\) is the diffusivity, \(\mathbf{N}\left(0,1\right)\) are the random vectors that obey the normal distribution, and \(\Delta t\) is the time step. Electrostatic Interaction The electrostatic force exerted on charged molecules within the charged layer of the membrane is calculated as follows: \[\mathbf{F}_{\text{electric}}=q\mathbf{E}.\] Among them, \(q\) is the amount of charge of the Gluc molecule (-3.2e), and \(\mathbf{E}\) is the electric field inside the layer. In the charged layer (GBM and slit diaphragm), the electric field is estimated using membrane potential and thickness: \[\mathbf{E}=-\dfrac{V_{\text{membrane}}}{d_{\text{membrane}}}\hat{z}.\] Among them, \(V_{\text{membrane}}\) is the membrane potential (-30mV in the charged layer), \(d_{\text{membrane}}\) is the thickness of the layer, and \(\hat{z}\) is the unit vector in the filter direction. The Criterion of Molecules Passing Through the Membrane Pores The process of molecules passing through the membrane pores is determined by the combination of rigid size exclusion and molecular flexibility. \[P_{\text{passage}}=\begin{cases}1,&d_{\text{mol}}<d_{\text{pore}},\\P_{\text{flex}},&\alpha\cdot d_{\text{mol}}<d_{\text{pore}}<d_{\text{mol}},\\0,&d_{\text{pore}}<\alpha\cdot d_{\text{mol}}.\end{cases}\] Among them, \(d_{\text{mol}}\) is the molecular diameter, \(d_{\text{pore}}\) is the pore size, \(\alpha\) is flexibility threshold (0.8, which means that it is possible for molecules to pass through pores that are 20% smaller than their diameter), and \(P_{\text{flex}}\) is (0.15). Physiological-scale Integration To connect model microsimulations with physiological significance predictions, we used the following relationships: Calculation of sieving coefficient (SC): \[SC=\dfrac{N_{\text{filtered}}}{N_{\text{total}}}.\] Dynamics of Blood Concentration: \[\dfrac{\mathrm{d}C_{\text{blood}}\left(t\right)}{\mathrm{d}t}=-k_{\text{rate}}\cdot SC\cdot C_{\text{blood}}\left(t\right).\] Calculation of Urine Concentration: \[C_{\text{urine}}\left(t\right)=\dfrac{N_{\text{filtered}}\left(t\right)}{V_{\text{urine}}\left(t\right)}=\dfrac{SC\cdot V_{\text{blood}}\cdot C_{\text{blood}}\left(0\right)\cdot\left(1-e^{-k_{\text{rate}}t}\right)}{Q_{\text{urine}}\cdot\left(t-t_{\text{delay}}\right)}.\] Among them, \(N_{\text{filtered}}\) is the number of filtered molecules, \(N_{\text{total}}\) is the total number of molecules, \(k_{\text{rate}}\) is the filtration coefficient, \(V_{\text{blood}}\) is the volume of blood, \(V_{\text{urine}}\left(t\right)\) is the volume of accumulated urine, \(Q_{\text{urine}}\) is the urine flow rate, and \(t_{\text{delay}}\) is the delay time before the molecules appear in the urine. Solving of the Model Parameters of the Model
Physical and molecular parameters
Parameters Description Numeric value Unit
GLUC_RADIUS Gluc protein radius 1.75 nm
GLUC_CHARGE Gluc protein charge -3.2 e
TEMPERATURE Body temperature 310.15 K
VISCOSITY Plasma viscosity 3.5×10-3 Pa·s
DRIFT_VELOCITY Filtration drift velocity [0, 0, 8000] nm/s
FLEXIBILITY_THRESHOLD Molecular flexibility threshold 0.8 -
FLEXIBILITY_PROBABILITY Flexible molecules through probability 0.15 -
Membrane structure parameters
Layer Position (nm) Thickness (nm) Diameter (nm) Charge (mV) Pore rate
Endothelial cell layer[47-48] 0-100 100 60[47] 0 0.7[48]
Basement membrane[49-50] 100-463.6 363.6 4.3 -30 0.07
Podocyte layer 463.6-613.6 150 40 0 0.6
Schisis membrane 613.6-726.1 12.5 12 -30 0.3
Physiological parameters
Parameters Description Numeric value Unit
URINE_FLOW_RATE Urine production rate 1 mL/min
BLOOD_VOLUME Total blood volume 5000 mL
NEPHRON_COUNT Number of nephrons 106 -
BLOOD_GLUC_CONC Gluc concentration in blood 3.3×1018 molecules/L
MIN_VISIBLE_CONC Lowest detectable Gluc concentration 2.8×1010 molecules/L
TUBULAR_TRANSIT_TIME Tubular transport time 30 min
CIRCULATION_TIME Circulatory blood time 1 min
PHYSIO_TIME_SCALING Physiological time scaling factor 1800 -
Implementation Method Algorithm Implementation
Initialization Gluc particles (N=105) are arranged proximal to the endothelial layer, and the four layers of membrane are instantiated according to the characteristic parameters.
Code Block
SPARKS
1
// The structure is refined into four functional layers, which helps in separating the pore size and charge contribution.
2
Procedure DEFINE_MEMBRANE_PARAMS():
3
L1 ← Endothelium(z=100, thick=100, porosity=0.70, UNIFORM(r=30nm), charge=0)
4
L2 ← GBM(z=200, thick=363.6, porosity=0.07, LOGNORMAL(mean_r=2.15nm, σ=0.605nm), charge=-30)
5
L3 ← PodocyteBody(z=563.6, thick=150, porosity=0.60, GRID_SLITS(r=20nm, width=25nm), charge=0)
6
L4 ← SlitDiaphragm(z=713.6, thick=12.5, porosity=0.30, FIXED_DENSITY(r=6nm, dens=30/μm²), charge=-30)
7
return [L1,L2,L3,L4]
Generation of Pore Structure Endothelial/podocyte Body Layer: uniform random pore positions + fixed large pore sizes. GBM: Log-normally distributed pore sizes \(\ln r\sim N\) (\(\mu=\ln 3.6nm, \sigma=0.28\)). Slit Diaphragm: restricted uniform pore sizes (slightly larger than or similar to GBM for secondary selection).
Time Advancement \(\Delta t=1\mu s\); the overdamped approximation under the condition of \(\Delta t\ll\gamma−1\).
Convergence Criterion Reaching the maximum number of steps or SC(t) increment<0.01% lasting for 2×104 steps.
Extrapolation Construct organ-level elimination kinetics using fitted k and SC of single nephron.
Lag and Integration Urine test window is evaluated after superimposing renal tubule/renal collection system transport delays.
Code Block
SPARKS
1
// Determine whether the particle is blocked by the current membrane layer; the flexible mechanism allows the critical pore size to pass probabilistically.
2
Procedure BATCH_COLLISION_CHECK(pos, layer, r_agent, flex_on, flex_thr, flex_p):
3
in_layer = (z ∈ [z_min,z_max))
4
mask ← False; mask[in_layer] = True // Default blockage
5
if count(in_layer)=0: return mask
6
for each particle i with in_layer:
7
for pore j:
8
d2 = (x_i-x_j)2 + (y_i-y_j)2
9
if r_j > r_agent AND d2 < (r_j - r_agent)2:
10
mask[i]=False; break // Rigidity through
11
else if flex_on AND r_j ∈ [flex_thr·r_agent, r_agent) AND d2 < r_j2:
12
if RandBernoulli(flex_p)=1: mask[i]=False; break
13
return mask
Simulated process
Results and Analysis Filtration Efficiency: We concluded that the steady-state sieving coefficient (SC) is 10.3% from our simulation of 100 000 Gluc protein molecules. This means that about 10.3% of the Gluc molecules are able to pass through all four layers of the glomerular filtration barrier.
Temporal dynamics of Gluc concentration in urine
The dynamic curve exhibits three stages of “Lag phase - Rapid accumulation - Steady state”. The model shows that the concentration of Gluc in urine first reaches its lowest detectable concentration (2.8×1010 molecules/L) after 84 minutes, and then rapidly increases, reaching and exceeding the detection line after 1.40 hours, approaching 3×1018 molecules/L after 8 hours. Through the simulation, we found that the basement membrane is the main filtration barrier (with a passage rate of about 11%), while the endothelial layer and podocyte body layer exhibit high permeability (>80%), and the slit diaphragm is a secondary filter. The prediction result closely matches experimental data for proteins of similar molecular weight reported in literature[51], and the urine-blood concentration ratio is close to the theoretical value. Conclusion Under physiological conditions, 10.3% of blood Gluc can pass through the glomerular filtration barrier. Urine samples should be collected at least 84 minutes after the expected expression of Gluc in the blood, meaning that the detection latency required for filtered Gluc to reach the preset detection threshold in urine is 84 minutes. Our model demonstrates that Gluc protein is an effective reporter molecule for urine-based diagnostics, and its molecular characteristics allow sufficient filtration to generate a detectable signal within a clinically relevant timeframe. Limitations of the Model We need to acknowledge that there are several limitations of our model:
This model simplifies the actual glomerular structure (for example, by assuming a fixed pore distribution), but it can effectively explain the main phenomena.
We used simplified assumptions for charge and molecular shape. We modeled Gluc as a sphere and consider membrane charge as a uniform field, while there may be more complex regulatory mechanisms in the actual biological environment.
While executing the crucial initialization step, we encountered what seemed to be an insurmountable obstacle: the complete absence of precise physical parameters for the pore size distribution of the glomerular basement membrane (GBM) in existing literature. This made it impossible to define the parameters for L2 <- GBM(...) in our pseudocode, bringing our modeling work to a standstill. To overcome this challenge, we sought the expert guidance of Professor Haoran Yu from the School of Mathematics at Jilin University. Professor Yu offered a transformative perspective, suggesting that many random structures in biological systems, such as pore distributions, often adhere to specific probability patterns. He advised that we could leverage these common statistical patterns, combined with known physical properties of related glomerular filtration structures, to make reasonable estimations and input the parameters. This interdisciplinary insight illuminated our path forward, allowing us to break through the bottleneck. By constructing a parameter system based on biologically plausible inferences, we successfully implemented the agent-based model (ABM) simulation, paving the way for the subsequent steps of our implementation. Reference
Reference
1 Wei X, Li S, He J, et al. Tumor-secreted PAI-1 promotes breast cancer metastasis via the induction of adipocyte-derived collagen remodeling[J]. Cell communication and signaling, 2019, 17(1): 58.
2 Zhou C, He X, Tong C, et al. Cancer-associated adipocytes promote the invasion and metastasis in breast cancer through LIF/CXCLs positive feedback loop[J]. International Journal of Biological Sciences, 2022, 18(4): 1363.
3 Kim S, Oh J, Park C, et al. FAM3C in cancer-associated adipocytes promotes breast cancer cell survival and metastasis[J]. Cancer Research, 2024, 84(4): 545-559.
4 Hofwimmer K, de Paula Souza J, Subramanian N, et al. IL-1\(\beta\) promotes adipogenesis by directly targeting adipocyte precursors[J]. Nature Communications, 2024, 15(1): 7957.
5 Flower L, Gray R, Pinkney J, et al. Stimulation of interleukin-6 release by interleukin-1\(\beta\) from isolated human adipocytes[J]. Cytokine, 2003, 21(1): 32-37.
6 Kim H S, Jung M, Choi S K, et al. IL-6-mediated cross-talk between human preadipocytes and ductal carcinoma in situ in breast cancer progression[J]. Journal of Experimental & Clinical Cancer Research, 2018, 37(1): 200.
7 Talukder A, Barham C, Li X, et al. Interpretation of deep learning in genomics and epigenomics[J]. Briefings in Bioinformatics, 2021, 22(3): bbaa177.
8 Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.
9 Choi S R, Lee M. Transformer architecture and attention mechanisms in genome data analysis: a comprehensive review[J]. Biology, 2023, 12(7): 1033.
10 Qin Y, Li C, Shi X, et al. MLP-based regression prediction model for compound bioactivity[J]. Frontiers in bioengineering and biotechnology, 2022, 10: 946329.
11 Haribabu S, Gupta G S, Kumar P N, et al. Prediction of flood by rainf all using MLP classifier of neural network model[C]//2021 6th international conference on communication and electronics systems (ICCES). IEEE, 2021: 1360-1365.
12 Odeh A, Alarbi A, Keshta I, et al. Efficient prediction of phishing websites using multilayer perceptron (MLP)[J]. Journal of Theoretical and Applied Information Technology, 2020, 98(16): 3353-3363.
13 Abd-elaziem A H, Soliman T H M. A multi-layer perceptron (mlp) neural networks for stellar classification: A review of methods and results[J]. International Journal of Advances in Applied Computational Intelligence, 2023, 3(10.54216).
14 Durairaj M, Revathi V. Prediction of heart disease using back propagation MLP algorithm[J]. Int. J. Sci. Technol. Res, 2015, 4(8): 235-239.
15 Safar A A, Salih D M, Murshid A M. Pattern recognition using the multi-layer perceptron (MLP) for medical disease: A survey[J]. International Journal of Nonlinear Analysis and Applications, 2023, 14(1): 1989-1998.
16 Raut R D, Dudul S V. Arrhythmias classification with MLP neural network and statistical analysis[C]//2008 First international conference on emerging trends in engineering and technology. IEEE, 2008: 553-558.
17 Al Bataineh A, Manacek S. MLP-PSO hybrid algorithm for heart disease prediction[J]. Journal of Personalized Medicine, 2022, 12(8): 1208.
18 Schisterman E F, Faraggi D, Reiser B, et al. Youden Index and the optimal threshold for markers with mass at zero[J]. Statistics in medicine, 2008, 27(2): 297-315.
19 Otto M, Wiltfang J, Schütz E, et al. Diagnosis of Creutzfeldt-Jakob disease by measurement of S100 protein in serum: prospective case-control study[J]. Bmj, 1998, 316(7131): 577-582.
20 Lamy P J, Grenier J, Kramar A, et al. Pro-gastrin-releasing peptide, neuron specific enolase and chromogranin A as serum markers of small cell lung cancer[J]. Lung Cancer, 2000, 29(3): 197-203.
21 Grmec Š, Gašparovic V. Comparison of APACHE II, MEES and Glasgow Coma Scale in patients with nontraumatic coma for prediction of mortality[J]. Critical Care, 2000, 5(1): 19.
22 Wehler M, Kokoska J, Reulbach U, et al. Short-term prognosis in critically ill patients with cirrhosis assessed by prognostic scoring systems[J]. Hepatology, 2001, 34(2): 255-261.
23 Kaseniit K E, Katz N, Kolber N S, et al. Modular, programmable RNA sensing using ADAR editing in living cells[J]. Nature Biotechnology, 2023, 41(4): 482-487.
24 Lorenz R, Bernhart S H, Höner zu Siederdissen C, et al. ViennaRNA Package 2.0[J]. Algorithms for molecular biology, 2011, 6(1): 26.
25 Zuker M. Mfold web server for nucleic acid folding and hybridization prediction[J]. Nucleic acids research, 2003, 31(13): 3406-3415.
26 Yoffe A M, Prinsen P, Gopal A, et al. Predicting the sizes of large RNA molecules[J]. Proceedings of the National Academy of Sciences, 2008, 105(42): 16153-16158.
27 Sato K, Kato Y. Prediction of RNA secondary structure including pseudoknots for long sequences[J]. Briefings in Bioinformatics, 2022, 23(1): bbab395.
28 Do C B, Woods D A, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models[J]. Bioinformatics, 2006, 22(14): e90-e98.
29 Salari R, Backofen R, Sahinalp S C. Fast prediction of RNA-RNA interaction[J]. Algorithms for molecular Biology, 2010, 5(1): 5.
30 Mückstein U, Tafer H, Hackermüller J, et al. Thermodynamics of RNA–RNA binding[J]. Bioinformatics, 2006, 22(10): 1177-1182.
31 Johnson K A. A century of enzyme kinetic analysis, 1913 to 2013[J]. FEBS letters, 2013, 587(17): 2753-2766.
32 Alon U. An introduction to systems biology: design principles of biological circuits[M]. Chapman and Hall/CRC, 2019.
33 Baabu P R S, Srinivasan S, Nagarajan S, et al. End-to-end computational approach to the design of RNA biosensors for detecting miRNA biomarkers of cervical cancer[J]. Synthetic and Systems Biotechnology, 2022, 7(2): 802-814.
34 Gayet R V, Ilia K, Razavi S, et al. Autocatalytic base editing for RNA-responsive translational control[J]. Nature Communications, 2023, 14(1): 1339.
35 Bain M D, Bax B E, Webster A D B, et al. Pilot studies of carrier erythrocyte-entrapped enzyme therapy in Gaucher’s disease and adenosine deaminase deficiency[J]. Clin. Sci, 1997, 92: 3P.
36 Véliz E A, Easterwood L H M, Beal P A. Substrate analogues for an RNA-editing adenosine deaminase: mechanistic investigation and inhibitor design[J]. Journal of the American Chemical Society, 2003, 125(36): 10867-10876.
37 Verhaegen M, Christopoulos T K. Recombinant Gaussia luciferase. Overexpression, purification, and analytical application of a bioluminescent reporter for DNA hybridization[J]. Analytical chemistry, 2002, 74(17): 4378-4385.
38 Song Y, Paul A V, Wimmer E. Evolution of poliovirus defective interfering particles expressing Gaussia luciferase[J]. Journal of virology, 2012, 86(4): 1999-2010.
39 Nishikura K. Functions and regulation of RNA editing by ADAR deaminases[J]. Annual review of biochemistry, 2010, 79(1): 321-349.
40 Tannous B A, Kim D E, Fernandez J L, et al. Codon-optimized Gaussia luciferase cDNA for mammalian gene expression in culture and in vivo[J]. Molecular Therapy, 2005, 11(3): 435-443.
41 Haraldsson B, Nyström J, Deen W M. Properties of the glomerular barrier and mechanisms of proteinuria[J]. Physiological reviews, 2008.
42 Satchell S C, Braet F. Glomerular endothelial cell fenestrations: an integral component of the glomerular filtration barrier[J]. American Journal of Physiology-Renal Physiology, 2009, 296(5): F947-F956.
43 Miner J H. The glomerular basement membrane[J]. Experimental cell research, 2012, 318(9): 973-978.
44 Pavenstadt H, Kriz W, Kretzler M. Cell biology of the glomerular podocyte[J]. Physiological reviews, 2003, 83(1): 253-307.
45 Deen W M, Lazzara M J, Myers B D. Structural determinants of glomerular permeability[J]. American Journal of Physiology-Renal Physiology, 2001, 281(4): F579-F596.
46 Smithies O. Why the kidney glomerulus does not clog: a gel permeation/diffusion hypothesis of renal function[J]. Proceedings of the National Academy of Sciences, 2003, 100(7): 4108-4113.
47 Zheng Z, Schmidt-Ott K M, Chua S, et al. A Mendelian locus on chromosome 16 determines susceptibility to doxorubicin nephropathy in the mouse[J]. Proceedings of the National Academy of Sciences, 2005, 102(7): 2502-2507.
48 Zayas C F, Guasch A. Early glomerular dysfunction in human renal allografts[J]. Kidney international, 2001, 60(5): 1938-1947.
49 Zhu W, Chen H, Ge Y, et al. Ultrastructural changes in the glomerular filtration barrier and occurrence of proteinuria in Chinese patients with type 2 diabetic nephropathy[J]. Diabetes research and clinical practice, 2009, 86(3): 199-207.
50 Pätäri‐Sampo A, Ihalmo P, Holthöfer H. Molecular basis of the glomerular filtration: nephrin and the emerging protein complex at the podocyte slit diaphragm[J]. Annals of medicine, 2006, 38(7): 483-492.
51 Beetham R, Cattell W R. Proteinuria: pathophysiology, significance and recommendations for measurement in clinical practice[J]. Annals of clinical biochemistry, 1993, 30(5): 425-434.
Abstract of BEAM Synthetic biology has shown great potential in multiple fields like life science, environmental protection and economics. However, its rapid advancement has also raised ethical concerns, and there is an urgent need to establish an effective risk assessment system to ensure the healthy development of technology. To address this, we have developed the BIO-ETHICAL ASSESSMENT MODEL (BEAM), a novel framework designed for evaluating the ethical risks of synthetic biology techniques. We further applied this model to our ABCS project as a case study, demonstrating its practical value. We innovatively integrated Structural equation modeling (SEM) with Bayesian Network (BN) analysis. SEM uses a one-dimensional measurement approach and maximum likelihood estimation to model causal relationships between risk factors. BN handles uncertainty and probabilistic reasoning through Directed Acyclic Graph (DAG) and Conditional Probability Tables (CPT). BN allows us to calculate a comprehensive risk score for specific synthetic biology events with the help of Netica. We validated the model using stratified sampling and Delphi weighting, and tested BEAM's sensitivity by applying it to representative synthetic biology events. We also provide a preliminary ethical risk interval division. Finally, we applied BEAM to the ABCS, yielding a low ethical risk score of 32.2 and identifying specific risk dimensions for targeted management. BEAM offers a comprehensive and quantitative approach to ethical evaluation in synthetic biology, showcasing strong potential for broad application and dissemination. Its key strengths lie in its evaluation system and the integration of SEM and BN analysis. We hope our model may BEAM the pathway of a well structured and forward-thinking ethical assessment in the age of synthetic biology. Inspiration Synthetic biology brings both opportunities and challenges, reshaping our understanding of life while opening new frontiers for innovation and responsibility. The potential ethical risks associated with synthetic biology have sparked extensive attention and discussion. Ethical, legal, and social issues (ELSI) triggered by synthetic biology has rapidly become a focal point of concern, with researchers from various fields engaging in in-depth discussions. To address the ethical challenges posed by synthetic biology and ensure its responsible development, scholars and policymakers have explored various ethical risk assessment frameworks. Solution-focused risk assessment model (SFRA)[1] proposed by Adam Finkel evaluates the risks and benefits through a series of principle questions. Network sustainability model underscores the importance of broad cooperation to enhance the overall risk governance capacity. Precautionary principal framework emphasizes the priority of establishing ethical frameworks throughout the entire project lifecycle. The ethical governance framework[2] advocated by European Commission's Ethics Advisory Board emphasizes the systematic integration of ethical principles throughout the research and application process of synthetic biology. The frameworks mentioned above have provided guidance for the development of synthetic biology. However, they face several challenges in practice. First, many of them lack clear mechanisms for quantifying ethical and social concerns, which makes results difficult to compare or validate. Second, their implementation often remains conceptual, leaving a gap between theoretical proposals and actionable tools. Third, the heavy reliance on subjective judgments may introduce inconsistency, bias, and limited reproducibility in evaluations. Bayesian Network are commonly used in medical diagnosis, fault detection, and financial risk assessment. It can analyze risks in complex systems through probability theory and graphical structures, revealing interactions among multiple factors. However, it can't comprehensively incorporate all relevant variables. Structural equation model (SEM), widely used in social science, is capable of handling both observed and latent variables inferred from observations. We believe that SEM can compensate the limitations of Bayesian Networks. Therefore, we combine SEM and Bayesian Network to address the shortcomings of existing synthetic biology ethical assessment. Model Description Structural Equation Modeling (SEM) Structural Equation Modeling (SEM) is a statistical method based on the covariance matrix of variables to analyze relationships between variables which is also an important tool for multivariate data analysis. SEM was initially developed by Polish geneticist Waclaw in the 1970s[3], integrating factor analysis, regression analysis, and path analysis, establishing a statistical framework for modeling and testing variable relationships[4]. The core feature of SEM is its ability to handle observed variables and latent variables simultaneously. SEM primarily consists of two parts: measurement relationships (defining the relationship between latent variables and observed variables) and influence relationships[5] (defining the relationships between latent variables). SEM is particularly suitable for addressing complex issues in social sciences and life sciences, such as risk perception, ethical attitudes, and behavior prediction[6-8]. The following chart shows concepts used in SEM.
Concepts of SEM
Concept Explanation
Latent variables It refers to variables that cannot be directly observed or measured, and can only be obtained indirectly.
Observe variables Also known as explicit variables, index variables or measurable variables, they can be obtained through direct measurement or observation, and the data obtained can be directly converted into quantitative data.
Error variable It refers to the error variables that cannot be actually measured and exist in the observation variables in the study, including the random variation part and measurement error in the structural equation model.
Mediating variables It refers to variables with dual attributes of intrinsic variables and external variables, which are not only affected by external variables (attributes are intrinsic variables), but also affect other variables (attributes are external variables).
Direct effect It refers to the effect of the causal variable directly affecting the outcome variable, and the path coefficient of the process is used to evaluate the direct effect.
Indirect effects It refers to the effect of the causal variable indirectly affecting the outcome variable through the mediating variable. When there is only one mediating variable, the product of the two path coefficients is the indirect effect.
Total effect It refers to the total influence of the causal variable on the outcome variable, and the direct effect plus the indirect effect is equal to the total effect.
Through SEM, we can analyze the interactions and causal relationships between different dimensions of ethical risks in synthetic biology projects[19], forming the ideal model path diagram, and obtaining the dimension-dimension or dimension-subdimension path coefficients. Subsequently, we can convert the path coefficients into probabilistic relationships, thereby establishing a Bayesian Network model for the quantitative assessment of ethical risks in synthetic biology technology. Bayesian Network Bayesian Network, also known as a belief network or DAG model, is a statistical model based on probability theory and graph theory aiming for expressing dependencies between variables and inferring about uncertainty. Bayesian Network are commonly used in medical diagnosis, fault detection, and financial risk assessment[20-21]. In BEAM we apply it to our ethical risk assessment model[22-24] to achieve a comprehensive ethical assessment of specific synthetic biology events[25]. A key advantage of Bayesian Networks lies in their ability to decompose a joint probability distribution into a product of conditional probabilities, based on the assumption of local Markovity. \[P\left(X_1,X_2,\cdots,X_n\right)=\displaystyle\prod_{i=1}^nP\left(X_i\mid\mathrm{Parents}\left(X_i\right)\right)\] The key assumption local Markovability formula holds is that each variable \(X_i\) is independent of other non-descendant nodes given its parent node, denoted as \(\mathrm{Parents}\left(X_i\right)\). This decomposition reduces the complexity, and the number of parameters required, making inference and computation more efficient[26]. Two basic structures of the Bayesian Network are directed acyclic graphs and conditional probability tables. Directed Acyclic Graph (DAG) is a graph composed of nodes connected by directed edges. Each edge has a direction indicating a dependency or causal relationship, and the graph contains no cycles—that is, it is impossible to start at any node and follow a sequence of directed edges that eventually loops back to the same node. Conditional Probability Table (CPT) represents the probability distribution of a random variable given its parent variables in a Bayesian Network.
Model Construction Whole map We adopted a systematic and multi-level modeling process. To start with, we conducted structural equation modeling (SEM) to form path structure and estimate the path coefficients between dimensions or sub-dimensions for quantification. The general construction process of SEM mainly includes four steps:
Step 1 Step 2 Step 3 Step 4 Initial model setting Training data collection Model verification Model modification The general construction process of SEM
The initial path structure of SEM is generally obtained through literature review and experience, so we first propose evaluation dimensions and sub-direction based on literature and expert suggestions. Subsequently, to ensure the comprehensiveness and reliability of the dimensions, we designed a Likert scale questionnaire based on the initial model dimensions to collect training data. In the model verification phase, we ensure data quality through data cleaning and reliability analysis, then use Hierarchical Confirmatory Factor Analysis (HCFA) for factor analysis. We adjust the variable structure based on the obtained fit indices and modify the model until a reasonable variable structure is achieved. Finally, we analyze the relationships between the variables of the optimized model and obtain the path coefficients between each dimension and sub-dimensions. Bayesian Networks are the foundation of BEAM's prediction capabilities. The variable relationships obtained from SEM need to be converted into a Bayesian Network through structural mapping and data normalization. Only after that can BEAM predict ethical risk scores for future synthetic biology projects. The two keys for conversion from SEM to Bayesian Networks are:
Converting path diagram into directed acyclic graph (DAG)
Converting path coefficients from SEM into conditional probability table (CPT)
After the modeling process, BEAM can give a prediction score from 0-100 representing the ethical risk of one project. We evaluated 5 synthetic biology events by BEAM for testing. Based on obtained data, we divided the score range into 5 risk levels using the principle of equal interval classification.
Overall flowchart of the BEAM model
Initial model specification During the development of our evaluation framework, our human practice group conducted expert interviews to strengthen the model’s theoretical foundation. Through consultations with experts, we gained valuable insights into how ethical priorities differ across various application scenarios in synthetic biology. These discussions guided us in clarifying and refining the key ethical dimensions of our framework, ensuring that it embodies both theoretical rigor and practical relevance. Building upon these theoretical insights, we further translated the identified ethical dimensions into a structured model. Drawing on the classic theoretical framework of biomedical ethics[9-12], we adopted the dimensional paradigm of Respect–Happiness–Autonomy–Justice from the ethical matrix[13-16] to construct a multidimensional evaluation system for synthetic biology technologies. Through literature analysis and expert consultation, we ultimately identified five core dimensions for our framework: ecological security, biological protection, fairness and justice, public benefit, and regulatory framework[17]. Training Data Collection After determining the main evaluation dimensions and establishing the initial path structure, we need to examine the rationality of the whole initial model. To this end, our human practice group helped design and distribute a questionnaire to the public to collect training data, which was then imported into SEM to test the fit of the initial theoretical model. For the design of the questionnaire, we adopted a unidimensional measurement method, where each question focuses on a specific dimension. In this, we mixed positive and negative items to avoid inertia selection bias. We used a Likert scale as quantitative tool[18]. Likert scale is one of the most commonly used quantitative tools in social science and psychological research, especially suitable for measuring the intensity of attitudes, opinions, or subjective feelings. The questionnaire is set on a 5-point scale, with symmetrical distribution of positive and negative options, where negative options correspond to 1 point and positive options correspond to 5 points as the maximum score. The results of the questionnaire are as follows.
Results of the questionnaire
Main dimension Sub-dimension Average Standard deviation
Ecological security Natural environment 3.841642 0.775105
Public health 3.568498 0.995109
Ecological balance 3.695015 0.911654
Genetic abnormalities 3.57478 0.996861
Natural gene pool 3.888563 0.767468
biological protection Biowar 3.800587 0.879026
Bioweapon 3.302053 1.004119
Technological leakage 4.055718 0.7518
fairness and justice Technological fairness 3.750733 0.867106
Generational fialness 3.322581 0.939741
intellectual property rights 4.293255 0.680421
public benefits Maximize public benefits 3.565782 1.063472
Minimize public harm 3.782991 0.815782
Public acceptance 4.131965 0.733121
Subject respect 2.874269 1.167504
Regulatory framework Ethical evaluation mechanisms 4.090909 0.7518
Technical restriction mechanisms 4.055718 0.7518
Democratic review 3.953079 0.773248
Academic freedom 4.460411 0.680421
Modification Initial model evaluation The training data were imported in statistical analysis software and goodness-of-fit test was performed on the initial theoretical model to calculate various fit indices. Fit indices included chisq/df, CFI, TLI, RMSEA, and SRMR[3], with their standard values sourced from currently recognized indicators. Results are shown in the table below.
Standards of Fit indices
Fit indices Explanation Standard Result
chisq/df The chi-square to degrees of freedom ratio (\(\frac{\chi^2}{df}\)) is below the commonly accepted threshold of 3, indicating a good model fit and strong consistency between the model and the observed data. <3 4.93
CFI The Comparative Fit Index (CFI) exceeds the commonly accepted threshold of 0.90, indicating a good model fit and strong agreement between the hypothesized model and the observed data. ≥0.90 0.954
TLI The Tucker–Lewis Index (TLI) exceeds the commonly accepted threshold of 0.90, indicating a good model fit and that the model adequately captures the data structure. ≥0.90 0.717
RMSEA The Root Mean Square Error of Approximation (RMSEA) falls below the commonly accepted threshold of 0.08, suggesting an acceptable level of model fit and approximation to the population covariance structure. ≤0.08 0.077
SRMR The Standardized Root Mean Square Residual (SRMR) is below the commonly accepted threshold of 0.08, indicating an acceptable model fit and minimal residual differences between the observed and predicted data. ≤0.08 0.121
Initial model presents a complex state of partial fit but overall poor performance. CFI=0.954 and RMSEA=0.077 meet acceptable standards, indicating that the model has some rationality in comparison to the independent model. However, chisq/df=4.93 is too high, TLI=0.717 is far below 0.90, indicating that the model fails to explain sufficient data variation; SRMR=0.121 is also significantly above the threshold of 0.08, indicating a large residual between the predicted covariance of the model and the actual observed covariance. Although CFI and RMSEA meet the standards, the significant failures of the three indicators chisq/df, TLI, and SRMR clearly indicate a serious structural contradiction between the initial theoretical model and the actual data. Therefore, the initial model needs further modification. Initial model reconstruction Based on the fit indices, we revised the initial path structure referring to modification indices and path coefficient p-values, thereby constructing a structural equation model for ethical risk factors. Details of our modification are mentioned as follows.
Path Additions and Constraint Relaxation
We added paths and released unreasonable constraints based on the modification indices. During the modification process, we focused on the suggestions provided by the modification indices (M.I.). Modification indices indicated that adding covariance relationships between certain error terms in the model could significantly reduce the overall chi-square value of the model, thereby improving fit.
  • A covariance path between measurement errors of the observed variable "gene_tech_health_risk" and "bioweapon"
  • A covariance relationship between the error terms of tech_fair and tech_leak.
Covariance relationship between these two pairs of error terms was established to achieve the maximum reduction in chi-square value and improve the accuracy.
Theory-Driven Latent Variable Reconfiguration
We reconstructed latent variables and removed insignificant paths based on factor loadings and theoretical affiliations. To improve the reliability and validity of the model, we removed statistically insignificant observed variables, then reorganized latent variables based on data performance.
  • Variables tech_leak in dimension 'biological protection' has been reclassified into a new dimension "TechGovernance".
  • Variables ip_rights in dimension 'fairness and justice' have been reclassified into dimension "RegFramework".
  • Variables ethics_eval and tech_limit in dimension 'regulatory management' have been reclassified into dimension "TechGovernance".
  • In dimension 'biosafety' variables public_health and variables abnormal_evol were deleted due to low factor loadings.
  • In dimension 'public benefit' variables public_benefit_max and variables subject_respect were removed due to the insignificant path coefficients.
After restructuring, we obtained the optimized model. The key fit indices of the revised model meet or exceed commonly accepted standards, indicating that the final model structure is theoretically sound and empirically adequate in explaining the data. This model clearly reveals the core mechanisms of influence among the various factors related to the ethical risks of synthetic biology. Final Dimensions and Pathways Dimensions explanation Ecological Security Ecological safety refers to a stable and sustainable condition in which an ecosystem provides essential resources and environment for humans, while maintaining capacity to resist or adapt to both natural and anthropogenic disturbances. Its core objective is to maintain the natural foundation for human's survival and development, ensuring that ecosystem stays balanced and not harmed. For dimension of ecological safety, we divided it into 4 sub-dimensions as follows:
Ecological Security Natural Environment Will this syn-biology technique have a negative impact on the natural environment? Ecosystem balance Will this synthetic biology technology disrupt the balance of existing ecosystems? Contamination of the natural gene pool Will the products of this synthetic biology technique contaminate the existing natural gene pool? Genetic abnormalities Is there a risk of abnormal genetic mutations with this synthetic biology technique?
Biological war and Bioweapons Biological war and bioweapons are one of the most severe threats to global public health. They use pathogens like viruses or biotechnology, aiming to cause massive casualties or ecological damage to animals or plants. The grate destructive effect of such weapons has made them a focal point of global governance.
Biological Weapon Risk Biological warfare Will this synthetic biology technique be used in biological warfare? Biological weapon Will this synthetic biologytechnique be used to make biological weapons?
Technology Governance Technology governance refers to the comprehensive organization, coordination and supervision of all activities in field of synthetic biology to ensure the technology develop safely, effectively and sustainably, and maximize its social benefits. We divide technology governance into 5 sub-dimensions, including intergenerational equity, technology leakage, technology limitations, and ethical pre-assessment. Among these, intergenerational equity refers to the equal rights of present and future generations in natural resources and access to development opportunities. Intergenerational equity emphasizes not infecting future generations' ability to meet their own needs while meeting the needs of present generation.
Technology Governance Technological equity Will this synthetic biology technique lead to regional technological inequity? Intergenerational equity Will this synthetic biology technique undermine intergenerational equity? Technology leakage Will the synthetic biology technology lead to or be related to a technology leak? Technology limitation Whether this synthetic biology technique will break the existing technical limitations? Ethical pre-assessment Has this synthetic biology technique been preevaluated for ethics?
Public Acceptance Public acceptance refers to the recognition and acceptance of synthetic biology technology by the public in terms of cognition, attitude and behavior. It involves the public's understanding of the new technology, the assessment of its potential risks and benefits, and whether it is willing to use or support the promotion and application of technology in life, work, etc.
Public Acceptance Minimization of harm to the public Does the synthetic biology technology comply with the principle of minimizing harm to the public? Public acceptance Is the synthetic biology technology easily accessible to the public?
Regulatory Framework Regulatory framework is a management framework covering laws, reviewing mechanisms, academic norms, and multi-departmental collaboration, aiming to ensure the development of synthetic biology technologies under the principle of safety, legality, and ethics. For the dimension of the regulatory framework, we divided it into 4 sub-dimensions, democratic review, legal regulation, academic freedom, and intellectual property rights.
Regulatory Framework Democratic review Has the synthetic biology technology undergone democratic review? Legal regulation Whether the synthetic biology technology violates the law or exploits legal loopholes? Academic freedom Is this synthetic biology technique a misuse of academic freedom? Intellectual property rights Does this synthetic biology technology and application infringe intellectual property rights?
Path coefficients After modification, we recalculated the path coefficients and formed the final ethical risk assessment dimension path diagram.
Path coefficients diagram
Dimension Item Abbreviation Factor Load S.E. C.R.
Eco-Security Natural environment Natural_env 0.733 0.038 19.529
Genetic abnormality Gene_tech_health_risk 0.698 0.045 15.502
Ecosystem balance Eco_balance 0.747 0.038 19.716
Natural gene pool Natural_gene 0.637 0.053 12.113
BioWeaponRisk Biological warfare Biowar 0.835 0.059 14.239
Biological weapon Bioweapon 0.74 0.058 12.761
Technology Governance Technological equity Tech_fair 0.428 0.058 7.381
Intergenerational equity Gen_fair 0.767 0.039 19.878
Technology leakage Tech_leak 0.6 0.051 11.738
Technology limitation Tech_limit 0.431 0.054 7.967
Ethical pre-assessment Ethics_eval 0.759 0.04 19.105
Pubic Acceptance Minimization of harm to the public Public_harm_min 0.602 0.061 9.951
Public acceptance Public_accept 0.729 0.052 14.158
RegFramework Democratic review Democratic_review 0.622 0.049 12.802
Legal regulation Legal_reg 0.854 0.041 20.82
Academic freedom Academic_freedom 0.776 0.043 18.114
Intellectual property rights Ip_rights 0.797 0.042 19.181
Here shows the recalculated path coefficients. Technology governance has a significant positive impact on ethical risks, with a standardized path coefficient \(\beta\) = 1.010 (t = 17.318). The impact of technology governance is the strongest among all, indicating that governance at the technological level (including technical fairness, restrictions, assessments, etc.) is the core factor constituting the perception of ethical risks. Public acceptance has a significant positive impact on ethical risks, with a standardized path coefficient \(\beta\) = 0.911 (t = 17.544). The impact of public acceptance is only second to technology governance, suggesting that the public's judgment strongly affects assessment of ethical risks. The regulatory framework has a significant positive impact on ethical risks, with a standardized path coefficient \(\beta\) = 0.795 (t = 13.981). This indicates that a sound, democratic, and well-ordered regulatory framework can effectively influence public risk perception. There is a significant positive correlation between ecological security and biological weapon risks, with a correlation coefficient of 0.495 (t = 7.258, P < 0.001). Both Ecological security (path coefficient = 0.36) and weapon risks (path coefficient = 0.37) have noticeably lower coefficients than others. When three macro factors 'technology governance', 'public acceptance', and 'regulatory framework' are considered simultaneously, the direct influence of risks in the specific areas like 'ecology' and 'weapons' becomes weakened. Their negative impacts may have already been covered by macro factors. There is a moderate positive correlation between 'ecological security' and 'weapon risks' (coefficient = 0.51). Bayesian Network Modeling After completing the SEM construction, we aimed to use the dimension weights to perform risk inference. At this stage, our human practice group conducted an consultation with Professor Chen Wenbin, whose expertise offered valuable guidance. He emphasized that the key lies in uncovering the causal logic and probabilistic relationships among influencing factors, and further developing a probability-based correlation model. Inspired by his insight, we innovatively proposed converting the path coefficients in Structural Equation Modeling (SEM) into probabilistic causal relationships, to build a Bayesian Network (BN) model for the quantitative assessment of ethical risks in synthetic biology technologies. We first used the Bayesian Network modeling software assistant Netica to construct a directed acyclic graph (DAG). In the structural modeling phase, we based our structural mapping on the path diagram of SEM, establishing path relationships among various assessment dimensions and their sub-dimensions, and transferring them to the Bayesian Network to establish a topological structure with causal directionality. Subsequently, we discretized the continuous variables in the SEM model, converting the influence strengths represented by the standardized path coefficients into discrete states of nodes in the Bayesian Network (e.g., T/F or multi-level probability states). This process not only preserved the logical relationships in the original structure but also equipped the model with the capacity for probabilistic inference and risk prediction. Through this process, we successfully established an ethical assessment model based on the Bayesian Network, whose directed acyclic graph structure faithfully reflects the causal paths of the original SEM model, providing a solid foundation for subsequent dynamic simulation and decision support for ethical risk.
DAG diagram of BEAM
The next step is to determine the conditional probability table (CPT) between the nodes, that is, to define the conditional probability table using the impact factor of SEM.
Single-parent node In the case of a single parent, the path coefficient is directly converted to a probability weight. *If the influence factor of the parent node on the child node is \(\beta\), it will be used as the probability weight.
Multi-parent nodes For multi-parent cases, we do joint impact weighting. First, for multiple parallel parent nodes, we use the \(\mathrm{Softmax}\) function to normalize the path coefficients corresponding to them in the structural equation. Given an input vector \(z=\left[z_1,z_2,\cdots,z_k\right]^T\), the \(\mathrm{Softmax}\) function is defined as \[\mathrm{Softmax}\left(z_i\right)=\dfrac{e^{z_i}}{\sum_{j=1}^Ke^{z_j}}\] \[\text{(Calculated independently for each element)},\] the output is a probability distribution \[\sum_{i=1}^K\mathrm{Softmax}\left(z_i\right)=1.\]
After normalization, we fill in the conditional probability table for all cases for each node. The initial state of the model is as follows:
The initial state of the model
Initial state means that when the probability of ethical hazard in all subdimensions is zero, the probability of ethical hazard in the assessed synthetic biology event is also zero. The central node EthicsRisk ranges from 0 to 100, indicating the probability (unit: %) that a particular event has an ethical hazard. A value of 0 indicates that the event is unlikely to pose an ethical hazard, while a value of 100 indicates that the event must be ethically hazardous. Obviously, the higher the EthicsRisk score, the greater the degree of ethical sensitivity and potential risk of the corresponding event, and vice versa. Model Evaluation To verify the feasibility of BEAM, we selected synthetic biology events that have been widely evaluated in terms of ethics and evaluated quantitatively according to the dimensional mentioned in Dimensions explanation. We collected data through relevant questionnaires for different projects based on the methods mentioned in 4.3 Training Data Collection. We classified population according to their understanding of synthetic biology, and we found that different people have significant differences in their ratings for the same event. Therefore, for the categorical population, we averaged the scores of each group separately then weighs determined by the Delphi method, specifically Very familiar : Somewhat familiar : Not unfamiliar = 0.5 : 0.3 : 0.2. The scores are then averaged again to obtain the final evaluation result. Following part shows several events and their ethical risk score calculated by BEAM.
Reconstruction of the extinct horsepox virus In 2017, a team from the University of Alberta in Canada successfully reconstructed the extinct horsepox virus in the laboratory by synthesizing DNA fragments.
Learn More
Artemisinic acid synthesis pathway in yeast The team at the University of California, Berkeley has introduced the artemisinic acid synthesis pathway into yeast using synthetic biology techniques, achieving efficient production of artemisinin precursors.
Learn More
We have also evaluated other synthetic biology events including mRNA vaccine for COVID-19, CRISPR-Cas9 gene-edited baby and Carbon dioxide synthesis of starch. Model Stratification After applying our model to conduct ethical evaluation of multiple synthetic biology events, we divided the score range from 0 to 100 into five risk levels based on the obtained score data and the principle of equidistant division.
Tip Hover your mouse over the color bar to see what each color represents.
0 20 40 60 80 100
Model Application Usage Instructions When evaluating any synthetic biology event, BEAM can be applied through the following procedure.
Released a questionnaire to quantify the ethical impact Obtain its assessment score in terms of ethical risk Use our software BEAMer or Netica to obtain risk assessment results 1 2 3 Steps for using the BEAM model
Questionnaire Release A questionnaire is distributed to quantify the ethical impact of the event. Each sub-dimension corresponds to a specific question designed to assess related attitudes. It is recommended to use a 10-level Likert scale, with options symmetrically distributed between positive and negative responses. Raw results can be preprocessed as needed (e.g., classification, normalization, or weighting).
Score Calculation The preprocessed questionnaire data are integrated to obtain scores for each dimension, leading to the calculation of an initial ethical risk assessment score.
Inference Method Selection After obtaining the scores, users can choose between two methods to further infer and obtain the final result: Option A: Software Input the questionnaire scores into our software, which provides a built-in interface for probability adjustment and automatic inference. The system computes based on the same causal structure and conditional probability tables, and directly displays the final EthicsRisk score. Option B: Netica Input the questionnaire scores into Netica, adjust the probability distribution of each dimension in compilation mode. Netica will perform Bayesian inference based on the directed acyclic graph and conditional probability tables, outputting EthicsRisk as the final result.
Ethical risk assessment for ABCS After modeling process of BEAM, we evaluated ethical risk of our project ABCS with it. Based on the methods outlined above, we collected data from released questionnaires related to our project. Among all respondents, 6.21% were very familiar with synthetic biology, 51.98% had some understanding, and 41.81% had no understanding at all. The figure below presents the ethical assessment scores for the ABCS across different groups of people in various dimensions. There is a noticeable difference in the scores given by individuals from different levels of knowledge in synthetic biology.
Very familiar Somewhat familiar Not familiar natural environment genetic abnormality ecosystem balance natural gene pool biological warfare biological weapon technological equity technology limitation technology leakage intergenerational equity Public acceptance Minimization of harm to the public intellectual property rights academic freedom legal regulation democratic review The ethical assessment scores for the ABCS across different groups
After averaging the scores for each group separately, the weights of 'very familiar', 'somewhat familiar', and 'not familiar at all' were set as 0.5: 0.3: 0.2. Scores were then weighed average again, with the following results.
score democratic review legal regulation academic freedom intellectual property rights Minimization of harm to the public Public acceptance intergenerational equity technology leakage technology limitation technological equity biological weapon biological warfare natural gene pool ecosystem balance genetic abnormality natural environment 0 10 20 30 40 50 Low Score High Score Final scores in various dimensions
Inputting data above into BEAM, we got final score of 32.2, indicating a medium-low ethical risk.
Evaluation results of ABCS project
According to the evaluation results, ABCS has a high score in the dimensions of technology fairness and intellectual property, showing relatively prominent ethical risks. Complementary Tool: BEAMer To further extend the applicability of the BEAM model, we developed a complementary software tool named BEAMer. By integrating the theoretical framework of BEAM with user-friendly data input and automated inference functions, BEAMer allows users to directly apply the model to real-world synthetic biology projects. This not only facilitates efficient ethical risk assessment but also ensures that the evaluation results can be intuitively interpreted and practically utilized. You can visit our software page to see the details of the application. Strengths and Weaknesses BEAM proposes systematic and all-round ethical evaluation dimension structure, innovatively combines the structural equation model with the Bayesian Network, and obtains multi-source data through questionnaires to ensure the credibility of the assessment results. BEAM not only effectively supplements and expands the existing ethical evaluation system but has also realized the quantitative analysis of the ethical risks for specific synthetic biology events, providing a new path for the interaction between technology development and ethical governance. With the continuous evolution of synthetic biology technology, BEAM is expected to play a key role in more cutting-edge application scenarios and become an important tool to promote the integration and development of science and technology ethics. At present, the risk classification criteria in the BEAM model still have room for improvement. After having a larger dataset, the ethical risk intervals can be refined by applying quantile-based methods, thereby enhancing the accuracy of the classification process. In addition, during the process of establishing evaluation dimensions, we mainly refer to existing literature and discussions with experts. Though our dimension structure is evaluated and modified, there is still room for improvement. We sincerely hope that public, researchers and policymakers participate in the process of promoting the development of BEAM. We believe that these improvements will inject new vitality into the popularization and application of BEAM and help the ethical evaluation of synthetic biology move towards a better future. Conclusion Our work represents a shift from initial qualitative reflections to quantitative practice. Through the integration with human practice, we conducted expert interviews and public surveys to collect diverse perspectives on ethical concerns in synthetic biology. These insights guided the refinement of our evaluation framework, enabling us to transform abstract ethical discussions into measurable dimensions and practical tools. By combining structural equation modeling (SEM) with Bayesian networks, we have constructed a synthetic biology ethics assessment model, BEAM. Through the analysis and assessment of specific synthetic biology events, we have validated the effectiveness of BEAM and successfully applied it to the ethical assessment of our own project, ABCS. Meanwhile, we developed supporting software called BEAMer, so that the model not only remains at the theoretical level but also can assist researchers in quantitatively analyzing and interpreting ethical risks in an intuitive and interactive way. We are willing to further explore and establish standardized methods for ethical assessment in synthetic biology, providing solid ethical support for the sustainable development of synthetic biology. We firmly believe that through the synergistic application of BEAM and BEAMer, and with the promotion of multi-party collaboration, it is hopeful to create a more transparent, inclusive, and responsible ecosystem for innovation in synthetic biology. Reference
Reference
1 Finkel, A. M. (2011). Solution focused risk assessment: a proposal for the fusion of environmental analysis and action. Human and Ecological Risk Assessment, 17(4), 754 787. DOI: 10.1080/10807039.2011.554501.
2 Bohua L, Yuexin W, Yakun O, et al. Ethical framework on risk governance of synthetic biology[J]. Journal of Biosafety and Biosecurity, 2023, 5(2): 45-56.
3 Xin Shibo, Chen Yan, Zhang Chen. A Review of Research Achievements in the Application of Structural Equation Modeling Theory [J]. Industrial Technology and Economics, 2014, 33(05): 61-71.
4 Cheng Kaiming. Characteristics and Applications of Structural Equation Modeling[J]. Statistics and Decision, 2006, (10): 22-25.
5 Harlow L L. Structural equation modeling[M]//The essence of multivariate thinking. Routledge, 2023: 183-201.
6 Wang Bin, Chen Lin, Chai Chengliang, et al. Study on the usage of post-exposure prophylaxis among male homosexuals in college based on structural equation modeling and its related factors [J]. Chinese Journal of AIDS & STD, 2024, 30(11): 1130-1135. DOI:10.13419/j.cnki.aids.2024.11.04.
7 Ono K, Kato E, Tsunemi K. Construction of a structural equation model to identify public acceptance factors for hydrogen refueling stations under the provision of risk and safety information[J]. International Journal of Hydrogen Energy, 2022, 47(75): 31974-31984.
8 Wang Canfei. The current status of spiritual needs of breast cancer patients and the construction of structural equation model for its influencing factors [D]. Jilin University, 2020. DOI:10.27162/d.cnki.gjlin.2020.003555.
9 Huang Cui, Song Donglin, Liang Huigang. Biosafety Issues in Synthetic Biology and Biological Risk Management Countermeasures [J]. China Medicine Biotechnology, 2025, 20(01): 13-18.
10 Hu K, Zhou Q, Zhang Y, et al. Latent profile analysis of nurses’ moral courage: a professional values perspective[J]. Nursing ethics, 2025, 32(2): 678-689.
11 Liao Bohua, Wang Yuexin, Ou Yakun, Zuo Kunlan, Liu Huan, Lei Ruipeng,Ethical framework on risk governance of synthetic biology,Journal of Biosafety and Biosecurity,Volume 5, Issue 2,2023,Pages 45-56,ISSN 2588-9338,https://doi.org/10.1016/j.jobb.2023.03.002.
12 Zhang X, Xiao Z, Zhang T, et al. A quantitative analysis of biosafety and biosecurity using attack trees in low‐to‐moderate risk scenarios: Evidence from iGEM[J]. Risk Analysis, 2024.
13 Biasetti P, Florio D, Gili C, et al. The ethical assessment of touch pools in aquariums by means of the ethical matrix[J]. Journal of Agricultural and Environmental Ethics, 2020, 33(2): 337-353.
14 Adams J. Introducing the ethical-epistemic matrix: a principle-based tool for evaluating artificial intelligence in medicine[J]. AI and Ethics, 2025, 5(3): 2829-2837.
15 Spitale G, Germani F, Biller-Andorno N. The PHERCC matrix. An ethical framework for planning, governing, and evaluating risk and crisis communication in the context of public health emergencies[J]. The American Journal of Bioethics, 2024, 24(4): 67-82.
16 Wu Leqian, Kong Xiangjin. Improvement of the Ethical Matrix: Construction of an Ethical Evaluation Tool for Biomedical Technology[J]. Chinese Journal of Medical Ethics, 2024, 37(08): 877-884.
17 Zhai Xiaomei, Qiu Renzong. Synthetic Biology: Ethical and Governance Issues[J]. Science and Society, 2014, 4(04): 43-52. DOI:10.19524/j.cnki.10-1009/g3.2014.04.005.
18 Qi Laibin. Statistical Analysis and Fuzzy Comprehensive Evaluation of the Likert Scale[J]. Shandong Science, 2006, (02): 18-23+28.
19 Shen Junxian. A Study on the Factors Affecting Public Satisfaction with Government Data Openness Based on Structural Equation Modeling [D]. Xiangtan University, 2023. DOI:10.27426/d.cnki.gxtdu.2023.000918.
20 Zou Hemin. Risk Assessment and System Development for Surrounding Rock Instability in Highway Tunnels Based on Bayesian Networks[J]. Geological Science and Technology Bulletin, 2024, 43(06): 89-101. DOI:10.19509/j.cnki.dzkq.tb20240205.
21 Han Wenqi. Risk assessment of geological disasters in tunnel portal sections based on Bayesian network models [J]. Low Carbon World, 2025, 15(03): 70-72. DOI: 10.16844/j.cnki.cn10-1007/tk.2025.03.031.
22 Jian Jie. Research on Risk Prediction Model for Patients with Coronary Heart Disease Complicated by Atrial Fibrillation Based on Bayesian Network[D]. Chongqing Medical University, 2024. DOI:10.27674/d.cnki.gcyku.2024.001917.
23 Kwag S, Gupta A, Dinh N. Probabilistic risk assessment based model validation method using Bayesian network[J]. Reliability Engineering & System Safety, 2018, 169: 380-393.
24 George P G, Renjith V R. Evolution of safety and security risk assessment methodologies towards the use of bayesian networks in process industries[J]. Process Safety and Environmental Protection, 2021, 149: 758-775.
25 Malekmohammadi B, Uvo C B, Moghadam N T, et al. Environmental risk assessment of wetland ecosystems using Bayesian belief networks[J]. Hydrology, 2023, 10(1): 16.
26 Marcot B G, Penman T D. Advances in Bayesian network modelling: Integration of modelling technologies[J]. Environmental modelling & software, 2019, 111: 386-393.
Download
Drag to move the image · Scroll to scale