Model

1. Sample Information

This study utilized MCF-7 and MDA-MB-231 cell lines to conduct experiments across groups with varying plasmid concentrations: Control, 1 μg, 2 μg, and 4 μg.

In the CCK8 assay, each plasmid concentration group included five replicates, with CCK8 levels measured at 0, 24, and 48 hours.

In the wound healing assay, each group included three replicates to assess cell migration.

Similarly, in the Transwell assay, each group included three replicates to evaluate cell migration capacity.

2. Analytical Methods

2.1 Data Distribution Analysis

Following data preprocessing, box plots were generated using the ggboxplot function from the ggpubr package to visualize variations in CCK8 proliferation, wound healing migration, and Transwell migration across plasmid concentration groups. The analysis was conducted using the following code snippets:

2.1.1 CCK8 Assay Plotting Code

library(ggpubr)

# MCF-7 cell line
ggboxplot(data, x = "timing", y = "value", color = "treatment", add = "jitter", ylab = "CCK8 Cell proliferation level", xlab = "timing", main = "CCK8 experiment of MCF-7 cell line", ylim = c(0, 700))

# MDA-MB-231 cell line
ggboxplot(data, x = "timing", y = "value", color = "treatment", add = "jitter", ylab = "CCK8 Cell proliferation level", xlab = "timing", main = "CCK8 experiment of MDA-MB-231 cell line", ylim = c(0, 700))
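The plotting calls above assume that data is a long-format data frame with one row per measurement and the columns timing, treatment, and value. A minimal sketch of that layout follows. The column names are taken from the ggboxplot calls; the treatment labels are borrowed from the factor levels used in Section 2.2.2 and may differ in the CCK8 files, and the values are simulated placeholders, not experimental data.

```r
# Hypothetical long-format layout for the CCK8 box plots:
# 4 treatment groups x 5 replicates x 3 time points = 60 rows.
set.seed(1)
groups <- c("Control", "Ab=1", "Ab=2", "Ab=4")  # assumed labels
design <- expand.grid(
  replicate = 1:5,
  treatment = groups,
  timing = c(0, 24, 48)
)

toy_data <- data.frame(
  timing = factor(design$timing),                      # discrete x axis
  treatment = factor(design$treatment, levels = groups),
  value = round(runif(nrow(design), min = 50, max = 650))  # placeholder readings
)

# Each time point should contain five replicates per treatment group
table(toy_data$timing, toy_data$treatment)
```

Converting timing to a factor keeps the three time points as discrete, evenly spaced categories on the x axis.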

2.1.2 Wound Healing Assay Plotting Code

ggboxplot(data, x = "celltype", y = "value", color = "treatment", add = "jitter", ylab = "Wound healing Cell migration level", xlab = "Cell type", main = "Wound healing experiment of two cell lines")

2.1.3 Transwell Assay Plotting Code

ggboxplot(data, x = "celltype", y = "value", color = "treatment", add = "jitter", ylab = "Transwell Cell migration level", xlab = "Cell type", main = "Transwell experiment of two cell lines")

2.2 Model Construction

The modeling process was divided into four steps: 1) Calculation of CCK8 time series; 2) Integration of static data; 3) Machine learning model construction; 4) Model evaluation.

2.2.1 CCK8 Time Series Calculation

Since the CCK8 data in this project include measurements at 0, 24, and 48 hours, this study first analyzed the relationship between cell proliferation and time and extracted temporal dynamic features. Specifically, the proliferation rate (the slope between 0 and 48 hours), the area under the curve (AUC), and the maximum growth value were calculated for each sample for subsequent analysis. The analysis was conducted using the following code:

# First step: extract dynamic features from the CCK8 time series
library(dplyr)

dynamic_features <- data1 %>%
  group_by(SampleID, Concentration) %>%
  summarise(
    Slope_0_48 = (value[timing == 48] - value[timing == 0]) / 48, # proliferation rate per hour
    AUC = MESS::auc(timing, value), # area under the growth curve
    Max_Growth = max(value[timing %in% c(24, 48)]), # maximum value at 24 or 48 hours
    .groups = "drop"
  )
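To make the three features concrete, the sketch below computes them by hand for a single sample. The readings are hypothetical, and the AUC uses the trapezoidal rule, which is assumed here to match the default linear interpolation of MESS::auc.

```r
# Hypothetical CCK8 readings for one sample at 0, 24, and 48 hours
timing <- c(0, 24, 48)
value <- c(100, 220, 400)

# Proliferation rate: change in signal per hour between 0 h and 48 h
slope_0_48 <- (value[timing == 48] - value[timing == 0]) / 48

# Trapezoidal rule: interval width times the mean of the two end values
auc_trapezoid <- sum(diff(timing) * (head(value, -1) + tail(value, -1)) / 2)

# Largest reading among the post-baseline time points
max_growth <- max(value[timing %in% c(24, 48)])

c(Slope_0_48 = slope_0_48, AUC = auc_trapezoid, Max_Growth = max_growth)
```

For these toy values the slope is 300 / 48 = 6.25 per hour and the AUC is 24 * 160 + 24 * 310 = 11280.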

2.2.2 Integration of Static Data

Since the CCK8 assay included five replicate samples, while the wound healing and Transwell assays included only three, only the first three replicates of CCK8 assay were included in the analysis. After extracting the data from the wound healing and Transwell assays, the temporal dynamic features of CCK8 assay from the previous step were merged using the merge function based on sample IDs, and stored as the full_data object. In this step, the factor function was used to convert plasmid concentration groups into categorical variables to facilitate subsequent analysis. The analysis was conducted using the following code:

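library(dplyr) # provides distinct()

data2 = read.csv("01figure/transwell.csv", header = TRUE, row.names = 1)
data3 = read.csv("01figure/woundhealing.csv", header = TRUE, row.names = 1)

# Rows 13:24 of the Transwell table and rows 1:12 of the wound healing table are used;
# both ranges must refer to the same 12 samples of one cell line.
# Treatment is subset to the same rows as Transwell so that all columns have 12 entries.
other_data = data.frame(Transwell = data2$value[13:24], WH = data3$value[1:12], Treatment = data2$treatment[13:24])
other_data$SampleID = c(1:12)
other_data$Concentration = c(rep(0, 3), rep(1, 3), rep(2, 3), rep(4, 3))

# Convert the treatment groups into an ordered set of categories
other_data$Treatment = factor(other_data$Treatment, levels = c("Control", "Ab=1", "Ab=2", "Ab=4"))

# Merge the static assay data with the CCK8 dynamic features by sample ID
full_data = merge(dynamic_features,
                  distinct(other_data[, c("SampleID", "WH", "Transwell", "Treatment")]),
                  by = "SampleID")

saveRDS(full_data, "01figure/train_data.RData")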
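The replicate-matching step described above is not shown in the code, so the following is only a sketch of one way it could be done. The Replicate column, the placeholder values, and the pairing of CCK8 replicate k with migration replicate k are assumptions made for illustration.

```r
# Hypothetical CCK8 table: 4 groups x 5 replicates x 3 time points
cck8 <- expand.grid(
  Replicate = 1:5,
  Concentration = c(0, 1, 2, 4),
  timing = c(0, 24, 48)
)
cck8$value <- seq_len(nrow(cck8))  # placeholder readings

# Keep replicates 1 to 3 so each group matches the three
# replicates available in the migration assays
cck8_sub <- cck8[cck8$Replicate <= 3, ]

# Number the samples 1..12 in group order (0, 1, 2, 4; three each),
# mirroring the SampleID and Concentration layout of other_data
group_index <- match(cck8_sub$Concentration, c(0, 1, 2, 4))
cck8_sub$SampleID <- (group_index - 1) * 3 + cck8_sub$Replicate

# Every sample should have exactly one reading per time point
table(cck8_sub$SampleID)
```

With this numbering, samples 1 to 3 are the control group and samples 10 to 12 are the 4 μg group, so a merge by SampleID lines up with the concentration vector defined for other_data.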

2.2.3 Construction of the Machine Learning Model

In this study, a machine learning approach was employed to model the relationship between plasmid concentration and experimental measurements. First, the data from the MDA-MB-231 cell line, after undergoing the aforementioned preprocessing steps, was designated as the training set, while the processed data from the MCF-7 cell line was designated as the test set.

Second, a random forest algorithm was selected to evaluate the contribution of each variable to the model, including proliferation rate (Slope_0_48), area under the curve (AUC), maximum growth (Max_Growth), wound healing assay results (WH), and Transwell assay results (Transwell).

Third, the model was trained on the training set and subsequently validated using the test set data. The final model was stored as “rf_final.rda”.

The analysis was conducted using the following code:

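# Second step: fit a random forest and rank the predictors
library(caret)
library(ggplot2)

# train_control must be a caret::trainControl() object created beforehand;
# its settings are not shown in this section
effit = train(
  Concentration ~ Slope_0_48 + AUC + Max_Growth + WH + Transwell,
  data = full_data,
  method = "rf",
  trControl = train_control,
  verbose = FALSE
)

# Plot the variable importance of the fitted model
ggplot(varImp(effit)) + theme_minimal()

# Third step: load the final model if it exists, otherwise refit it with the tuned parameters
if (file.exists("01figure/rf_final.rda")) {
  rf_final <- readRDS("01figure/rf_final.rda")
} else {
  trControl <- trainControl(method = "none", classProbs = TRUE)
  set.seed(1516)
  rf_final <- train(
    Concentration ~ Slope_0_48 + AUC + Max_Growth + WH + Transwell,
    data = full_data,
    method = "rf",
    tuneGrid = effit$bestTune,
    trControl = trControl
  )
  saveRDS(rf_final, "01figure/rf_final.rda")
}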
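Because the final model requests class probabilities (classProbs = TRUE), caret needs the outcome to be a factor whose level names are valid R variable names. The sketch below shows one way to build and check such a factor in base R. The label scheme ("Control", "P1ug", and so on) is hypothetical and is not the coding used in the original analysis.

```r
# Concentration coding as defined for the 12 samples in Section 2.2.2
concentration <- c(rep(0, 3), rep(1, 3), rep(2, 3), rep(4, 3))

# Hypothetical class labels that are syntactically valid R names
outcome <- factor(
  concentration,
  levels = c(0, 1, 2, 4),
  labels = c("Control", "P1ug", "P2ug", "P4ug")
)

# A label is a valid R name when make.names() leaves it unchanged
levels_ok <- all(levels(outcome) == make.names(levels(outcome)))
levels_ok

# Labels containing "=" are altered by make.names(), so they would not pass this check
make.names(c("Control", "Ab=1"))
```

A check of this kind before calling train can catch label problems early, since a numeric outcome would instead be treated as a regression target.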

2.2.4 Evaluation of Model Accuracy

The reliability of the model was assessed using the ROC (Receiver Operating Characteristic) curve generated with the roc function from the pROC package. Visualization of the ROC curve was performed using the ggplot function from the ggplot2 package. The analysis was conducted using the following code:

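library(pROC)
library(ggplot2)

# Predicted class probabilities for the test set
prediction_prob = predict(rf_final, newdata = test_data, type = "prob")

# roc() expects the observed classes as a vector, so the outcome column is passed
# rather than the whole data frame; the result is named roc_obj to avoid masking roc()
roc_obj <- roc(test_data$Concentration, prediction_prob[, 1])
roc_obj

# Convert to false positive rate and true positive rate, ordered for plotting
ROC_data <- data.frame(FPR = 1 - roc_obj$specificities, TPR = roc_obj$sensitivities)
ROC_data <- ROC_data[order(ROC_data$FPR, ROC_data$TPR), ]

p = ggplot(data = ROC_data, mapping = aes(x = FPR, y = TPR)) +
  geom_step(color = "blue", size = 1, direction = "mid") +
  geom_segment(aes(x = 0, xend = 1, y = 0, yend = 1)) +
  theme_classic() +
  xlab("1 - Specificity") +
  ylab("Sensitivity") +
  coord_fixed(1) +
  xlim(0, 1) +
  ylim(0, 1) +
  annotate("text", x = 0.5, y = 0.25, label = paste("AUC:", round(roc_obj$auc, 2)))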
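To show what the ROC curve and its AUC represent, the sketch below computes both by hand in base R for a toy two-class problem. The labels and scores are hypothetical. A ROC curve compares two classes, so applying it to the four concentration groups would require choosing a two-class comparison first (for example, control versus treated), which is an assumption rather than something stated in the source.

```r
# Hypothetical data: 1 = positive class, 0 = negative class
labels <- c(1, 1, 1, 0, 1, 0, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)

# Sweep the decision threshold from strictest to most permissive
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))  # sensitivity
fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))  # 1 - specificity

# Trapezoidal area under the (FPR, TPR) curve
roc_auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Equivalent rank-based view: the share of positive/negative pairs
# in which the positive sample receives the higher score
pos <- scores[labels == 1]
neg <- scores[labels == 0]
pair_auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

c(trapezoid = roc_auc, pairwise = pair_auc)
```

In this toy example 15 of the 16 positive/negative pairs are ranked correctly, so both calculations give 0.9375.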

3. Results

3.1 Data Distribution

After recording and organizing the experimental data, box plots were used to visualize the distribution of values. In the MCF-7 cell line, CCK8 cell proliferation levels increased over time. At 24 hours, there was no significant difference among plasmid concentration groups; however, at 48 hours, the 1 μg plasmid group showed significantly lower CCK8 levels compared to the other groups (Figure 1A). A similar trend was observed in the MDA-MB-231 cell line, where at 48 hours, plasmid-treated groups exhibited significantly lower CCK8 levels than the control group (Figure 1B). These findings suggest that the plasmid exerts an inhibitory effect on cell proliferation in both MCF-7 and MDA-MB-231 cell lines, with more pronounced effects at 48 hours.

In the wound healing assay, all plasmid-treated groups showed significantly reduced cell migration compared to the control group, consistent across both MCF-7 and MDA-MB-231 cell lines (Figure 2). Higher plasmid concentrations demonstrated stronger inhibitory effects on migration. Notably, in the MCF-7 cell line, even the 1 μg plasmid group showed substantial inhibition, whereas in the MDA-MB-231 cell line, the 2 μg and 4 μg plasmid groups exhibited similar levels of suppression.

The Transwell assay results were largely consistent with those of the wound healing assay. In both MCF-7 and MDA-MB-231 cell lines, all plasmid-treated groups exhibited significantly reduced migration compared to the control group (Figure 3). Moreover, the inhibitory effect of the plasmid on cell migration appeared to follow a concentration-dependent gradient.

Figure 1. CCK8 cell proliferation levels at 0, 24, and 48 hours across four plasmid concentration groups. (A) CCK8 levels in the MCF-7 cell line; (B) CCK8 levels in the MDA-MB-231 cell line.

Figure 2. Cell migration levels in the wound healing assay across four plasmid concentration groups.

Figure 3. Cell migration levels in the Transwell assay across four plasmid concentration groups.

3.2 Data Modeling and Analysis

In this study, a random forest machine learning model was used to analyze the data. The results indicated that Transwell cell migration levels and proliferation rate (Slope_0_48) contributed the most to the model (relative contribution greater than 50%), while wound healing migration levels (WH) and maximum growth (Max_Growth) had lower contributions (less than 25%) (Figure 4). These values indicate how strongly each predictor contributes to predicting the dependent variable (plasmid concentration). The fitted model was saved as “rf_final.rda” so that it can be reloaded in R for later use.
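When reading the contribution values, it may help to recall that caret's varImp rescales importance scores to a 0 to 100 range by default, so they are relative scores rather than shares that sum to 100%. The sketch below reproduces that kind of min-max rescaling on made-up numbers. The predictor names and raw scores are hypothetical and are not results from this study.

```r
# Hypothetical raw importance scores for five generic predictors
raw_importance <- c(x1 = 4.2, x2 = 3.1, x3 = 2.0, x4 = 1.1, x5 = 0.6)

# Min-max rescaling: the weakest predictor maps to 0, the strongest to 100
shifted <- raw_importance - min(raw_importance)
scaled_importance <- shifted / max(shifted) * 100

round(scaled_importance, 1)

# The rescaled scores are relative and need not sum to 100
sum(scaled_importance)
```

Under this scaling, a score above 50 means a predictor lies in the upper half of the range between the weakest and the strongest predictor.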

The model's performance was then evaluated on the test set using a ROC (Receiver Operating Characteristic) curve (Figure 5). The area under the curve (AUC) value was 0.771, suggesting that the model possesses moderate predictive power. Because the model was trained on MDA-MB-231 data and tested on MCF-7 data, it is hypothesized that heterogeneity between the two cell lines may have introduced variability in the experimental outcomes.

Figure 4. Relative contributions of five predictor variables to the outcome variable in the random forest machine learning model.

Figure 5. ROC curve of the random forest model.
