Model | NMU-China

Overview

Model Overview

The focus of our project is to develop innovative engineered bacteria therapies aimed at effectively treating the health hazards caused by microplastic pollution in the human intestinal tract. To achieve this goal, we have constructed a series of interrelated and highly integrated model systems, covering simulation models, metabolic assessment models, symbiotic system models, and therapeutic target prediction models. These models jointly optimize the delivery process, metabolic compatibility, and therapeutic efficacy of engineered bacteria, and deeply explore their clinical application value.

Firstly, to address the complex movement behavior of oral engineered bacteria preparations in the gastrointestinal tract, we have built a refined intestinal drug delivery kinetics simulation model based on COMSOL 6.3 (Simulation Part). This model couples the integrated structure of the stomach, pylorus, duodenum, and jejunum, and uses the moving mesh technique to simulate the synergistic effect of peristalsis and segmentation, revealing the dual regulatory mechanism where the pyloric gate determines the migration "jump distance" and segmentation movement dominates the near-wall mixing. This forward-looking simulation provides quantitative basis for the dose optimization and action site prediction of engineered bacteria, ensuring that subsequent treatment plans can precisely adapt to different physiological states.

Subsequently, to accurately assess the impact of exogenous gene expression on host cells, we have constructed a systematic metabolic toxicity assessment model (Metabolic Part). Using COBRApy to conduct metabolic flux analysis on Saccharomyces cerevisiae with different functional modules introduced, we quantified the energy consumption for the synthesis and secretion of nine exogenous proteins. Sensitivity analysis indicated that protein length is the most sensitive parameter for energy consumption, verifying that the host growth rate remains above 90% of the baseline at the conventional expression level, ensuring the metabolic compatibility and safety of the engineered circuits.

Based on this, we further refined the α-factor quorum sensing dynamics model (Degradation Part), which systematically depicts the density-dependent gene expression regulation mechanism within the ODE framework. The model reveals the dual characteristics of "self-amplification - self-limiting homeostasis", providing a kinetic basis for the precise initiation of engineered bacteria. Meanwhile, for the realization of microplastic degradation function, we systematically analyzed the MXI1-mediated transcriptional inhibition circuit through bioinformatics methods, providing theoretical support for the promoter regulation mechanism.

To fully leverage the colonization and functional advantages of engineered bacteria in the intestinal environment, we introduced a metabolic coupling symbiosis model (Symbiosis Part). This model integrates FBA and FVA analysis to simulate the dynamic interaction between Saccharomyces cerevisiae and Lactobacillus plantarum. The study revealed the bidirectional transfer pathways of key metabolites such as lactic acid and amino acids, quantifying the "symbiotic enhancement - energy relief" coupling relationship. The results demonstrated that the symbiotic system significantly increased the growth rate of yeast and reduced the ATP maintenance burden, providing a solid theoretical basis for the design of multi-strain synergy.

Finally, with an eye on precise treatment, we developed a core target and pathogenic mechanism prediction model (Therapy Part). This model integrates network toxicology, machine learning, molecular docking, and dynamic simulation techniques to systematically identify three core molecular targets: JAK2, IL2, and TGFBR2, and clarifies the four major mechanisms by which PET microplastics affect the progression of IBD through oxidative stress, immune dysregulation, uncontrolled cell proliferation, and intestinal barrier disruption. This multi-level analysis not only deepens the understanding of the pathogenic mechanism of microplastics but also provides new approaches for the precise treatment of IBD and early warning for high-risk populations, broadening the clinical application prospects and future commercialization directions of engineered bacteria therapies.

Part1

Model Overview and Core Assumptions

Model Introduction

Geometric and Motion Assumptions

Fluid and Transport Assumptions

Numerical Calculation

Core Physical Parameters

Laminar Flow

Gastric Peristalsis

Pylorus

Jejunal Motion

Dilute Substance Transport

Boundary Ordinary Differential and Algebraic Equations

Results and Analysis

Hydrodynamic Characteristics of the Model

Transport Behavior of Dilute Substances

Wall Adsorption Kinetics

Appendix A

Appendix B

Numerical Study on Convection-Diffusion and Wall Adsorption of Products in the Small Intestine under the Synergy of Peristalsis and Segmentation

1. Model Overview and Core Assumptions

This model is built based on COMSOL Multiphysics 6.3, focusing on the small intestine (including the duodenum and jejunum), stomach, and pyloric region. It aims to establish a comprehensive numerical model for simulating peristaltic wave motion, segmental motion, fluid flow, and transport of dilute substances (product bacterial solution) in the human small intestine over a simulation duration of 600 seconds. Through multi-physics coupling (moving mesh, laminar flow, dilute substance transport, and boundary ordinary differential equations), the model quantifies the convection-diffusion behavior of the product bacterial solution in the digestive tract and its adhesion characteristics to the intestinal wall. The core assumptions are as follows:

1.1 Model Introduction

As the core site for digestion and absorption of nutrients in the human digestive system, the peristaltic and segmental motions of the small intestine play a crucial role in promoting the mixing of intestinal contents, accelerating substance transport, and improving absorption efficiency. Accurate simulation of the dynamic environment of the small intestine and the product transport process provides guidance for predicting product usage and action sites. This study develops a multi-physics coupled numerical model of the human small intestine. The specific objectives are:
(1) Construct a geometric model of the small intestine (including the duodenum, jejunum, and pylorus) based on physiological parameters to simulate the synergistic effect of peristaltic waves and segmental motions;
(2) Integrate fluid flow, dilute substance transport, and wall adsorption-desorption kinetics into the model to realize coupled simulation of multi-physical processes;
(3) Verify the reliability of the model through theoretical analysis and comparison with indirect experimental data, providing a basis for subsequent studies on bacterial colony adhesion and microplastic degradation.

1.2 Geometric and Motion Assumptions

(1) Axisymmetric Simplification: The digestive tract (stomach, pylorus, duodenum, jejunum) is constructed as a 2D axisymmetric model, ignoring circumferential (φ-direction) changes and only considering radial (r) and axial (z) motions and transport.

(2) Peristaltic Wave Simplification: The radial contraction of peristaltic waves in the jejunum is described using a Gaussian distribution function. The gastric peristaltic waves are simulated by superimposing a sine-squared function with a smooth step function. The opening and closing of the pylorus and changes in the duodenal radius are achieved through piecewise functions to ensure smooth spatial and temporal transitions.

(3) Segmentation Wave Assumption: The segmentation of substances in the jejunum and duodenum is realized through wall displacement in the form of a sine wave, and the segmentation region is limited to a specific axial range (constrained by a smooth function).

1.3 Fluid and Transport Assumptions

(1) Incompressible Laminar Flow: The fluid density (ρ = 1040 kg/m³) and dynamic viscosity (μ = 0.0014 Pa·s) are constant^[3]. The flow satisfies the Navier-Stokes equations, and turbulence (turbulence model is turned off) and gravity effects are ignored.

(2) Dilute Substance Assumption: The initial concentration of the product bacterial solution is low (initial concentration C₀ = 1 mol/m³), which does not affect the fluid viscosity and density. Substance transport only considers convection and molecular diffusion (turbulent diffusion is ignored). Since Colony-Forming Units (CFU) are not the dependent variable unit used in COMSOL, mol is used instead of CFU in this study. This is only a difference in measurement units, and there is no numerical difference between [mol/m³] and [CFU/m³] in numerical calculations.

(3) Wall Adhesion Kinetics: The adsorption-desorption process of the bacterial solution on the intestinal wall conforms to the Langmuir adsorption model, and the desorption rate is enhanced by the wall shear stress (regulated by the shear sensitivity coefficient α).

1.4 Numerical Calculation

(1) Moving Mesh Compatibility:The computational mesh required for solving all control equations of the fluid flow model is created using the built-in mesh generation code in COMSOL Multiphysics® software. The mesh deforms synchronously with the wall peristalsis, and the "deformed domain" physics field is used to update the mesh displacement. The mesh smoothing type is Yeoh, and the boundary layer mesh is refined to capture the wall shear effect. There are 16,972 mesh vertices, including 22,279 triangular elements, 4,578 quadrilateral elements, and 26,857 domain elements.

(2) Time Step and Convergence:An adaptive time step scheme is adopted to automatically adjust the solver time step size to maintain an absolute tolerance of ≤ 0.001. The total duration of the transient calculation is 600 s, and the output results are stored with a time step of 0.1 s. The PARDISO linear solver is used, and Anderson acceleration is applied for nonlinear iterations to stabilize convergence.

(3) Physics Field Coupling:Four physics fields are set up: laminar flow, boundary ordinary differential and algebraic equations (describing the adsorption-desorption process), dilute substance transport, and moving mesh. All four physics fields are involved in coupling during numerical calculations.

2. Global Definitions and Model Parameters

2.1 Core Physical Parameters

This numerical model is constructed using COMSOL Multiphysics® 6.3 software, covering four core physics fields: moving mesh (simulating intestinal wall motion), laminar flow (simulating fluid flow), dilute substance transport (simulating substance diffusion and convection), and boundary ordinary differential equations (simulating wall adsorption-desorption kinetics). The model adopts 2D axisymmetric geometric simplification (ignoring circumferential changes) to balance computational efficiency and simulation accuracy. The key parameters are detailed in the following tables.

Parameters for Geometry, Fluid Properties, Laminar Flow, Moving Mesh, and Dilute Substance Transport

Name	Expression	Unit	Value	Description
L	1.225	[m]	1.225 m	Tube length (changed to 1.80 [m] for the ileum)
rho	1040	[kg/m^3]	1040 kg/m³	Fluid density
mu	0.0014	[Pa*s]	0.0014 Pa·s	Dynamic viscosity
amp	0.73	-	0.73	Relative contraction amplitude
tc	7	[s]	7 s	Period
ta	1.75	[s]	1.75 s	Contraction gating start time
tb	5.25	[s]	5.25 s	Relaxation gating start time
thalf	1.75	[s]	1.75 s	Gating smoothing time
z0	L/2	[m]	0.6125	Peristalsis center (0.90 [m] for the ileum)
up	7.50E-04	[m/s]	7.5E-4 m/s	Peristaltic propulsion velocity
sigma2	2.00E-04	[m^2]	2E-4 m²	Gaussian variance (= 2 cm²)
R0	0.00865	[m]	0.00865 m	Intestinal radius
zc	0.05*L	-	0.06125 m	Axial position z of the bolus center
rc	0	[m]	0 m	Radial position r of the bolus center (m); set to 0 when placed near the axis
az	0.02	[m]	0.02 m	Semi-axis of the bolus in the z-direction (m)
ar	0.8*R0	-	0.00692 m	Semi-axis of the bolus in the r-direction (m)
sm	0.5*min(az,ar)	-	0.00346 m	Smoothing half-width to avoid oscillations caused by hard edges
lambda	0.07	[m]	0.07 m	Segmentation wave wavelength
xigema	0.25*R0	-	0.0021625 m	Segmentation wave amplitude
f	10	[1/min]	0.16667 1/s	Segmentation frequency
a	8.5*lambda-d	-	0.59 m	Segmentation wave smoothing start point
b	9*lambda+d	-	0.635 m	Segmentation wave smoothing end point
d	0.005	[m]	0.005 m	Segmentation smoothing transition half-width
dPdz	-0.04	-	-0.04	Target axial pressure drop [Pa·s]
target	5.64E-08	[m^3]	5.6415E-8 m³	Target cumulative volume per second
L1	0.245	[m]	0.245 m	Duodenal length
L2	0.159	[m]	0.159 m	Stomach length
R1	0.0386	[m]	0.0386 m	Stomach fundus width
L1.5	0.015	[m]	0.015 m	Pyloric region length
A	0.9	[cm]	0.009 m	Gastric peristaltic wave amplitude
v	2.34	[mm/s]	0.00234 m/s	Gastric peristaltic wave propagation velocity
L4	2.04	[cm]	0.0204 m	Gastric peristaltic wave width
delta	0.1*L4	-	0.00204 m	Spatial smoothing parameter, L is half the period length
delta_t	0.5	-	0.5	Temporal smoothing parameter
tanth	L2/(R1-R0)	-	5.3088	Tangent value
sinth	abs(tanth)/sqrt(1+tanth^2)	-	0.98272	Sine value
R2	1.05	[cm]	0.0105 m	Duodenal radius
t0	0	-	0	Time when the duodenum starts to change
tau_on	2	[s]	2 s	Temporal smoothing activation duration
zw	0.05*L1	-	0.01225 m	Spatial transition length at both ends
L3	0.002	[m]	0.002 m	Thin domain length

Adhesion Parameters

Name	Expression	Unit	Value	Description
D	1.00E-13	[m^2/s]	1E-13 m²/s	Bulk diffusion coefficient
alpha	1.00E-07	[1/Pa]	1E-7 1/Pa	Shear sensitivity coefficient
kon	1.00E-05	[m/s]	1E-5 m/s	Adsorption coefficient
koff0	1.00E-04	[1/s]	1E-4 1/s	Basal desorption rate
Gmax	1.00E-05	[mol/m^2]	1E-5 mol/m²	Maximum adsorption capacity
C0	1	[mol/m^3]	1 mol/m³	Initial concentration in the bolus

Geometric Model

For numerical purposes, the geometry of the small intestine is assumed to be a perfect cylinder with flexible walls, and the entire geometry is assumed to be filled with liquid intestinal contents. The flow in the cylindrical tube is assumed to be axisymmetric, and the gastric antrum is assumed to have a funnel-shaped feature. All geometric parameters are based on real physiological data. The geometric model includes four parts: the gastric antrum (only including the starting point of gastric peristalsis to the pylorus), the pylorus, the duodenum, and the jejunum (the ileum model is constructed separately for comparison). The "Form Union" operation in COMSOL is used to construct a unified 2D axisymmetric computational domain^[1][3][5].

(1) Gastric Antrum: The diameter of the upper surface is 1.73 cm, the diameter of the lower surface is 7.71 cm, and the length is 15.9 cm. Its volume in 3D is consistent with the typical volume of the gastric antrum.

(2) Pyloric Region: The diameter is 1.73 cm, and the length is 1.5 cm.

(3) Duodenum: The diameter in the relaxed state is 2.1 cm, and the length is 24.5 cm.

(4) Jejunum: The diameter is 1.73 cm, and the length is 120 cm^[1][3][5].

Fig 1. 2D and 3D Axisymmetric Model Geometry (1 - Stomach, 2 - Pylorus, 3 - Duodenum, 4 - Jejunum)

Fig 2. Ideal Gastric Geometry^[1]

2.2 Laminar Flow

Model Assumptions

The intestinal contents are assumed to be incompressible Newtonian fluids. Since the expected Reynolds number (Re) is ~3, the velocity field of the intestinal contents in the numerical models of the jejunum and ileum generated by peristaltic waves is assumed to be laminar. The intestinal contents are assumed to be Newtonian fluids with a dynamic viscosity of 0.0014 Pa·s and a density of 1040 kg/m³^[3]. Due to the complex spatial position of the small intestine in the human body, the effect of gravity is relatively small compared to the peristaltic driving force, so the effect of gravity is ignored. The flow is axisymmetric, and the circumferential (φ) velocity component and concentration gradient are zero.

Governing Equations of the Laminar Flow Model

The continuity equation and Navier-Stokes equations are used to solve the laminar fluid flow in the entire domain. The opening side of the gastric antrum and the end of the jejunum are set as open boundaries. A moving boundary condition is applied to all boundaries representing the digestive tract wall except the axis of symmetry boundary.

(1) Navier-Stokes Equations

The Navier-Stokes equations solve the gradients of velocity with respect to pressure, viscous forces, and gravity.In Equation，t is time, p is pressure (Pa), μ is dynamic viscosity (Pa·s), ∇u is the velocity gradient tensor, (∇u)t is the transpose of ∇u (s⁻¹), and g is the gravity vector (m/s²). In this study, the effect of gravity on the flow is ignored. The intestinal contents are assumed to be Newtonian fluids with a dynamic viscosity (μ) of 0.0014 Pa·s and a density (ρ) of 1040 kg/m³.

(2) Open Boundary Condition

The open edges at both ends of the axisymmetric geometry are defined as open boundaries to describe the boundaries in contact with a large amount of fluid. The given open boundary condition is applied to these two edges, indicating that the sum of pressure and viscous forces is zero. The fluid can flow freely in and out of the domain without external stress. In the equation, represents the boundary normal pointing outward of the domain.

(3) Wall Condition

This is the most commonly used wall condition in fluid mechanics. It assumes that at the contact surface between the fluid and the solid wall, the relative velocity of the fluid is zero. That is, the fluid molecules close to the wall will "stick" to the wall without relative sliding with the wall.

Moving Mesh

The dynamic mesh module is used to simulate the deformation of the digestive system wall caused by peristalsis and segmentation. The motion modes simulated in the model include: gastric peristalsis, intermittent opening and closing of the pylorus and gastric emptying, duodenal segmentation, jejunal peristalsis, and jejunal segmentation.

2.3 Gastric Peristalsis

Introduction to Gastric Peristalsis

In general, the motion pattern of the gastric wall can be characterized by two types of muscle contractions, as described below. The first type of motor activity originates from the upper part of the stomach and propagates. It is characterized by slow and weak wave activity, which only produces slight indentations in the gastric wall. The second type of motor activity is characterized by a series of regular antral contraction waves (ACWs) that originate from the middle of the stomach and propagate circumferentially toward the pylorus. These contractions are born as shallow indentations but deepen as they move downward (nearly blocking the gastric lumen when they approach the pylorus). Their frequency is approximately 3 cycles per minute, and they are controlled by electrical stimulation generated by the gastric pacemaker, which is located in the middle body region of the greater curvature of the stomach.

In the current modeling stage, we have been able to identify most of the parameters based on literature data. The peristaltic pattern of the gastric wall is characterized as follows: ACWs are initiated every 20 seconds at a location 15.9 cm from the pylorus, with a lifespan of 61 seconds^[1].

Fig 3. Schematic Diagram of Gastric Peristaltic Waves

The relative occlusion degree is defined as RO=A/D, where D is the distance from the gastric wall to the central axis, and A is the amplitude of the gastric peristaltic wave. The variation of RO is specified as follows: from 0% to 40% linearly from 0 s to 19.5 s, remains constant at 40% from 19.5 s to 35.5 s, increases linearly again from 35.5 s to 60 s, reaches a maximum of 80% when advancing toward the pylorus, and finally decreases to 0 gradually from 60 s to 61 s.

Governing Equations of Gastric Peristalsis
(1) The generation of gastric peristaltic waves is controlled by setting the displacement of the gastric wall. The normal displacement of the gastric wall is:

L4 is the wavelength of a single peristaltic wave, Δ is the wave interval with a value of 20 s,v is the wave velocity, and k is the wave number.

Fig 4. The four figures are respectively the Aof - wave amplitude function, the Sa - occlusion degree function, the spatiotemporal diagram of gastric peristaltic waves, and the schematic diagram of 2D planar gastric peristaltic waves. It can be seen that the amplitude of the peristaltic wave reaches the maximum in the middle of the gastric antrum.

2.4 Pylorus

Introduction to Pyloric Mechanism
As the outlet of the stomach, under the set parameters, the motor activities of the gastric antrum, gastric outlet, and duodenum are synchronized. The pylorus acts as a "soft valve" that opens and closes periodically over time, acting only in a short axial range (pyloric segment). When the valve contracts, the effective radius of this pyloric segment decreases, almost blocking the passage from the stomach to the duodenum; when the valve relaxes, the radius returns to a larger value, and the gastric contents flow out in a short period of time. This cycle repeats, forming a typical "stepwise" gastric emptying rhythm. The pyloric cycle is divided into 4 phases:
(1) Contraction Rising Edge: Gradually shrinks from fully open to the target minimum radius.
(2) Contraction Holding Phase: Maintains a small radius for a long time (the valve is "almost closed"), with minimal emptying and strong inhibition of reflux.
(3) Relaxation Falling Edge: Quickly returns from a small radius to a larger radius (fast opening).
(4) Relaxation Holding Phase: Maintains an open state for a short time, allowing a wave of contents to pass through quickly.

Fig 5. The inhibitory effect of the pylorus on reflux between the stomach and duodenum in the 2D plane. Streamlines of fluid flow in the planar region of the stomach at t = 406.2 s, arrows indicate the direction of velocity, and streamlines are colored by velocity magnitude (m/s).

Fig 6. The z-component velocity field of the model at t = 499 s. It can be seen that there is an obvious positive large-velocity flow at the pylorus. The results shown by the model are consistent with the classic function of physiological gastric emptying.

Governing Equations of the Pylorus We model the pylorus as a local radius modulation with a length of L1.5, expressed by a sine-squared function. Its instantaneous radius is simulated by defining the movement of the boundary mesh to realize the opening and closing of the pylorus:

The contraction ratio c(a) and its smooth step are:

The axial spatial weight is:

The initial state of the pylorus is relaxed, with a period of 20 s: it contracts gradually from 0 to 4 seconds, maintains contraction from 4 to 12 seconds, relaxes gradually from 12 to 18 seconds, and maintains relaxation from 18 to 20 seconds. Under the set parameters, the motor activities of the gastric antrum, gastric outlet, and duodenum are synchronized^[2]. The geometry of the computational domain changes periodically.

Fig 7. Spatiotemporal Diagram of Pyloric Gating

2.5 Jejunal Motion

Jejunal Peristaltic Waves
Three different motion patterns are observed in the small intestine^[6] to promote food digestion and absorption. Peristalsis is responsible for the propulsion of chyme along the intestine; segmental contractions involve local contraction and relaxation without overall propulsion along the intestine, and the model studies their role in the mixing of chyme with surrounding fluids; pendular contractions involve the "back-and-forth" motion of the intestinal wall. The model mainly simulates the peristalsis and segmental contraction processes in the small intestine.
Governing Equations of Jejunal Peristaltic Waves
The peristaltic waves generated in the jejunum are simulated by defining the displacement of the intestinal wall. The jejunal peristaltic parameters are modeled as a Gaussian distribution function. The intestinal wall displacement and the step function used are defined as follows:

The relative amplitude amp of the peristaltic curve and the contraction period tc are determined based on literature data^[3].

Fig 8. Shows a complete typical peristaltic wave in the 2D axisymmetric plane and the change in the z-component velocity field caused by peristalsis.

Segmental Motion of the Jejunum and Duodenum Segmental motion refers to the quasi-steady, nearly local reciprocating contraction of the small intestinal wall on a short axial scale^[5]. Its main functions are to stir and mix the luminal contents and enhance mass transfer near the wall. Unlike peristaltic waves that mainly promote propulsion, segmental motion contributes little to the net propulsive flow but can significantly affect local shear, boundary layer thickness, and absorption/reaction environment. In the model, segmentation only acts in a preset short axial interval, and segmentation is regarded as a small radial displacement (the negative sign indicates contraction); its spatial shape is described by a sine function in the axial direction and a harmonic function in time; the intensity is clipped by a soft window to ensure smooth transitions at the ends and avoid numerical oscillations. In the model settings, segmental motion occurs simultaneously in the jejunum and duodenum, while peristaltic waves only occur in the jejunum.
(1) Governing Equations of Segmental Motion
Similar to peristaltic waves, the governing equation of segmental motion is the wall displacement defined on the wall. Segmental motion is regarded as a small radial displacement field oscillating sinusoidally with time in the interval^[5]:

Where Zs is the reference phase origin of the segmental wave in this segment, and W(z) is a spatial smoothing function (soft window). The soft window is defined by the difference between smooth steps at both ends:

The image characteristic of the segmentation function is a column of standing waves with phase changing uniformly with time.

Fig 9. Spatiotemporal diagram of duodenal segmental motion and schematic diagram of the change in the z-component velocity field caused by segmental motion in the 2D symmetric plane.

2.6 Dilute Substance Transport

Model Introduction
The dilute substance transport model is the core module for quantifying the convection-diffusion behavior of the bacterial solution (using mol instead of CFU as the measurement unit, with consistent numerical calculation logic) in the intestine. Its core goal is to couple fluid flow (laminar flow) with the dynamic deformation of the intestine (moving mesh) to simulate the process of the bacterial solution from initial release, propulsion with intestinal contents, to radial diffusion. The model is based on the "dilute substance assumption" (the concentration of the product bacterial solution is extremely low, which does not change the fluid density and viscosity), only considering molecular diffusion and convective transport, and realizing the coordinated calculation with laminar flow and moving mesh through multi-physics coupling. Finally, it outputs evaluation indicators such as the spatiotemporal distribution of product concentration, centroid trajectory, and centroid velocity.

Model Assumptions and Core Objectives
(1) Model Assumptions
a. Dilute Substance Characteristics: The initial concentration of the bacterial solution is 1 mol/m³, which is much lower than the fluid saturation concentration. Its presence does not affect the density (1040 kg/m³) and dynamic viscosity (0.0014 Pa·s) of the intestinal contents, meeting the physical simplification conditions of "dilute substances".
b. Transport Mechanism: The transport of the bacterial solution in the intestine is dominated only by molecular diffusion and convection, and turbulent diffusion is ignored (since the Reynolds number of intestinal flow is Re ≈ 3, which belongs to the laminar flow category).
c. Initial Distribution: The bacterial solution is initially released in the form of an "elliptical bolus" at the jejunal axial position zc = 0.06125 m and radial position rc = 0 (near the axis). The semi-axes of the bolus in the axial (z) and radial (r) directions are az = 0.02 m and ar = 0.00692 m, respectively. The smooth function flc2hs is used to avoid numerical oscillations caused by hard edges (smoothing half-width sm = 0.00346 m).
d. Physics Field Coupling: The convective velocity field of the bacterial solution directly references the solution results of the laminar flow module (u_fluid, w_fluid), and the influence of mesh deformation on the concentration field is corrected by the displacement field (d(r, t), d(z, t)) of the moving mesh module to ensure the accuracy of concentration calculation in the dynamic intestinal environment.

(2) Core Objectives
a. Quantify the spatiotemporal concentration distribution of the bacterial solution under the synergy of small intestinal peristalsis, segmental motion, and gastric intermittent emptying mechanism, and evaluate the influence of intestinal motion on the mixing and propulsion of the product bacterial solution.
b. Calculate the axial centroid trajectory and axial dispersion coefficient of the bacterial solution, and verify the reproducibility of the model for the physiological function of "peristaltic propulsion - segmental mixing".
c. Provide the bacterial solution concentration boundary condition at the wall for the wall adsorption-desorption kinetics (boundary ODE module) to realize the full-process coupling of "transport - adsorption".

Governing Equations and Transport Mechanisms

The transport of the bacterial solution in the intestine follows the convection-diffusion equation. Considering the influence of moving mesh deformation on the concentration field (introducing the time derivative term of mesh displacement), its general form is as follows:

Where, Ji is the diffusion flux, c is the concentration of the bacterial solution, D is the molecular diffusion coefficient of the bacterial solution in the intestinal contents (m²/s), u is the velocity vector, and Rc is the reaction term. There is no chemical reaction in the main domain, and the adsorption-desorption reaction at the wall will be set separately through the boundary reaction, see Section 2.7 Boundary Ordinary Differential and Algebraic Equations.

Mesh Deformation Correction Term u

Due to the dynamic deformation of the mesh caused by the peristalsis and segmental motion of the intestinal wall, when COMSOL handles substance transport in the moving mesh, it will automatically introduce a mesh velocity term into the concentration equation to correct the calculation of the time derivative. The corrected concentration time change rate is:

ct is the intrinsic time derivative of concentration, cr and cz are the spatial gradients of concentration in the radial and axial directions, respectively. (∂r/∂t) and (∂z/∂t) are the radial and axial displacement rates of the mesh output by the moving mesh module. This correction ensures that the mesh deformation does not cause "spurious diffusion" in the concentration calculation.

Parameter Definition

See Section 2.1.1

Initial Concentration Distribution

It is assumed that the product passes through the stomach and duodenum under the protection of the outer coating and is released exactly at the entrance of the jejunum. The initial concentration of the bacterial solution is defined by an analytical function, adopting the form of "elliptical constraint + smooth step" to avoid numerical instability caused by sudden changes in the initial concentration. The expression at t = 0 s is as follows:

flc2hs is a built-in smooth Heaviside function in COMSOL. When the value in the parentheses is ≥ 0 (within the elliptical range), the function value approaches 1, and the concentration of the bacterial solution is C0 (C0 = 1 [mol/m³]); when the value is < 0 (outside the elliptical range), the function value smoothly transitions to 0, ensuring that there is no hard boundary in the initial concentration field.

Fig 10. Spatial distribution of product concentration in the 2D symmetric plane at t = 0, 17, 30 s.

Boundary Condition Settings

According to the physiological function of the intestine and the coupling requirements of the model physics field, the boundary conditions of the dilute substance transport module are divided into 3 categories, covering all geometric boundaries.

(1) No-Flux Boundary

A no-flux boundary is set on the solid walls without bacterial solution adsorption/release, such as the gastric wall, duodenal wall (non-adsorption region), and pyloric wall. The diffusion flux is 0, that is, the concentration gradient perpendicular to the wall is 0.

This means that the bacterial solution cannot diffuse through these walls and only moves with the fluid along the direction parallel to the wall or diffuses radially.

(2) Open Boundary

Open boundary conditions are set at the inlet and outlet of the overall digestive tract model, i.e., the end of the jejunum and both sides of the gastric antrum opening. The external concentration is set to 0.

To avoid reflux errors caused by sudden concentration changes at the boundary, the "upwind" numerical scheme is coupled, and the concentration value in the inflow direction is preferentially used for calculation to ensure flux continuity.

(3) Surface Reaction Boundary

A surface reaction boundary is set on the jejunal wall to simulate the intestinal mucosa region where the bacterial solution can be adsorbed. The flux of the bacterial solution is determined by the "adsorption-desorption kinetics". The dilute substance transport module only provides the bacterial solution concentration cw at the wall, and the flux calculation is dominated by the boundary ordinary differential and algebraic equation module.

The net flux of the bacterial solution at the wall is equal to the difference between the adsorption flux （Jads） and the desorption flux （Jdes）. The specific kinetic equation is detailed in Section 2.7 (Boundary ODE Module).

Multi-Physics Coupling

The dilute substance transport module needs to be directly coupled with the laminar flow and moving mesh modules, and the coupling with the boundary ordinary differential and algebraic equations will describe the bacterial solution adsorption-desorption process (see Section 2.7). The coupling logic is realized through the "Reactive Flow, Dilute Species" in the "Multiphysics" of COMSOL. The specific methods are as follows:

(1) Coupling with Laminar Flow

The "Fluid Flow (spf)" interface is adopted, and the velocity field (u_fluid, w_fluid) solved by the laminar flow module is directly substituted into the convection term u*(c∇) of the convection-diffusion equation.

(2) Coupling with Moving Mesh

d(z, t) of the moving mesh module is substituted into the concentration time derivative correction term:

Numerical Calculation
The "linear Lagrangian element" is used to match the velocity discretization of the laminar flow module to ensure the stability of the convection term calculation. Consistent with the overall transient settings, an adaptive time step is adopted, with a maximum step size of 0.1 s and a minimum step size of 1×10⁻⁴ s. The convergence criterion is "concentration absolute tolerance ≤ 0.001". The PARDISO direct solver is used to handle the sparse matrix of the concentration field to ensure the solution accuracy in the high-gradient region near the boundary layer.

Indicator Settings
The transport characteristics of the bacterial solution are analyzed from three dimensions: concentration spatiotemporal distribution, centroid trajectory, and axial dispersion coefficient to verify the physiological rationality of the model. See Section 3. Results and Analysis for the calculation result analysis.

2.7 Boundary Ordinary Differential and Algebraic Equations

Overview of the ODE/DAE Module
The boundary ODE/DAE module is the core module for quantifying the interaction between the bacterial solution and the intestinal wall. Its core function is to describe the dynamic equilibrium process of "adsorption-desorption" of the bacterial solution at the wall adsorption sites after reaching the intestinal wall in the dilute substance transport module, based on the Langmuir adsorption model and shear-enhanced desorption mechanism. This module only acts on the preset adsorption boundary (jejunal wall, see Section 2.6.5.1 Surface Reaction Boundary). Through coupling with the dilute substance transport module and laminar flow module, it realizes the quantitative transfer of "bacterial solution concentration → adsorption flux → wall adhesion amount", and finally outputs key indicators such as total wall adhesion amount, coverage area ratio, and net flux, providing a direct basis for evaluating the retention and distribution effect of the product bacterial solution in the intestine and a theoretical basis for the study of drug dosage and action sites.
Variables, Core Objectives, and Key Assumptions
(1) Variable Settings

Name	Expression	Description
Gamma	-	Wall adsorption amount
tauw_alt	spf.mu*spf.sr	Viscosity × shear rate (Pa)
koff	koff0(1 + alphaabs(tauw_alt))	Shear-enhanced desorption
theta	min(1,max(0,Gamma/Gmax))	Adsorption site occupancy rate
Jads	konc(1 - theta)	Adsorption rate
Jdes	koff*Gamma	Desorption rate
M_Gamma	intop3( 2pir*Gamma )	Total wall adhesion amount
CoverAreaFrac	intop3( 2pirflc2hs(Gamma/Gmax - 0.2, 1e-3) ) / intop3( 2pi*r )	Coverage area ratio: the proportion of the area where Γ reaches the threshold (the threshold can be changed, e.g., 20%)
z_bar	intop3( 2pirzGamma ) / intop3( 2pir*Gamma )	Coverage center position (where it is most concentrated)
J_net	intop3( 2pir*(Jads - Jdes) )	Net flux (current intensity of deposition - desorption)
M0	intopD( 2pir*an3(r,z) )[mol/m^3]	Initial total dosage (mol)
AttachYield	M_Gamma / M0	Adhesion yield (how much of the current dose remains)

(2) Core Objectives
Establish a kinetic model for the adsorption-desorption of the bacterial solution on the intestinal wall, and quantify the adsorption rate (Jads), desorption rate (Jdes), and net flux (Jnet); reveal the enhancing effect of intestinal wall shear stress on the desorption process, and clarify the influence of the shear sensitivity coefficient (α) on the adhesion stability; calculate the spatiotemporal distribution of the wall adsorption amount (Γ) in the "wall domain", and evaluate the coverage range and retention efficiency of the bacterial solution in the intestine.

(3) Key Assumptions
a. Monolayer Adsorption: The adsorption sites on the intestinal wall are limited and uniformly distributed. The adsorption of the bacterial solution follows the Langmuir model, with a maximum adsorption capacity of Gmax (no multi-layer adsorption).
b. Shear-Enhanced Desorption: The wall shear stress (τw) generated by the flow of intestinal contents will break the binding between the bacterial solution and the wall, leading to an increase in the desorption rate with the increase of shear stress. The quantitative relationship is related through the shear sensitivity coefficient α.
c. Non-Renewable Adsorption Sites: Within the model simulation duration (600 s), it is assumed that the adsorption sites on the intestinal wall do not regenerate or desorb, and only the cycle of "unoccupied sites → adsorption → desorption → empty sites" is considered.
d. Coupling Dependence: The adsorption rate depends on the bacterial solution concentration at the wall (cw) provided by the dilute substance transport module (see Section 2.6.5.3), the desorption rate depends on the wall shear rate (sr) provided by the laminar flow module, and mesh deformation does not affect the distribution of adsorption sites (the moving mesh module only corrects the geometric position and does not change the wall adsorption characteristics).

Governing Equations and Kinetic Mechanisms

(1) Core Equation of Adsorption-Desorption Kinetics
The adsorption-desorption process of the bacterial solution on the intestinal wall satisfies "dynamic equilibrium". The time change rate of the wall adsorption amount Γ (Gamma) is equal to the net flux (Jnet), which is the core governing equation of the boundary ODE:

Where, Jads is the adsorption flux, describing the migration rate of the bacterial solution from the fluid phase to the wall phase; Jdes is the desorption flux, describing the detachment rate of the bacterial solution from the wall phase to the fluid phase.

(2) Adsorption Flux: Langmuir Adsorption Model
The adsorption process is controlled by both the "bacterial solution concentration at the wall" and the "proportion of unoccupied adsorption sites".

Where, kon is the adsorption rate constant (m/s), reflecting the binding ability between the bacterial solution and the wall sites; cw is the concentration of the bacterial solution at the wall, provided in real-time by the dilute substance transport module, i.e., the calculation result of the bacterial solution concentration c at the jejunal wall; θ is the adsorption site occupancy rate; Gmax is the maximum adsorption capacity.

(3) Desorption Flux: Shear-Enhanced Mechanism
The desorption process is divided into two parts: "basal desorption" and "shear-enhanced desorption". Mechanical shear will enhance desorption (shear force promotes the detachment of the bacterial solution from the intestinal wall).

Where, koff0 is the effective desorption rate, which changes with shear stress; koff0 is the basal desorption rate, i.e., the desorption ability without shear; is the shear sensitivity coefficient, quantifying the enhancing effect of shear on desorption; τw is the wall shear stress (Pa), which is approximately calculated in the laminar flow module in the model.

Where, μ is the fluid dynamic viscosity, and sr is the shear rate.

Initial Conditions
At the initial moment of the model (t = 0 s), there is no bacterial solution adsorption on the intestinal wall, and the initial values of the adsorption amount and adsorption rate are set to 0, which is consistent with the physiological initial state of "the bacterial solution has not reached the wall".

Boundary Selection and Multi-Physics Coupling Mechanism

(1) Applicable Boundary
Only the jejunal wall is selected as the boundary. The jejunum is the core region for intestinal nutrient absorption and bacterial colonization. Selecting this segment of the wall to simulate adsorption is consistent with the product's target site requirements.
(2) Coupling Logic
It needs to be directly coupled with the dilute substance transport and laminar flow modules. The coupled data is realized through the "variable transfer" and "multi-physics interface" of COMSOL. The specific logic is as follows:

Coupled Module	Transferred Data	Transfer Direction	Physical Meaning	Implementation Method
Dilute Substance Transport (tds)	Wall bacterial solution concentration cw	Dilute Substance Transport → ODE/DAE	Provide the concentration driving term for the adsorption flux Jads	Reference the concentration calculation result c at the jejunal wall boundary of the dilute substance transport module
Dilute Substance Transport (tds)	Net flux Jnet	ODE/DAE → Dilute Substance Transport	Provide the wall flux boundary condition for the dilute substance transport	Substitute Jnet into the jejunal surface boundary of the dilute substance transport
Laminar Flow (spf)	Wall shear rate sr	Laminar Flow → ODE/DAE	Provide the shear driving term for the effective desorption rate koff	Reference the shear rate variable sr of the laminar flow module, and calculateτw in combination

(3) Verification of Coupling Equations
After coupling the dilute substance transport module, the complete format of the bacterial solution flux balance equation at the jejunal wall boundary is:

The left side is the wall diffusion flux of the dilute substance transport module (positive when pointing to the wall), and the right side is the net adsorption flux of the ODE/DAE module. The equality of the two ensures the mass conservation of the "fluid phase - wall phase", verifying the rationality of the coupling logic. n*(-D∇c)represents the diffusion of the bacterial solution through the jejunal wall, i.e., entering the "wall domain" from the "intestinal domain" or vice versa.

Numerical Calculation Settings
(1) Discretization: The shape function adopts the "discontinuous Lagrangian" quadratic element.
(2) Linear Solver: The PARDISO direct solver of the overall model is used to handle the sparse matrix equation of the adsorption amount Γ, avoiding the convergence risk of iterative solution.

Indicator Settings

Based on the transient simulation results of COMSOL Multiphysics 6.3 (600 s, output time step of 0.1 s, solution time: 13188 s), combined with the physiological function of the intestine, the analysis is carried out from three core dimensions: the hydrodynamic characteristics of the model, the transport behavior of dilute substances, and the wall adsorption kinetics. The influence of the synergistic effect of peristalsis and segmentation on product migration, mixing, and wall retention is quantified, the reproducibility of the model for physiological processes is verified, and data support is provided for the prediction of product action sites and dosage optimization.

3. Results and Analysis

3.1 Hydrodynamic Characteristics of the Model

Definition of Global Flow Rate and Monitoring Position
To test the ability of the model to reflect the basic characteristics of fluid transport in the digestive system, global flow rate variables such as inflow (jejunal inflow), outflow (jejunal outflow), infloww (gastric inflow into the pylorus), and outfloww (pyloric inflow into the duodenum) are defined in the intestinal model. These variables are the core indicators for quantifying fluid retention, propulsion efficiency, and inter-regional mass exchange in the intestine. These variables are defined through the "Global Equations" and "Integration Coupling" functions of COMSOL, directly related to the fluid flow (laminar flow module) and geometric boundaries (such as the jejunal inlet/end, pyloric interface). Their dynamic changes can verify the mass conservation of the model, reflect the regulatory effect of peristalsis and segmentation on the flow rate, and provide data support for the "fluid-driven mechanism" of product transport.

Global Variable Name	Expression	Monitoring Boundary	Physical Meaning
inflow (Jejunal Inflow)	intop1(2pir*w_fluid)	Jejunal Inlet	Instantaneous volume flow rate of fluid entering the jejunum from the duodenum/pylorus
outflow (Jejunal Outflow)	intop2(2pir*w_fluid)	Jejunal Terminal Outlet	Instantaneous volume flow rate of fluid flowing out of the model from the jejunum
infloww (Gastric Inflow into Pylorus)	intop5(2pir*w_fluid)	Stomach-Pylorus Interface	Instantaneous volume flow rate of fluid entering the pylorus from the gastric antrum
outfloww (Pyloric Inflow into Duodenum)	intop4(2pir*w_fluid)	Pylorus-Duodenum Interface	Instantaneous volume flow rate of fluid entering the duodenum from the pylorus
netflow (Jejunal Net Flow)	netflowt - (outflow + inflow)/2	Global Integration (Jejunal Domain)	Cumulative net retention of fluid in the jejunum (inflow - outflow)
netfloww (Pyloric Net Flow)	netflowwt - (outfloww + infloww)/2	Global Integration (Pyloric Domain)	Cumulative net retention of fluid in the pylorus (gastric inflow - duodenal outflow)

Flow Direction: w_fluid is the axial (z-direction) velocity component. A positive value indicates the positive direction along the z-axis (from stomach → pylorus → duodenum → jejunum). Therefore, a positive value of inflow/infloww represents fluid "entering" the target region, and a positive value of outflow/outfloww represents "flowing out".

Dynamic Change Law of Flow Rate
(1) Flow Rate Change in the Duodenal Domain

Fig 11. Cumulative Flow Rate in the Duodenum

Further, the mass conservation of the model and the physiological rationality of the flow pattern are verified through the cumulative inflow and outflow curves at the jejunal site. The cumulative volume curve of the jejunum shows a typical "step + fine line" shape: the stepwise jump synchronized every 20 s reveals the intermittent emptying driven by the pyloric gating; the 6 s fine lines inside the steps result from the high-frequency modulation of the instantaneous flux by segmental motion. Reflux only occurs at the moment of valve switching with a limited amplitude, indicating that the valve control has a significant inhibitory effect on reflux. The slow fluctuation of the step height is consistent with the amplitude time course of the gastric antral peristaltic wave train (lifespan ≈ 61 s). The three curves satisfy the net volume relationship and rise approximately in parallel, indicating that the mass conservation is consistent with the boundary flux evaluation.
(2) Flow Rate Change in the Pyloric Domain

Fig 12. Cumulative Flow Rate at the Pylorus

The cumulative efflux (blue), cumulative reflux (green), and cumulative net emptying (red) at the pylorus show a clear stepwise rise within 0–600 s. The step rhythm is completely in phase with the 20 s cycle "soft valve" gating in the model: the gating is applied to the pyloric segment with a length of L1.5 through the boundary displacement function, and the axial weight of sin2(ℼZ/L1.5) is used to limit the action range. Thus, the cumulative curve is nearly horizontal during the "contraction holding" period of the valve and triggers a stepwise jump during the "relaxation holding" period, reproducing the classic rhythm of intermittent gastric emptying. Fine undulations of approximately 6 s can be seen inside each step, which is consistent with the set segmentation frequency of the model (f=10 min-1), indicating that the duodenal/jejunal segmental motion causes high-frequency modulation of the instantaneous flux and near-wall shear rather than decisive propulsion. It can be obtained that at 600 s, the cumulative efflux volume is 2.3*10-4 m3, the cumulative reflux is 0.65*10-4 m3, and the cumulative net emptying is 1.45*10-4 m3. The reflux only accounts for 30% of the efflux, indicating that the gating and geometric contraction have a significant inhibitory effect on reflux; the estimated average net emptying rate is 2.4*10-7 m3s-1 approximately . The three curves maintain the relationship of "red ≈ blue − green" for a long time and rise approximately in parallel, indicating that the boundary integral operator used has good mass conservation and numerical consistency; at the same time, the slow envelope of the jump amplitude of each step fluctuates with time, which is consistent with the gastric antral peristaltic wave train lifespan (≈ 61 s) and the time course modulation of the amplitude envelope Aof, reflecting the synergistic effect of gastric antral peristalsis, pyloric gating, and segmental motion on multiple time scales.

Dynamic Change Law of Velocity Field

Fig 13. Streamline diagram of the duodenal region near the pylorus at t = 415.2 s, colored by velocity magnitude.

At t = 415.2 s (pyloric relaxation window), the local velocity field shows a typical "central jet + bilateral reflux vortex" structure: the jet is ejected from the narrowest part of the pylorus and forms a long and narrow high-velocity zone at the duodenal inlet, then decays rapidly along the radial direction; a pair of near-wall counter-rotating vortex cells are induced by the shear layer on both sides of the jet, and the streamlines are closed in the vortex cells, indicating that this region is mainly for mixing and recirculation and contributes little to net propulsion. The large shear gradient at the lip of the jet means that the wall shear stress reaches a relatively high value, which is consistent with the aforementioned division of labor of "segmentation enhances mixing on a short scale, and gating triggers jet transport during the window period", and explains the source of stepwise emptying and high-frequency fine lines in the cumulative curve from the perspective of the instantaneous field.

Fig 14. Streamline diagram of fluid flow in the planar region of the stomach at t = 421.5 s (peristaltic wave approaching the pylorus), colored by velocity magnitude.

The flow field predicted by this model is highly consistent with the classic description of gastric motion when the contents are Newtonian fluids (0.0014 Pa·s, 1040 kg/m³). In particular, the flow field predicted in the pyloric antrum region shows two main flow patterns. When the gastric antral contraction wave moves toward the pylorus, the gastric contents are forced to reflux. When the gastric occlusion degree reaches 56%, a retrograde jet-like motion is formed. To quantify the development process of this jet motion, the maximum retrograde flow velocity was tracked within a 20-second flow cycle. As the gastric antral contraction wave approaches the pyloric sphincter, the increase in contraction wave velocity and occlusion degree causes the retrograde jet to accelerate continuously, reaching a maximum retrograde flow velocity of 8.8 cm/s at the narrowest part of the pyloric canal. Another main flow pattern identified in the pyloric antrum region is the annular vortex formed between two adjacent gastric antral contraction waves (Figure 14). Through strong mixing, friction, and grinding of the gastric contents, retrograde jet motion and vortex structures have long been considered the main drivers of gastric digestion of food^[1].

3.2 Transport Behavior of Dilute Substances

Spatiotemporal Distribution of Concentration

Fig 15. Shows the spatial distribution of product concentration in the 2D symmetric plane at t = 0, 150, 300, 450, 600 s.

Over time, under the combined action of peristaltic propulsion and segmental mixing, the concentration field gradually expands along the axial direction and diffuses radially. It shows monotonic dilution under the combined action of convection dominance, diffusion, and segmental mixing. At the same time, the attenuation of the bulk concentration is also continuously absorbed by the wall adsorption-desorption flux.

Centroid Position

Fig 16. Product Centroid Position

This figure shows the time evolution of the axial centroid of the bolus. The curve shows a typical "step + fine line" characteristic: the large-scale steps are strictly in phase with the 20 s pyloric gating, indicating that dzc/dt increases significantly during the relaxation window and almost stagnates during the contraction holding period; the high-frequency fine teeth inside the steps are synchronized with the set segmentation frequency, reflecting the periodic modulation of segmental motion on near-wall mixing and instantaneous convection rather than decisive propulsion. The overall monotonic rise and no continuous retraction indicate that the net propulsion is dominated by gating, and reflux only occurs at the moment of opening and closing switching (consistent with the aforementioned "cumulative volume step"); the slight envelope fluctuation of the curve on a longer time scale is consistent with the modulation of the gastric antral peristaltic wave train lifespan and amplitude envelope Aof. In summary, the behavior of the centroid is highly consistent with the rhythm of volume efflux: gating determines the migration "jump distance", segmentation accelerates spreading and mixing, and adsorption/desorption only changes the bulk depletion and spatial distribution, but does not change the forward direction of the centroid.

Axial Dispersion Coefficient

Fig 17. Axial Dispersion Coefficient

The axial dispersion coefficient shows "a slowly increasing envelope + obvious periodic oscillation": within 0–600 s, its peak-to-peak amplitude and positive half-cycle average value continue to increase, indicating that the bolus advances into a region with stronger shear/stronger stirring (the central jet generated by pyloric windowing and downstream segmentation jointly enhance axial spreading); the large-scale oscillation in groups of 20 s is in phase with the pyloric gating: positive peaks appear in the relaxation window, and it is close to zero or negative valleys appear in the contraction holding period. The high-frequency fine lines (6 s) superimposed in the steps come from the modulation of segmental motion on instantaneous mixing. The negative value in the figure is not "physically negative diffusion", but "the instantaneous axial spreading rate is negative, which is consistent with the aforementioned "step + fine line" rhythm of centroid stepwise forward movement and cumulative volume.

3.3 Wall Adsorption Kinetics

Wall adsorption is the core link for the product bacterial solution to achieve retention and action in the intestine. Based on the Langmuir adsorption model and shear-enhanced desorption mechanism, key indicators such as total adhesion amount, coverage range, and net flux are used to quantify the dynamic behavior of the bacterial solution at the jejunal adsorption boundary, reveal the coupling law of "fluid shear - wall concentration - adsorption equilibrium", and verify the reproducibility of the model for the intestinal adsorption process. See Section 2.7.2.1 for evaluation indicators and calculations.

Total Wall Adhesion Amount

Fig 18. Wall Adhesion Amount

Within 0–600 s, the total wall adhesion amount MΓ increases monotonically with superimposed 20 s serrated oscillations; by 600 s, it has reached approximately 5.6×10−7mol, indicating that under the current dosage and parameter set, net adsorption is dominant for a long time, and the bulk phase is continuously "extracted" to the wall without obvious global saturation (Г/Gmax has not approached 1). The serrations are in phase with the pyloric gating: the relaxation window triggers a jet → the near-wall concentration c in the bulk phase increases sharply and the supply is sufficient, but at the same time, the shear increases to make koff larger; the shear decreases during the contraction holding period, the recirculation is enhanced, koff decreases, and Jads continues to work in the medium shear zone. The result of the alternation of the two phases is "scouring - redeposition" in each 20 s cycle, but the net increase is positive, resulting in the "sawtooth-like rise" shown in the figure. On a longer time scale, the slope rises slowly, which is consistent with the synergistic enhancement of near-wall supply and mixing by the gastric antral ACW wave train (61 s) + duodenal/jejunal segmentation: the stepwise forward movement of the bolus centroid periodically sends the high-concentration zone to the new downstream wall, and the product of the available area × near-wall concentration × contact time expands with time, so MΓ shows a nearly linear - weakly superlinear growth.

Fig 19. Coverage Area Ratio

The coverage area ratio is "the proportion of the intestinal wall area that reaches the effective coverage threshold. Its formula is:

Where Ajej is the total area of the jejunal wall, Γ is the wall adhesion amount, Gmax is the maximum wall capacity, andθthr is the set "effective coverage threshold" (usually 0.2, i.e., coverage ≥ 20% is regarded as "covered", that is, the adhesion amount reaching 20% of the maximum capacity is regarded as covered). Therefore, this indicator emphasizes the geometric coverage range rather than "how much quantity", which is suitable for answering where the drug/bacterial solution is distributed and how large the coverage area is, and is complementary to the total adhesion amount MГ ("how much quantity"). Figure 17 shows that the coverage area ratio increases monotonically within 0–600 s and presents a "step + fine line" shape: by 600 s, it has reached approximately 0.93, indicating that under the current administration and kinetic parameters, most areas of the jejunal wall have reached effective coverage. The steps of the curve are in phase with the 20 s pyloric gating: the jet pulse during the relaxation period quickly pushes the near-wall high-concentration zone downstream, triggering a large number of new walls to cross the threshold, and the coverage ratio jumps; the propulsion weakens during the contraction holding period, and the coverage ratio rises slowly/platforms. From the perspective of spatial distribution, the curve indicates that the model successfully reproduces the entire process of "pulsatile transport (gating determines the jump distance) - local mixing (segmentation determines spreading and near-wall supply) - wall capture".

Net Flux

Fig 20. Net Flux

The figure shows the global net flux Jnet, which can be understood as the instantaneous net rate of "adsorption - desorption" on the entire segment of the intestinal wall; it is the time derivative of the total wall adhesion amount MΓ, so Jnet >0 indicates that the wall is increasing net. The curve in the figure shows three phases and is superimposed with periodic textures in phase with physiological driving forces:

(1) 0-60 s: Jnet jumps rapidly from zero, which comes from the initial bolus + the near-wall high-concentration jet supply generated by the first pyloric relaxation.

(2) 60-350 s: The 20 s sawtooth is in phase with the pyloric gating, and the 6 s fine teeth come from segmental mixing. The gating "batches" the high-concentration zone to the downstream, and the segmental motion enhances the near-wall supply. However, with the increase of the covered area and the continuous extraction of the bulk phase, the growth of Jads=konc(1-θ) is partially offset, thus entering a quasi-steady state limited by transport.

(3) 350-600 s: The bulk concentration further decreases, more walls approach the capacity limit, and the proportion of the desorption term koff=koff0(1+α|ﬢw|) in the high-shear window of the gating increases relatively. The combination of these factors leads to a decline in the net adsorption rate.

Adhesion Yield

The total initial administered dose (sum of the bulk phase and initial wall value) is denoted as M0. The adhesion yield is defined as the normalized proportion of the cumulative retention on the wall, which measures "the proportion of the initial dose that has been captured by the wall by time t".

Fig 21. Adhesion Yield

At t=600 s, AttachYield=0.14, indicating that approximately 14% of the initial dose has been transferred to the wall.

Fig 22. 3D Product Concentration Distribution(t=0-30s)

Fig 23. 3D Product concentration Distribution(t=30-60s)

Fig 24. 3D Product concentration Distribution(t=540-570s)

Fig 25. 3D Product concentration Distribution(t=570-600s)

4. Appendix A

Physiologically, the small intestine and gastric antrum do not follow a straight path but curve significantly along with the body. Here, we will discuss how the curvature of the intestinal tract affects the current results^[4].
Dean was the first to study fully developed flow in curved pipes. The Dean number (De) is a crucial dimensionless number that characterizes the influence of curvature. To evaluate the impact of intestinal segment curvature on the model's transport results, the Dean number for a curved pipe is defined by the following equations:

where d=2a is the pipe diameter, Rc is the radius of curvature, and Um is the cross-sectional average velocity.
Taking a physiologically reasonable range of curvature radius Rc=(4−12)a, the Dean number satisfies:

Referring to Berger–Talbot's correction formula Q_c / Q_s = 1 - 2.306 × 10^-8 De^4,even in the case of the maximum curvature (Rc=4a, De=6.6), the flow rate correction under constant pressure drop is only approximately 0.0044%. This effect attenuates rapidly as Rc increases. It can be concluded that under the conditions of low-to-moderate Reynolds numbers and weak Dean flow in this study, simplifying the intestinal lumen to an axisymmetric straight pipe does not cause distinguishable deviations in the overall flux and first-order transport conclusions. Secondary flows induced by curvature may still have a slight impact on near-wall mixing, but their magnitude is far smaller than the mixing and shear modulation caused by gated jets and segmentation in this model (both of which dominate all characteristics of the "step + fine line" pattern). Therefore, it is reasonable to ignore curvature in the baseline model of this paper.

5. Appendix B

In the process of CFD (Computational Fluid Dynamics) calculations, the setting of the initial state is highly relevant to the convergence of the computation. To ensure stable computation, the following adjustments have been made to the initial geometric settings of the model:

1.The initial relaxed radius of the pyloric region is set to R0 in the design, but in the actual initial setup, it is set to 0.5R0. Thus, during the first pyloric cycle, the pyloric radius will contract from R0 to 0.1R0 and then return to 0.5R0 at the start of the next cycle.

2.The initial radius of the duodenum is set to R0, and it gradually returns to the designed radius R2 after the computation begins. This adjustment is made to ensure numerical convergence at the start of the calculation.

[1]Ferrua, M.J. and Singh, R.P. (2010), Modeling the Fluid Dynamics in a Human Stomach to Gain Insight of Food Digestion. Journal of Food Science, 75: R151-R162. https://doi.org/10.1111/j.1750-3841.2010.01748.x

[2]Trusov, P. V., Zaitseva, N. V., Kamaltdinov, M. R., A Multiphase Flow in the Antroduodenal Portion of the Gastrointestinal Tract: A Mathematical Model, Computational and Mathematical Methods in Medicine, 2016, 5164029, 18 pages, 2016. https://doi.org/10.1155/2016/5164029

[3]J.S. Karthikeyan, Deepti Salvi, Mukund V. Karwe, Modeling of fluid flow, carbohydrate digestion, and glucose absorption in human small intestine,Journal of Food Engineering,Volume 292,2021,110339,ISSN 0260-8774,https://doi.org/10.1016/j.jfoodeng.2020.110339.

[4]T. Ishikawa, T. Sato, G. Mohit, Y. Imai, T. Yamaguchi, Transport phenomena of microbial flora in the small intestine with peristalsis,Journal of Theoretical Biology,Volume 279, Issue 1,2011,Pages 63-73,ISSN 0022-5193,https://doi.org/10.1016/j.jtbi.2011.03.026.

[5]Jinping Zha, Siyu Zou, Jianyu Hao, Xinjuan Liu, Guillaume Delaplace, Romain Jeantet, Didier Dupont, Peng Wu, Xiao Dong Chen, Jie Xiao, The role of circular folds in mixing intensification in the small intestine: A numerical study,Chemical Engineering Science,Volume 229,2021,116079,ISSN 0009-2509,https://doi.org/10.1016/j.ces.2020.116079.

[6]Palmada, N.; Hosseini, S.; Avci, R.; Cater, J.E.; Suresh, V.; Cheng, L.K. A Systematic Review of Computational Fluid Dynamics Models in the Stomach and Small Intestine. Appl. Sci. 2023, 13, 6092. https://doi.org/10.3390/app13106092

Part2

Acquisition and Verification of the Yeast Metabolic Network Model

Definition of Metabolite Identification Tools

Definition of Engineered Proteins

Energy expenditure model

Incorporating Burden Reactions

Simulation of Different Scenarios

Module-wise Analysis

Flux Balance Analysis (FBA) for Each Scenario

Results and Analysis

Module-wise Analysis

Reference

In many in vitro systems, when cells are subjected to specific perturbations—such as the expression of heterologous genes—one of the most direct physiological manifestations is the inhibition of cellular proliferation. Therefore, in synthetic biology engineering, a marked reduction in host cell growth rate upon the introduction of a genetic component or circuit is often regarded as an indicator of cytotoxicity or metabolic burden.

In this study, because it was technically challenging to construct a long-term, stable engineered microbial system under laboratory conditions, we employed Flux Balance Analysis (FBA) to simulate the growth rate variations of Saccharomyces cerevisiae after the integration of exogenous genes. This modeling approach provided a quantitative framework to evaluate circuit-associated metabolic stress and to guide the rational design of genetic constructs.

1. Acquisition and Verification of the Yeast Metabolic Network Model

(1) Software Installation and Setup

The COBRApy package was obtained from its official GitHub repository (https://github.com/opencobra/cobrapy), where both the source code and documentation are publicly available. Following the installation instructions, COBRApy was installed using the command pip install cobra, and a dedicated Python virtual environment was configured to manage dependencies and ensure reproducibility of the analysis

(2) Model Acquisition

The genome-scale metabolic model of Saccharomyces cerevisiae (strain S288C) was retrieved and downloaded from the BiGG Models database. This model provides a comprehensive representation of yeast metabolism, encompassing thousands of reactions, metabolites, and associated genes that collectively describe the organism’s metabolic capacity.

(3) Model Verification and Preprocessing

Basic quality checks were performed on the imported model, including verification of the total number of reactions (R), metabolites (M), and genes (G), as well as the assessment of mass balance consistency. The summarized statistics are shown in the figure below, confirming that the complete yeast metabolic network was successfully imported and structurally valid for downstream simulations.

Where:

R → reactions; M → metabolites; G → genes

2. Definition of Metabolite Identification Tools

Within the model, target metabolites (e.g., ATP, ADP) were located through keyword-based searches. Among multiple compartment-specific forms, cytosolic metabolites (compartment = c) with higher connectivity were preferentially selected. This prioritization ensures that the chosen metabolites represent the most functionally relevant intracellular species, thereby improving the biological realism and reliability of subsequent simulations.

3. Definition of Engineered Proteins

In this study, several heterologous proteins—normally not synthesized in yeast—were introduced into the system. As these target proteins were absent in the native metabolic network, we incorporated all proteins expressed by the six synthetic modules into the metabolic model to simulate their perturbative effects.

Protein synthesis and secretion are both energy-consuming processes. To precisely estimate the energetic cost, each protein was characterized by two attributes: its amino acid length and whether it is secreted extracellularly. Detailed information was obtained from the UniProt

4. Energy expenditure model

According to the literature, the synthesis of each amino acid residue requires the expenditure of 2 ATP and 2 GTP molecules^[1].
If a protein is secreted, an additional cost of 2 ATP per 100 amino acids is incurred.

The estimation process for the energy required for protein secretion is as follows.
(1) Based on previous studies, we first collected approximate energy consumption data for each stage of the secretion process.

(2) Detailed Analysis of Energy Expenditure
a. SRP recognition (co-translational targeting):
Mechanism: Both the signal recognition particle (SRP) and its receptor (SR) are GTPases; their association and dissociation depend on GTP hydrolysis. Consumption: Approximately 1–2 GTP per protein, depending on whether the SRP pathway is utilized.
b. ER translocation (BiP/Kar2p-driven): Mechanism: The Hsp70-family chaperone BiP pulls nascent polypeptides into the ER lumen through ATP-dependent binding and release cycles. Consumption: Around 2–4 ATP (lower bound) to 5–10 ATP (upper bound) per 100 amino acids, depending on the number of binding cycles required.
c. Folding and quality control (BiP, PDI, calnexin systems): Mechanism: Protein folding in the ER involves iterative chaperone-assisted cycles and glycosylation modifications, with each chaperone interaction consuming one ATP molecule.
Consumption: Approximately 5–10 ATP (conservative) to 10–30 ATP (aggressive) per 100 amino acids, depending on folding efficiency and misfolding correction frequency.
d. Vesicle loading and trafficking (COPII/COPI, Rab GTPases): Mechanism: Small GTPases such as Sar1, Arf, and Rab regulate vesicle formation, cargo transport, and fusion. Consumption: Varies widely with cargo density—typically about 3–5 GTP per protein; for low-abundance or large cargo proteins, the cost can reach 10–20+ GTP.
e. Vesicle fusion and exocytosis: Mechanism: Fusion between secretory vesicles and the plasma membrane relies on Rab and Rho GTPase cycles and the assembly/disassembly of SNARE complexes. GTP hydrolysis drives vesicle targeting, docking, and SNARE disassembly after fusion. Consumption: Each fusion event typically involves ~1–3 GTP hydrolyses, depending on the number of small GTPases engaged and the complexity of SNARE recycling.

(3) Sensitivity Analysis
Sensitivity analysis is a method used to evaluate how variations in model parameters influence system outputs, effectively measuring how sensitive the model outcome y is to small changes in each input parameter xi.

Mathematically, this relationship is expressed as a partial derivative of y with respect to xi.

In this study, we performed sensitivity analysis on the ATP cost associated with protein secretion, testing how variations in key energetic parameters (notably the median ATP cost per amino acid, 0.19) within a ±20% range affected the results. In the resulting figure, blue and orange bars represent the lower and upper energy estimates, respectively; the central black dots denote baseline values, while red error bars indicate uncertainty-induced variability.

Results and Interpretation
a. Protein length shows a pronounced influence on energy sensitivity. For example, long proteins such as AGA1 exhibits substantially higher ATP consumption under baseline conditions compared to short proteins (e.g., TFF3, AGA2). When energetic parameters fluctuate, their absolute errors are also larger, indicating that long proteins are the most sensitive to assumptions regarding ATP cost.
b. Short proteins exhibit lower energy demand and higher robustness. Proteins such as TFF3 and HFB1 display narrow error ranges, meaning that even with ±20% parameter variation, their relative ATP consumption rankings remain largely unchanged.
c. Overall trend remains robust. Despite parameter uncertainty, the relative order of ATP consumption among different proteins remains consistent. This suggests that the model reliably identifies which protein types impose the greatest energetic burden on the host.
In summary, sensitivity analysis not only validates the robustness of our conclusions but also highlights parameters that warrant more precise experimental determination (e.g., ATP/aa consumption rates for long proteins). This analysis is of great value to our circuit design, as it quantitatively evaluates the metabolic burden imposed by different recombinant proteins and guides future optimization strategies.

5. Incorporating Burden Reactions

(1) Introducing a virtual metabolite (“token”) A token serves as a placeholder for an exogenous protein. In synthesis reactions, ATP/GTP hydrolysis must result in the formation of a token product; only once a token is produced can it be subsequently consumed by SEC or TURN reactions. This design ensures formal mass flow between synthesis and secretion steps within the metabolic network.
(2) Synthesis reaction (SYN_P) Let protein P consist of n amino acid residues. As described previously, the translation of each amino acid requires 2 ATP and 2 GTP, producing ADP, GDP, inorganic phosphate (Pi), and protons (H⁺). The synthesis reaction is defined as follows:

where Prot_P_c represents the virtual metabolite (token) corresponding to one complete protein molecule.
(3) Secretion reaction (SEC_P) To model the energy cost of secretion, each secreted protein is assumed to require a fixed amount of additional ATP. In this reaction, the token is consumed, ensuring mass balance and allowing FBA to properly capture energetic trade-offs. a. For secreted proteins:

b. For Cytoplasmic (Non-secreted) Proteins:

For non-secreted proteins (TURN_P), the reaction consumes the token without additional energy cost. This simplification assumes that no energy or metabolites are recycled during degradation or turnover.
(4) Expression Intensity Constraint In realistic biological systems, engineered strains must continuously synthesize or secrete a certain amount of heterologous proteins per unit biomass and time. To capture this, we introduced expression intensity as a flux constraint in the FBA framework. This constraint prevents the optimization algorithm from artificially suppressing heterologous expression in pursuit of maximal growth rate—thereby ensuring that the engineered circuit remains functionally active within the simulation.

6. Simulation of Different Scenarios

6.1 Module-wise Analysis

Given that distinct promoters may activate at different physiological conditions and that engineered modules often function sequentially, we designed simulations that mirror possible experimental subgroups. Six configurations were analyzed to assess metabolic burden:

(1) Adsorption module (2) Degradation module (3) Therapeutic module (4) Adsorption + Degradation + Therapeutic modules (5) Adsorption + Therapeutic modules (6) Degradation + Therapeutic modules

6.2 Flux Balance Analysis (FBA) for Each Scenario

Flux Balance Analysis (FBA) is a standard computational framework in metabolic modeling. The general linear constraint equations are as follows:

(1) Steady-State Assumption The intracellular metabolite concentrations are assumed to remain constant under steady-state conditions, meaning that the production rate of each metabolite equals its consumption rate. Mathematically, this can be expressed as:

where S is the stoichiometric matrix of the metabolic network (rows represent metabolites, columns represent reactions), and v is the reaction flux vector.

(2) Flux Constraints Each reaction flux is bounded by upper and lower limits that define its reversibility and feasible rate range:

where li and ui represent the lower and upper bounds of reaction i, respectively. These constraints incorporate both thermodynamic directionality and experimentally determined limits on transport or enzyme capacity.

(3) Objective Function FBA typically assumes that cells have evolved to optimize certain physiological objectives, such as maximizing growth rate or biomass formation. The objective function is therefore defined as:

where c is a coefficient vector selecting the target flux .Solving this linear programming problem yields an optimal flux distribution v* and the corresponding optimal growth rate.

7. Results and Analysis

To evaluate the potential metabolic burden introduced by the engineered gene circuits, we assessed the host growth rate under varying protein expression levels.

The baseline growth rate of the unmodified yeast model was 0.085844 h⁻¹.

Low expression levels (1e⁻⁵–1e⁻⁴): Across all functional modules (adsorption, degradation, and therapy), the host maintained growth rates nearly identical to baseline, with only minor reductions (typically within 0.1–5%).
This suggests strong metabolic compatibility under normal expression conditions.

Moderate to high expression (up to 5e⁻⁴): More complex combinations (e.g., Adsorption + Therapy + Degradation) exhibited noticeable decreases in growth rate, while individual modules remained relatively stable, maintaining ≥90% of the baseline growth rate.

Overall trend: Growth reduction was observed mainly under extreme expression scenarios, whereas moderate expression levels imposed limited metabolic burden.
These results indicate that the designed genetic circuits exhibit good metabolic robustness and practical implementability, maintaining near-baseline growth performance across most simulated conditions.

scenario	1e-04	1e-05	5e-04	5e-05
Adsorption	0.083637	0.085623	0.074807	0.084740
Adsorption + Therapy	0.082397	0.085499	0.068606	0.084120
Adsorption + Therapy + Degradation	0.080996	0.085359	0.061603	0.083420
Degradation	0.084443	0.085704	0.078841	0.085144
Degradation + Therapy	0.083203	0.085580	0.072640	0.084524
Therapy	0.084604	0.085720	0.079644	0.085224

7.1 Visualization

To present our results more intuitively, we visualized the simulation data as follows:

(1) Growth rate vs. expression level X-axis: Protein expression intensity (log scale) Y-axis: Growth rate (h⁻¹) Dashed line: Wild-type control

(2) ATP burden vs. expression level X-axis: Protein expression intensity (log scale) Y-axis: ATP consumption (mmol/gDW/h) A higher value indicates a greater energetic burden.

(3) GTP burden vs. expression level Same as the ATP burden plot, but representing GTP consumption.

Code ▼


									from copy import deepcopy
									import os
									import numpy as np
									import matplotlib.pyplot as plt
									import pandas as pd
									from cobra import io, Model, Reaction, Metabolite
									
									# ========= 0) Global style: prettier colors / fonts =========
									plt.style.use("tableau-colorblind10")
									plt.rcParams.update({
									"figure.dpi": 110,
									"axes.grid": True,
									"grid.alpha": 0.25,
									"axes.spines.top": False,
									"axes.spines.right": False,
									"axes.titleweight": "bold",
									"legend.frameon": True,
									})
									
									# -------- User config --------
									sbml_path = r"C:\Users\WANG\Desktop\yeast-GEM.xml"
									output_dir = r"C:\Users\WANG\Desktop"
									
									# ========= 1) Load model =========
									base_model = io.read_sbml_model(sbml_path)
									print(f"Model loaded: {base_model.name} | R={len(base_model.reactions)} "
									f"M={len(base_model.metabolites)} G={len(base_model.genes)}")
									
									# ========= 2) Robust metabolite search =========
									def scan_mets_for_id(model, query_keywords, prefer_compartment='c'):
									keys = [k.lower() for k in query_keywords]
									hits = []
									for m in model.metabolites:
									text = (m.id + " " + (m.name or "")).lower()
									if any(k in text for k in keys):
									deg = len(m.reactions)
									hits.append((m, deg))
									def rank(tup):
									m, deg = tup
									comp = (m.compartment or "").lower()
									c_ok = (comp == prefer_compartment) or m.id.endswith("_c") or "[c]" in m.id
									return (0 if c_ok else 1, -deg, m.id)
									hits.sort(key=rank)
									if hits:
									return hits[0][0].id
									return None
									
									TARGETS = {
									"ATP": ["atp", "adenosine triphosphate"],
									"ADP": ["adp", "adenosine diphosphate"],
									"Pi": ["pi", "phosphate", "orthophosphate", "inorganic phosphate"],
									"H2O": ["h2o", "water"],
									"H": ["h+", "proton"],
									"GTP": ["gtp", "guanosine triphosphate"],
									"GDP": ["gdp", "guanosine diphosphate"],
									"CU2": ["cu2", "copper(2", "cu(2", "cupric"],
									"ZN2": ["zn2", "zinc(2", "zn(2"],
									"HEME": ["heme", "protoheme", "proto-"]
									}
									
									chem_ids = {k: scan_mets_for_id(base_model, kw, prefer_compartment='c')
									for k, kw in TARGETS.items()}
									
									missing = [k for k in ("ATP","ADP","GTP","GDP","Pi","H2O","H") if chem_ids.get(k) is None]
									if missing:
									raise RuntimeError(f"Missing core metabolites in mapping: {missing}. "
									"Please verify metabolite ids in your model.")
									
									# ========= 3) Engineered protein dictionary =========
									engineered = {
									"AGA1": {"aa": 725, "secreted": True},
									"AGA2": {"aa": 73, "secreted": True},
									"ECM14": {"aa": 575, "secreted": False},
									"HFBI": {"aa": 96, "secreted": True},
									"PETase": {"aa": 290, "secreted": True},
									"MHETase": {"aa": 600, "secreted": True},
									"SOD1": {"aa": 154, "secreted": True, "Cu":1, "Zn":1},
									"TFF3": {"aa": 59, "secreted": True},
									"Catalase": {"aa": 514, "secreted": True, "Heme":1},
									}
									ENERGY_PER_AA = {"ATP": 1, "GTP": 2}
									
									# ========= 4) Add burden reactions =========
									def add_burden_reactions(model, protein_names, force_level=0.0, use_cofactors=True):
									"""
									Return (modified_model, created_reaction_ids).
									force_level is the minimal flux for secretion/turnover reactions.
									"""
									m = deepcopy(model)
									created = []
									
									def get_met(m_obj, key):
									mid = chem_ids.get(key)
									if not mid:
									return None
									try:
									return m_obj.metabolites.get_by_id(mid)
									except KeyError:
									return None
									
									for pname in protein_names:
									if pname not in engineered:
									raise KeyError(f"Protein {pname} not found in engineered dictionary.")
									meta = engineered[pname]
									aa = int(meta["aa"])
									atp_need = aa * ENERGY_PER_AA["ATP"]
									gtp_need = aa * ENERGY_PER_AA["GTP"]
									
									token_id = f"prot_{pname}_c"
									if token_id in m.metabolites:
									token = m.metabolites.get_by_id(token_id)
									else:
									token = Metabolite(token_id, name=f"{pname}_token", compartment="c")
									m.add_metabolites([token])
									
									atp = get_met(m, "ATP"); adp = get_met(m, "ADP")
									gtp = get_met(m, "GTP"); gdp = get_met(m, "GDP")
									h2o = get_met(m, "H2O"); pi = get_met(m, "Pi"); h = get_met(m, "H")
									
									# Translation
									syn = Reaction(f"SYN_{pname}")
									sto = {}
									if atp: sto[atp] = -atp_need
									if adp: sto[adp] = +atp_need
									if gtp: sto[gtp] = -gtp_need
									if gdp: sto[gdp] = +gtp_need
									if h2o: sto[h2o] = -(atp_need + gtp_need)
									if pi: sto[pi] = (atp_need + gtp_need)
									if h: sto[h] = (atp_need + gtp_need)
									sto[token] = 1.0
									
									if use_cofactors:
									if pname == "SOD1":
									cu = get_met(m, "CU2"); zn = get_met(m, "ZN2")
									if cu: sto[cu] = sto.get(cu, 0) - 1.0
									if zn: sto[zn] = sto.get(zn, 0) - 1.0
									if pname == "Catalase":
									heme = get_met(m, "HEME")
									if heme: sto[heme] = sto.get(heme, 0) - 1.0
									
									syn.add_metabolites(sto)
									syn.lower_bound = 0.0
									syn.upper_bound = 1000.0
									m.add_reactions([syn]); created.append(syn.id)
									
									# Secretion or turnover
									if meta["secreted"]:
									secretory_atp = max(1, int(round(0.39 * aa)))
									sec = Reaction(f"SEC_{pname}")
									sec_sto = {}
									if atp: sec_sto[atp] = -secretory_atp
									if adp: sec_sto[adp] = secretory_atp
									if h2o: sec_sto[h2o] = -secretory_atp
									if pi: sec_sto[pi] = secretory_atp
									if h: sec_sto[h] = secretory_atp
									sec_sto[token] = -1.0
									sec.add_metabolites(sec_sto)
									sec.lower_bound = float(force_level)
									sec.upper_bound = 1000.0
									m.add_reactions([sec]); created.append(sec.id)
									else:
									turn = Reaction(f"TURN_{pname}")
									turn.add_metabolites({token: -1.0})
									turn.lower_bound = float(force_level)
									turn.upper_bound = 1000.0
									m.add_reactions([turn]); created.append(turn.id)
									
									return m, created
									
									def summarize_solution(model, sol, created_ids):
									out = {"status": sol.status, "growth_rate": float("nan"),
									"protein_ATP": 0.0, "protein_GTP": 0.0, "protein_ATP_equiv": 0.0}
									if sol.status != "optimal":
									return out
									out["growth_rate"] = float(sol.objective_value)
									
									atp_id = chem_ids.get("ATP")
									gtp_id = chem_ids.get("GTP")
									total_atp, total_gtp = 0.0, 0.0
									
									for rid in created_ids:
									if rid not in sol.fluxes.index:
									continue
									v = float(sol.fluxes[rid])
									if v == 0:
									continue
									rxn = model.reactions.get_by_id(rid)
									for met, coeff in rxn.metabolites.items():
									# ... existing code ...

# 计算ATP和GTP消耗
if atp_id and met.id == atp_id and coeff < 0:
    total_atp += (-coeff) * v
if gtp_id and met.id == gtp_id and coeff < 0:
    total_gtp += (-coeff) * v

out["protein_ATP"] = total_atp
out["protein_GTP"] = total_gtp
out["protein_ATP_equiv"] = total_atp + total_gtp
return out

# =========5) Six scenarios=========
ADSORPTION = ["AGA1", "AGA2", "ECM14", "HFBI"]
DEGRADATION = ["PETase", "MHETase"]
THERAPY = ["SOD1", "TFF3", "Catalase"]

SCENARIOS = {
    "Adsorption": ADSORPTION,
    "Degradation": DEGRADATION,
    "Therapy": THERAPY,
    "Adsorption + Therapy + Degradation": ADSORPTION + THERAPY + DEGRADATION,
    "Adsorption + Therapy": ADSORPTION + THERAPY,
    "Degradation + Therapy": DEGRADATION + THERAPY,
}

# =========6) Expression levels & run FBA=========
expr_levels = [1e-5, 5e-5, 1e-4, 5e-4]
expr_levels_sorted = sorted(expr_levels)

growth_rates_expr = {name: [] for name in SCENARIOS}
protein_atp_expr = {name: [] for name in SCENARIOS}
protein_gtp_expr = {name: [] for name in SCENARIOS}

baseline_solution = base_model.optimize()
baseline_growth_rate = float(baseline_solution.objective_value)
print(f"Baseline growth rate: {baseline_growth_rate:.6f}")

for expr_level in expr_levels_sorted:
    print(f"\n--- expr_level={expr_level:.0e} ---")
    for scenario_name, proteins in SCENARIOS.items():
        print(f" scenario: {scenario_name} ...", end="", flush=True)
        m, created = add_burden_reactions(base_model, proteins,
                                        force_level=expr_level, use_cofactors=True)
        sol = m.optimize()
        summ = summarize_solution(m, sol, created)
        
        growth_rates_expr[scenario_name].append(summ["growth_rate"])
        protein_atp_expr[scenario_name].append(summ["protein_ATP"])
        protein_gtp_expr[scenario_name].append(summ["protein_GTP"])
        
        print(f" growth={summ['growth_rate']:.6f} (status={summ['status']})")

# =========7) Pretty bar plots=========
def plot_grouped_bars(metric_dict, title, ylabel, baseline=None):
    """Plot grouped bar charts for different scenarios and expression levels"""
    scenarios = list(metric_dict.keys())
    S, L = len(scenarios), len(expr_levels_sorted)
    x = np.arange(L)  # group centers = expression levels
    width = min(0.8 / S, 0.13)
    
    fig, ax = plt.subplots(figsize=(13, 6))
    
    for i, sc in enumerate(scenarios):
        vals = metric_dict[sc]
        offset = (i - (S-1)/2) * width
        ax.bar(x + offset, vals, width=width, label=sc,
               color=plt.get_cmap("tab20c")(i / len(scenarios)),
               edgecolor="white", linewidth=1)
    
    ax.set_xticks(x)
    ax.set_xticklabels([f"{v:.0e}" for v in expr_levels_sorted])
    ax.set_xlabel("Minimum protein expression level (mmol gDW$^{-1}$ h$^{-1}$)")
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    
    if baseline is not None:
        ax.axhline(y=baseline, color="black", linestyle="--", linewidth=1.5,
                   label="Wild-type Baseline")
    
    ax.legend(loc="best", ncol=3, fontsize=9)
    fig.tight_layout()
    plt.show()

# Growth rate
plot_grouped_bars(growth_rates_expr,
                  "Growth rate across scenarios and expression levels",
                  "Growth rate (h$^{-1}$)",
                  baseline=baseline_growth_rate)

# ATP demand
plot_grouped_bars(protein_atp_expr,
                  "ATP burden across scenarios and expression levels",
                  "ATP demand (mmol gDW$^{-1}$ h$^{-1}$)")

# GTP demand
plot_grouped_bars(protein_gtp_expr,
                  "GTP burden across scenarios and expression levels",
                  "GTP demand (mmol gDW$^{-1}$ h$^{-1}$)")

# =========敏感性分析=========
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 0) Define the middle value and percentage variation for sensitivity analysis
middle_value = 0.19  # Middle value used in the model
variation = 0.2  # 20% relative variation; change to 0.3 for 30%

# Calculate relative bounds
lower_bound = middle_value * (1 - variation)
upper_bound = middle_value * (1 + variation)

# 1) Function to calculate ATP consumption for a given protein size (aa)
def calculate_atp_consumption(aa_length, atp_per_100aa):
    return aa_length * atp_per_100aa / 100  # ATP per aa, converted to per protein size

# 2) Protein sizes (in amino acids) for testing
protein_sizes = {
    "AGA1": 725,
    "AGA2": 73,
    "ECM14": 575,
    "HFBI": 96,
    "PETase": 290,
    "MHETase": 600,
    "SOD1": 154,
    "TFF3": 59,
    "Catalase": 514,
}

# 3) Sensitivity analysis for each protein
results = []
for pname, aa in protein_sizes.items():
    # Calculate ATP consumption with lower, middle, and upper bounds
    atp_lower = calculate_atp_consumption(aa, lower_bound)
    atp_middle = calculate_atp_consumption(aa, middle_value)
    atp_upper = calculate_atp_consumption(aa, upper_bound)
    
    # Calculate the error based on the middle value
    error_lower = abs(atp_lower - atp_middle)
    error_upper = abs(atp_upper - atp_middle)
    
    # Store the results
    results.append({
        "Protein": pname,
        "ATP (lower)": atp_lower,
        "ATP (middle)": atp_middle,
        "ATP (upper)": atp_upper,
        "Error (lower)": error_lower,
        "Error (upper)": error_upper,
        "Avg Error": (error_lower + error_upper) / 2
    })

# 4) Create a DataFrame to visualize the results
df_results = pd.DataFrame(results)

# 5) Print the results table
print(df_results)

# 6) Plot the ATP consumption and errors
fig, ax = plt.subplots(figsize=(10, 6))

# Plotting ATP consumption for each protein
ax.bar(df_results["Protein"], df_results["ATP (lower)"], 
       label="Lower ATP", alpha=0.7)
ax.bar(df_results["Protein"], df_results["ATP (upper)"], 
       label="Upper ATP", alpha=0.7, bottom=df_results["ATP (lower)"])
ax.plot(df_results["Protein"], df_results["ATP (middle)"], 'ko',
        label="Middle ATP (0.19)", markersize=6)

# Plotting error bars
ax.errorbar(df_results["Protein"], df_results["ATP (middle)"],
            yerr=df_results["Error (upper)"], fmt='o', color='red',
            label="Error range", capsize=5)

ax.set_ylabel('ATP Consumption (per protein)')
ax.set_xlabel('Proteins')
ax.set_title('ATP Consumption Sensitivity Analysis')
ax.legend()

plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

Reference

1. Hu, X.P., Dourado, H., Schubert, P. et al. The protein translation machinery is expressed for maximal efficiency in Escherichia coli. Nature Communications 11, 5260 (2020)

Part3

Model Description

Basic Assumptions

Model Equations

Variables and Parameter Definitions

Simulation Process and Results

Reference

Intracellular Signal Transduction Simulation

1. Model Description

In our degradation module, we engineered Saccharomyces cerevisiae to express an α-factor receptor (a G protein–coupled receptor) on the cell membrane. The signaling pathway consists of three modules.

First, when the engineered strain reaches a certain population density in the intestinal environment, the α-factor binds to the Ste2 receptor. The resulting complex activates the G protein, leading to the dissociation of its β–γ subunits (Ste4–Ste18) from Gpa1.

Next, the scaffold protein Ste5 couples the upstream Ste4–Ste18 complex to the downstream MAPK cascade components, thereby activating the three-tiered MAPK signaling pathway.

Finally, the promoters controlling α-factor and enzyme expression are activated. This results in the expression of PET- and MHET-degrading enzymes, while simultaneously promoting α-factor synthesis and secretion into the extracellular environment. The increasing extracellular α-factor concentration further reinforces the signaling cascade, forming a positive feedback loop that amplifies enzyme expression and drives efficient microplastic degradation.

2. Basic Assumptions

(1) Since the G protein can be resynthesized after dissociation, we assume that its total intracellular concentration remains constant.

(2) The scaffold protein Ste5 can be reused; therefore, its intracellular concentration is also assumed to remain constant.

(3) Fus3 activates the transcription factor Ste12 by phosphorylating and inhibiting its repressors (Dig1 and Dig2). For model simplification, we assume that Fus3 directly exerts activating effects on Ste12.

3. Model Equations

3.1 Variables and Parameter Definitions

A daily dietary supplement designed for people at high risk of microplastic exposure and health-conscious individuals (which can later be registered as a special medical purpose formula food).

Variables	Biological meanings
Ste2	Concentration of the receptor Ste2
α	Concentration of α-factor
Ste4-Ste18	Concentration of the βγ subunits dissociated from the G protein
α-Ste2	Concentration of the α-factor–Ste2 receptor complex
α-Ste2-G	Concentration of the complex formed by α–Ste2 and the G protein
Ste5-Ste4-Ste18	Concentration of the Ste5–Ste4–Ste18 complex
Ste5-Ste4-Ste18-Ste11	Concentration of the Ste5–Ste4–Ste18–Ste11 complex
Ste5-Ste4-Ste18-Ste11-Ste7	Concentration of the Ste5–Ste4–Ste18–Ste11–Ste7 complex
Fus3	Concentration of Fus3 (MAPK)
Ste12	Concentration of the transcription factor Ste12
αmRNA	Concentration of mRNA transcribed from the α-factor gene
enzyme	Concentration of the translated degradation enzyme

Parameters	Biological meaning
k₁	Dissociation rate constant of α–Ste2
k₂	Association rate constant between α-factor and Ste2
k₃	Dissociation rate constant of α–Ste2–G
k₄	Association rate constant between G protein and α–Ste2
k₅	Association rate constant between Ste5 and Ste4–Ste18
k₆	Dissociation rate constant of Ste5–Ste4–Ste18
k₇	Association rate constant between Ste5–Ste4–Ste18 and Ste11
k₈	Dissociation rate constant of Ste5–Ste4–Ste18–Ste11
k₉	Association rate constant between Ste5–Ste4–Ste18–Ste11 and Ste7
k₁₀	Dissociation rate constant of Ste5–Ste4–Ste18–Ste11–Ste7
k₁₁	Activation rate of Fus3
k₁₂	Activation rate of Ste12
k₁₃	Transcription rate of the α-factor gene
k₁₄	Degradation rate of α-mRNA
k₁₅	Translation rate of the α-factor gene
k₁₆	Degradation rate of α-factor
k₁₇	Transcription–translation rate of the degradation enzyme
k₁₈	Feedback regulation coefficient of Ste12
k₁₉	Inhibition coefficient of Fus3 under excessive α-factor

3.2 Simulation Process and Results

Based on literature data and known biological properties, the corresponding parameter values were determined and integrated into the simulation model The model primarily simulates the following four processes:

(1) Receptor activation stage
Chemical reaction:

ODE：

Simulation Results：
The concentrations of both α-factor and the α–Ste2 complex increased significantly, indicating that our system can not only secrete α-factor to participate in quorum sensing, but also effectively enhance the expression of downstream enzyme genes.

(2) MAPK Cascade Signal Amplification
Chemical reaction:

ODE：

Simulation Results：
The concentrations of intermediate signaling proteins in the MAPK cascade first increased and then decreased, consistent with the typical transient activation pattern observed in signaling intermediates. In particular, the elevated concentration of Fus3 suggests its promoting effect on the transcription of downstream target genes.

(3)α-Factor Secretion and Positive Feedback Formation
Chemical reaction:

ODE：

Simulation Results：
Simulation results showed an overall increase in the concentrations of the upstream transcription factor, α-factor mRNA, and α-factor itself. The rate of α-factor accumulation first increased and then slowed down. This is because, in the early stage, the positive feedback of α-factor synthesis and secretion accelerates its concentration growth. However, as the extracellular α-factor concentration becomes high, excessive Fus3 exerts a negative regulatory effect on Ste12 concentration^[3].

To represent this effect, a negative regulatory term was introduced into Equations (9) and (11) of the ODE system, reflecting Fus3-mediated inhibition of Ste12 and α-factor expression. This adjustment prevented an unrealistic continuous increase of α-factor concentration and made the simulation results more consistent with biological observations.

Code ▼


									function positive_cycle_heatmap_pretty
									clc; clear;
									
									% ---------------- Switches ----------------
									use_log_view = true; % log10 heatmap (recommended)
									use_foldchange = false; % show fold-change A/A0
									use_hill_fb = false; % Hill-saturated feedback
									
									% ---------------- Parameters ----------------
									k13 = 0.15; % transcription (basal, scaled by N)
									k14 = 0.05; % mRNA degradation
									k15 = 0.10; % translation (scaled by N)
									k16 = 0.02; % alpha degradation
									kfb = 0.05; % positive feedback strength
									
									% Hill feedback (used only if use_hill_fb = true)
									n_hill = 2; % Hill coefficient
									K_hill = 0.5; % half-saturation
									
									% Density & time
									N_values = linspace(0.1, 2.0, 40);
									tspan = [0 220];
									nT = 120;
									time_sample = linspace(tspan(1), tspan(2), nT);
									
									% Initial condition
									x0 = [0.2, 0.1]; % [A, mRNA]
									
									% ---------------- Storage ----------------
									A_matrix = zeros(length(N_values), nT);
									
									% ---------------- Sweep densities ----------------
									for i = 1:length(N_values)
									N = N_values(i);
									if ~use_hill_fb
									% Linear positive feedback
									fun = @(t,x) [ ...
									k15 * x(2) * N - k16 * x(1); ...
									k13 * N + kfb * x(1) - k14 * x(2) ...
									];
									else
									% Hill-saturated positive feedback
									fun = @(t,x) [ ...
									k15 * x(2) * N - k16 * x(1); ...
									k13 * N + kfb * ( x(1).^n_hill ./ (K_hill^n_hill + x(1).^n_hill) ) - k14 * x(2) ...
									];
									end
									
									[t, x] = ode45(fun, tspan, x0);
									
									A = x(:,1);
									if use_foldchange
									A = A / max(1e-12, x(1,1));
									end
									if use_log_view
									A = log10(A + 1e-9);
									end
									
									A_matrix(i,:) = interp1(t, A, time_sample, 'linear', 'extrap');
									end
									
									% ---------------- Plot heatmap ----------------
									figure('Position',[120 120 780 540]);
									imagesc(time_sample, N_values, A_matrix);
									set(gca,'YDir','normal');
									xlabel('Time','Interpreter','latex');
									ylabel('Cell density $N$','Interpreter','latex');
									
									% Title (avoid underscores by using math mode)
									if use_hill_fb
									base_ttl = 'Alpha heatmap with Hill-saturated positive feedback';
									else
									base_ttl = 'Alpha heatmap with linear positive feedback';
									end
									if use_log_view && use_foldchange
									ttl = [base_ttl, ' ($\log_{10}$ fold-change)'];
									elseif use_log_view
									ttl = [base_ttl, ' ($\log_{10}$ scale)'];
									elseif use_foldchange
									ttl = [base_ttl, ' (fold-change)'];
									else
									ttl = base_ttl;
									end
									title(ttl,'Interpreter','latex');
									
									% Colorbar (set label via cb.Label to ensure LaTeX works)
									cb = colorbar;
									cb.Label.Interpreter = 'latex';
									if use_log_view && ~use_foldchange
									cb.Label.String = '$\log_{10}(A)$';
									elseif use_log_view && use_foldchange
									cb.Label.String = '$\log_{10}(A/A_0)$';
									elseif ~use_log_view && use_foldchange
									cb.Label.String = '$A/A_0$';
									else
									cb.Label.String = '$A$';
									end
									colormap(parula);
									
									% Contrast enhancement (percentile clipping)
									lo = prctile(A_matrix(:), 2);
									hi = prctile(A_matrix(:), 98);
									if isfinite(lo) && isfinite(hi) && lo < hi caxis([lo, hi]); end % Draw threshold line Nc for the linear model if
										~use_hill_fb Nc=(k16*k14)/(k15*kfb); % ~0.2 with defaults hold on; plot([time_sample(1) time_sample(end)], [Nc
										Nc], 'r--' , 'LineWidth' , 1.5); text(time_sample(end)*0.75, Nc+0.03, '$N_c = 0.20$' , ... 'Color'
										,'r','Interpreter','latex'); end grid on;						end

Reference

[1]Posas F, Takekawa M, Saito H. Signal transduction by MAP kinase cascades in budding yeast. Curr Opin Microbiol. 1998 Apr;1(2):175-82. doi: 10.1016/s1369-5274(98)80008-8
[2]Dohlman HG, Slessareva JE. Pheromone signaling pathways in yeast. Sci STKE. 2006 Dec 5;2006(364):cm6. doi:10.1126/stke.3642006cm6
[3]Esch RKWang Y, Errede B2006.Pheromone-Induced Degradation of Ste12 Contributes to Signal Attenuation and the Specificity of Developmental Fate. Eukaryot Cell5:.https://doi.org/10.1128/ec.00270-06

Part4

Acquisition of Metabolic Network Models for Engineered Yeast and Lactobacillus

Construction of the Metabolic Pool and Exchange Pathways

Simulation of Intestinal Nutrient Supply

Baseline Growth Rate

Exploration of Potential Coupled Exchanges Based on FVA

Linear Coupling Between ATP Maintenance Demand (ATPM) and Exchange Reactions

Comparison Between Baseline and Coculture Growth Rates

Reference

Feasibility Analysis of the Yeast–Lactobacillus Symbiotic System Based on a Perturbation-Constrained Model

At present, our engineered symbiotic system has not yet reached the stage for in vivo testing in humans, so it is essential to verify whether the system can function effectively under physiological conditions. We reasoned that if the growth rate of both species increases under co-culture conditions, it would indicate that the symbiotic interaction provides mutual metabolic support.

To test this hypothesis, we employed the COBRApy package in Python to simulate the metabolic behavior of the yeast–lactobacillus consortium. We further developed an innovative FVA-constrained model (Flux Variability Analysis–based constraint model) to evaluate the metabolic feasibility and energy balance of the system. The modeling results confirmed both the effectiveness and necessity of the symbiotic framework, offering valuable insights and a reproducible modeling strategy for future teams studying microbial co-culture and metabolic cooperation.

1. Acquisition of Metabolic Network Models for Engineered Yeast and Lactobacillus

To simulate the symbiotic behavior of Saccharomyces cerevisiae and Lactobacillus plantarum within the intestinal environment, we constructed two genome-scale metabolic models. The yeast model was obtained from our previously developed network that incorporated six engineered genetic circuits, while the Lactobacillus model (iNF517) was retrieved from the BiGG Models database.

Since the simulation procedure is identical for all six types of genetic circuits, we illustrate the workflow using the absorption circuit as a representative example.

2. Construction of the Metabolic Pool and Exchange Pathways

To enable metabolite exchange between the two species, we duplicated the original models using the deepcopy() function. For clarity and to avoid identifier conflicts, all metabolite and reaction IDs in the yeast and Lactobacillus models were renamed with the suffixes _y (for yeast) and _l (for lactobacillus), respectively.

Both sets of metabolites and reactions were then integrated into a new composite model,community = Model("community_Y_L"),representing the co-culture community system. This setup allowed the two microorganisms to interact metabolically with the simulated intestinal microenvironment.

According to literature reports, the main exchanged metabolites between the two species include galactose, amino acids, riboflavin, carbon dioxide, and the reactivation factor (rf), which are critical for sustaining mutual growth and metabolic complementarity.

To simulate interspecies nutrient exchange, we established bidirectional transport pathways between the two members (L ↔ com ↔ Y). Specifically, the metabolic products generated by Lactobacillus were transported into the shared metabolic pool (L_to_com), which then supplied these compounds to yeast (com_to_Y). Conversely, nutrients and amino acids produced by yeast were exported into the same pool (Y_to_com) and subsequently made available to Lactobacillus (com_to_L). This configuration effectively captured the bidirectional nutrient exchange relationships characteristic of a symbiotic system.

3. Simulation of Intestinal Nutrient Supply

Based on literature reports, the intestinal environment is characterized by relatively high concentrations of acetate[1. Belenguer, A., Duncan, S. H., Holtrop, G., Anderson, S. E., Lobley, G. E., & Flint, H. J. (2006). Effect of pH on lactate utilization by human colonic bacteria. Applied and Environmental Microbiology, 72(5), 3138–3144. Wang SP, Rubio LA, Duncan SH, Donachie GE, Holtrop G, Lo G, Farquharson FM, Wagner J, Parkhill J, Louis P, Walker AW, Flint HJ. 2020. Pivotal Roles for pH, Lactate, and Lactate-Utilizing Bacteria in the Stability of a Human Colonic Microbial Ecosystem. mSystems 5:10.1128/msystems.00645-20. ], moderate levels of lactate, and limited oxygen availability. Accordingly, we defined the initial conditions of the metabolic pool in our model to mimic these physiological features. The simulated environment contained a finite supply of nutrients—sufficient to sustain microbial metabolism but not unlimited—and maintained a microaerobic to anaerobic state.

This setup allowed us to realistically approximate the metabolic constraints of the gut lumen, ensuring that the co-culture model reflected biologically plausible nutrient and oxygen gradients.

4. Baseline Growth Rate

To determine whether the symbiotic system benefits the growth of the engineered strains, we first calculated the baseline growth rates of each species when cultured independently. These values served as reference points for subsequent comparison with the co-culture simulations, enabling quantitative assessment of the growth enhancement provided by the symbiotic interaction.

5. Exploration of Potential Coupled Exchanges Based on FVA

Flux Variability Analysis (FVA) is a widely used method in metabolic network analysis, designed to evaluate the variability of fluxes through each reaction in a network. It determines the possible range of flux values that each reaction can achieve under a given set of constraints while still satisfying the system’s steady-state and optimality conditions. The basic formulation of FVA can be expressed as follows:

(1) Maximization of the reaction flux vi

(2) Minimization of the reaction flux vi

The results of Flux Variability Analysis (FVA) are typically expressed as the maximum and minimum flux values for each reaction. These values define the permissible range within which a reaction’s flux can fluctuate without violating the stoichiometric or environmental constraints of the metabolic network. If both the maximum and minimum fluxes are equal to zero, it indicates that the corresponding reaction is inactive under the given conditions.

By applying FVA, we can identify which reactions are essential for achieving the model’s optimization objective and which have a flux range of zero, suggesting minimal or no contribution to the system’s performance. This approach also allows us to infer which exchange reactions are most likely to occur within the simulated intestinal environment.

In our analysis, the com_to_Y_lactate reaction exhibited the widest flux range (approximately 0.18–0.20 mmol/gDW/h), indicating that lactate transfer from the community pool to yeast serves as a major carbon flow pathway within the symbiotic system. Reactions such as com_to_L_amino, Y_to_com_amino, com_to_Y_galactose, and Y_to_com_riboflavin also displayed moderate flux variability (about 0.02–0.05 mmol/gDW/h), suggesting a strong potential for dynamic regulation of amino acid and galactose exchange between the two species.

In contrast, several other reactions showed extremely narrow or near-zero variability, implying that their cross-species exchange plays a limited role under optimal growth conditions and may be nonessential or suppressed. Therefore, in subsequent ATP maintenance (ATPM) calculations, these weak or inactive exchange reactions were excluded from the energy coupling analysis.

Reaction	Minimum	Maximum
L_to_com_galactose	0.000000	0.005158
com_to_Y_galactose	0.031539	0.055158
L_to_com_lactate	0.000000	0.000000
com_to_Y_lactate	0.154397	0.200000
L_to_com_acetate	0.000000	0.000605
com_to_Y_acetate	0.000000	0.000018
Y_to_com_amino	0.030920	0.066503
com_to_L_amino	0.030920	0.066503
Y_to_com_riboflavin	0.000000	0.005414
com_to_L_riboflavin	0.000000	0.000000
Y_to_com_co2	0.000000	0.000000
com_to_L_co2	0.000000	0.000000
Y_to_com_rf	0.000000	0.000000
com_to_L_rf	0.000000	0.000000

6. Linear Coupling Between ATP Maintenance Demand (ATPM) and Exchange Reactions

The symbiotic system promotes the exchange of multiple metabolites between the two microbial species. Through metabolic complementation and cooperative stress resistance, the co-culture enhances the yeast’s tolerance within the intestinal environment—particularly improving its resistance to bile salts and ethanol—while simultaneously reducing its inflammatory burden. Consequently, the overall energy expenditure and ATP maintenance demand of the yeast decrease, allowing it to maintain stable growth and functionality under high-stress conditions.

From this observation, we hypothesized that although the secretion of metabolites in exchange reactions might reduce the yeast’s intrinsic growth rate, it could simultaneously acquire protective compounds (e.g., anti-inflammatory or acid-neutralizing substances) from the partner species. This process would effectively reduce the ATP maintenance energy cost—a conceptual representation of the cell’s total steady-state energy requirement.

To quantitatively describe this biological relationship (“higher exchange activity → lower ATPM”), we developed an FVA-constrained model that introduces a novel linear coupling mechanism between metabolic exchange and maintenance energy.

Constraint-based metabolic modeling is a framework used to analyze how metabolic exchanges in microbial co-cultures affect the cellular energy maintenance demand (ATPM). The model is formulated on the basis of a stoichiometric matrix and reaction bounds, and solved using linear programming under the constraints of mass balance and environmental limitations. This approach does not rely on detailed kinetic parameters, making it suitable for large-scale gene networks and community-level metabolic interaction analyses.

In this study, we first performed baseline optimization of community growth, followed by FVA to identify exchange reactions activated under near-optimal conditions. We then encoded the biological assumption of “exchange activity linearly compressing the ATP maintenance lower bound” as a set of linear constraints, enabling the quantification of its effect on both growth rate and metabolic flux distribution within the same mathematical framework.

This strategy ensured computational feasibility and reproducibility, while also facilitating parameter sensitivity and scenario comparison analyses.

(1) Establishing the Baseline ATPM Lower Bound
We first defined a minimum maintenance energy requirement for the cell in the absence of exchange coupling, expressed as:

WherevATPM represents the ATP maintenance flux, which serves as the reference point for subsequent comparative and coupling constraints. This ensures that the model begins with a biologically reasonable baseline energy budget.

(2) Definition of Absolute Value Variables (t_vars)
To maintain solvability within the Linear Programming (LP) framework and to safely represent the non-negative magnitude of exchange intensity, we introduced auxiliary variables for each exchange reaction vi. For every exchange flux vi, a new variable ti was defined such that:

(3) Definition of the Total Exchange Intensity Variable (total_exchange)
To obtain a linear representation of the overall exchange activity and to provide a global driving variable for coupling the ATPM constraint, we aggregated all exchange fluxes into a single system-level quantity representing the total metabolic exchange intensity.
This global term reflects the overall energetic impact of metabolite exchange within the community model and serves as the basis for constructing the subsequent linear coupling between exchange activity and maintenance energy demand.

(4) Establishing a Linear Coupling between ATPM and Exchange Fluxes
To enable the lower bound of ATPM to decrease linearly with the magnitude of exchange fluxes, thereby quantitatively evaluating the coupling effect, we introduced a linear coupling between exchange reactions and ATP maintenance. The relationship is defined as:

where k represents the coupling coefficient, indicating the intensity of maintenance energy consumption associated with metabolite exchange. Since ATPM cannot decrease indefinitely with increasing exchange activity, a lower limit for ATPM was introduced to ensure that a physiologically reasonable level of maintenance energy is preserved for sustaining basic cellular functions. Based on flux variability analysis (FVA) of the metabolic network, the resulting FVA-constrained model was formulated as follows:

7. Comparison Between Baseline and Coculture Growth Rates

In the six scenarios derived from the metabolic toxicity analysis, we retrieved the corresponding perturbed metabolic networks and compared the growth performance of the original and coculture systems under simulated intestinal conditions (reactions were excluded from the energy coupling analysis.). The results showed that, in all scenarios, the coculture system achieved a higher host growth rate than the original strain, indicating that this strategy can effectively alleviate the metabolic burden caused by the engineered gene circuit.

Across different functional modules—including adsorption, degradation, therapy, and their combinations—we observed consistent improvements in host growth under coculture conditions. Specifically, the adsorption module alone imposed a relatively low metabolic burden, and thus exhibited the most pronounced growth enhancement in coculture. The degradation module introduced moderate metabolic stress in the original system, but this effect was partially mitigated in coculture, maintaining a growth rate close to the baseline. In contrast, the therapy module imposed the heaviest metabolic demand, significantly reducing growth in the monoculture; nevertheless, coculture conditions demonstrated a clear buffering effect.

Notably, under multi-functional combinations, the original system showed the most pronounced decline in growth rate, reflecting cumulative metabolic toxicity. However, even in these complex scenarios, the coculture system maintained a substantially higher growth rate.

Collectively, these results demonstrate that coculture not only reduces the metabolic cost associated with individual functional modules but also enhances metabolic robustness under combinatorial designs. This validates coculture as a feasible and effective strategy for mitigating synthetic circuit–induced metabolic toxicity, providing a new framework for future teams to alleviate the metabolic burden of engineered microbes.

Code ▼


									#community_optcom_coupled_ATPM_coupling_fixed.py
									from cobra import io, Model, Reaction, Metabolite
									from cobra.flux_analysis import flux_variability_analysis
									from copy import deepcopy
									import os
									import numpy as np
									import pandas as pd
									import warnings
									warnings.filterwarnings("ignore")
									
									# ========= User paths: set these to your local files =========
									YEAST_SBML_PATH = r"C:\Users\WANG\Desktop\Adsorption___Therapy_5e-05_perturbed_model.xml"
									LAC_SBML_PATH = r"C:\Users\WANG\Desktop\iNF517.xml"
									OUTPUT_COMMUNITY_SBML = r"C:\Users\WANG\Desktop\yeast_lacto_community_coupled_ATPM.xml"
									OUTPUT_SUMMARY_CSV = os.path.join(os.path.dirname(OUTPUT_COMMUNITY_SBML), "community_results_summary.csv")
									
									# === Safe restore function (unused here but kept) ===
									def restore_rxn_bounds(model, rxn_id, saved_bounds):
									if saved_bounds is None:
									return
									orig_lb, orig_ub = saved_bounds
									rxn = model.reactions.get_by_id(rxn_id)
									if orig_lb > orig_ub:
									orig_lb, orig_ub = orig_ub, orig_lb
									rxn.upper_bound = orig_ub
									rxn.lower_bound = orig_lb
									
									
									# ---------------- helper functions ----------------
									def safe_load_sbml(p):
									if not os.path.exists(p):
									raise FileNotFoundError(f"找不到文件：{p}")
									return io.read_sbml_model(p)
									
									def find_best_met(model, keywords, prefer_comp='c'):
									if isinstance(keywords, str):
									keys=[keywords.lower()]
									else:
									keys=[k.lower() for k in keywords]
									hits=[]
									for m in model.metabolites:
									text=(str(m.id)+" "+(m.name or "")).lower()
									if any(k in text for k in keys):
									hits.append((m, len(m.reactions)))
									if not hits:
									return None
									def score(t):
									m,deg = t
									comp = (m.compartment or "").lower()
									c_ok = (comp==prefer_comp) or str(m.id).endswith(f"_{prefer_comp}") or f"[{prefer_comp}]" in str(m.id)
									return (0 if c_ok else 1, -deg, str(m.id))
									hits.sort(key=score)
									return hits[0][0]
									
									def add_suffix_inplace(model, suffix):
									for met in list(model.metabolites):
									if not str(met.id).endswith(suffix):
									met.id = f"{met.id}{suffix}"
									for rxn in list(model.reactions):
									if not str(rxn.id).endswith(suffix):
									rxn.id = f"{rxn.id}{suffix}"
									model.id = f"{model.id}{suffix}"
									return model
									
									def find_biomass_id(model):
									for r in model.reactions:
									if "biomass" in r.id.lower() or "biomass" in (r.name or "").lower():
									return r.id
									return None
									
									def ensure_galactose_utilization(yeast_model):
									gal_e = None
									gal_c = None
									for m in yeast_model.metabolites:
									mid = m.id.lower()
									if "gal" in mid and m.compartment and m.compartment.lower() in ("e","extracellular","extracell","_e"):
									gal_e = m
									if "gal" in mid and (m.compartment or "").lower() in ("c","cytosol","_c"):
									gal_c = m
									if gal_e is None:
									gal_e = Metabolite("gal_e", name="D-Galactose (extracellular)", compartment="e")
									yeast_model.add_metabolites([gal_e])
									print("✅ 新建代谢物 gal_e")
									if gal_c is None:
									gal_c = Metabolite("gal_c", name="D-Galactose (cytosol)", compartment="c")
									yeast_model.add_metabolites([gal_c])
									print("✅ 新建代谢物 gal_c")
									ex_found = False
									for rxn in yeast_model.reactions:
									if rxn.id.lower().startswith("ex_") and gal_e in rxn.metabolites:
									ex_found = True; break
									if not ex_found:
									ex = Reaction("EX_gal_e"); ex.name = "Exchange reaction for D-Galactose"
									ex.lower_bound = -10.0; ex.upper_bound = 1000.0
									ex.add_metabolites({gal_e: -1}); yeast_model.add_reactions([ex])
									print("✅ 新建 exchange: EX_gal_e")
									if "GALt" not in yeast_model.reactions:
									tr = Reaction("GALt"); tr.name = "Galactose transport (extracellular -> cytosol)"
									tr.lower_bound = -1000.0; tr.upper_bound = 1000.0
									tr.add_metabolites({gal_e:-1, gal_c:1}); yeast_model.add_reactions([tr])
									print("✅ 新建 transport: GALt")
									glc_e = None
									for m in yeast_model.metabolites:
									if "glc" in m.id.lower() and (m.compartment or "").lower() in ("e","_e"):
									glc_e = m; break
									if glc_e is None:
									if "glc__D_e" not in yeast_model.metabolites:
									glc_e = Metabolite("glc__D_e", name="D-Glucose (extracellular)", compartment="e")
									yeast_model.add_metabolites([glc_e]); print("✅ 新建代谢物 glc__D_e (extracellular)")
									else:
									glc_e = yeast_model.metabolites.get_by_id("glc__D_e")
									ex_glc_present = any((("ex_glc__d_e" in rxn.id.lower() or "ex_glc" in rxn.id.lower()) and glc_e in rxn.metabolites) for
									rxn in yeast_model.reactions)
									if not ex_glc_present and "EX_glc__D_e" not in yeast_model.reactions:
									exg = Reaction("EX_glc__D_e"); exg.name = "Exchange reaction for D-Glucose"
									exg.lower_bound = -10.0; exg.upper_bound = 1000.0
									exg.add_metabolites({glc_e:-1}); yeast_model.add_reactions([exg])
									print("✅ 新建 exchange: EX_glc__D_e")
									print("🔎 半乳糖/葡萄糖通路检查完成（若模型已有则无新增）")
									
									def detect_key_reactions(community_model, lacto_model, yeast_model):
									print("\n--- Diagnostic: detect ATP maintenance (community, prefer suffix _y) ---")
									candidates = []
									for r in community_model.reactions:
									lid = r.id.lower(); lname = (r.name or "").lower()
									if "atp" in lid or "atpm" in lid or "maintenance" in lname or "non-growth" in lname:
									candidates.append((r.id, r.reaction))
									pick = None
									for cid, rxn in candidates:
									if cid.endswith("_y") or "_y" in cid:
									pick = cid; break
									if pick is None and candidates:
									pick = candidates[0][0]
									if pick:
									print("Detected ATP maintenance candidate:", pick)
									else:
									print("No ATP maintenance candidate found in community (you may need to specify one manually).")
									
									print("\n--- Diagnostic: lacto galactose-related reactions (in original lacto model) ---")
									gal_rxns = []
									for r in lacto_model.reactions:
									if "gal" in r.id.lower() or (r.name and "gal" in r.name.lower()):
									gal_rxns.append((r.id, r.name, r.reaction))
									if gal_rxns:
									for it in gal_rxns:
									print(it)
									else:
									print("No lacto reactions with 'gal' found.")
									print("\n--- Diagnostic: lacto metabolites with 'gal' ---")
									for m in lacto_model.metabolites:
									if "gal" in m.id.lower() or (m.name and "gal" in m.name.lower()):
									print(m.id, m.name, m.compartment)
									return pick, gal_rxns
									
									
									# ------------------ Start main flow ------------------
									print("Loading models...")
									yeast_orig = safe_load_sbml(YEAST_SBML_PATH)
									lacto_orig = safe_load_sbml(LAC_SBML_PATH)
									print("Yeast:", yeast_orig.id, " Reactions:", len(yeast_orig.reactions))
									print("Lacto:", lacto_orig.id, " Reactions:", len(lacto_orig.reactions))
									
									# target keys mapping
									target_keys = {
									"galactose":["galactose","gal"],
									"lactate":["lactate","lac","lac__L"],
									"acetate":["acetate","ac"],
									"amino":["glutamate","glutamine","amino","aa"],
									"riboflavin":["riboflavin","ribflv","b2"],
									"rf":["reactivation","rf"],
									"co2":["co2"]
									}
									
									yeast_map = {}
									lacto_map = {}
									print("\nTrying auto-map metabolites (if None => fill manually):")
									for k, kws in target_keys.items():
									m = find_best_met(yeast_orig, kws)
									yeast_map[k] = m.id if m else None
									m2 = find_best_met(lacto_orig, kws)
									lacto_map[k] = m2.id if m2 else None
									print(f" {k:10s} yeast -> {yeast_map[k]} \t lacto -> {lacto_map[k]}")
									
									ensure_galactose_utilization(yeast_orig)
									
									# build community
									yeast = deepcopy(yeast_orig); lacto = deepcopy(lacto_orig)
									add_suffix_inplace(yeast, "_y"); add_suffix_inplace(lacto, "_l")
									community = Model("community_Y_L")
									community.add_metabolites(list(yeast.metabolites))
									community.add_reactions(list(yeast.reactions))
									community.add_metabolites(list(lacto.metabolites))
									community.add_reactions(list(lacto.reactions))
									
									iface = community.solver.interface
									Variable = iface.Variable
									Constraint = iface.Constraint
									
									pool_keys = ["galactose","lactate","acetate","amino","riboflavin","co2","rf"]
									for key in pool_keys:
									pid=f"com_{key}_e"
									if pid not in [m.id for m in community.metabolites]:
									community.add_metabolites([Metabolite(pid, name=f"Community {key}", compartment="e")])
									
									def get_internal_in_comm(orig_id, suffix, virt_prefix):
									if orig_id:
									cid = f"{orig_id}{suffix}"
									if cid in community.metabolites:
									return community.metabolites.get_by_id(cid), False
									vid = f"{virt_prefix}_virtual_internal_c"
									if vid in community.metabolites:
									return community.metabolites.get_by_id(vid), True
									virt = Metabolite(vid, name=f"{virt_prefix} virtual internal", compartment="c")
									community.add_metabolites([virt])
									return virt, True
									
									for key in ["galactose","lactate","acetate"]:
									lp_id = lacto_map.get(key); y_id = yeast_map.get(key)
									lp_met, lp_virt = get_internal_in_comm(lp_id, "_l", f"L_{key}")
									y_met, y_virt = get_internal_in_comm(y_id, "_y", f"Y_{key}")
									com_met = community.metabolites.get_by_id(f"com_{key}_e")
									if lp_met and com_met:
									r1_id = f"L_to_com_{key}"
									if r1_id in community.reactions:
									community.remove_reactions([community.reactions.get_by_id(r1_id)])
									r1 = Reaction(r1_id); r1.add_metabolites({lp_met:-1.0, com_met:1.0}); r1.lower_bound=0.0; r1.upper_bound=1000.0
									community.add_reactions([r1])
									if y_met and com_met:
									r2_id = f"com_to_Y_{key}"
									if r2_id in community.reactions:
									community.remove_reactions([community.reactions.get_by_id(r2_id)])
									r2 = Reaction(r2_id); r2.add_metabolites({com_met:-1.0, y_met:1.0}); r2.lower_bound=0.0; r2.upper_bound=1000.0
									community.add_reactions([r2])
									
									for key in ["amino","riboflavin","co2","rf"]:
									y_id = yeast_map.get(key); lp_id = lacto_map.get(key)
									y_met, y_virt = get_internal_in_comm(y_id, "_y", f"Y_{key}")
									lp_met, lp_virt = get_internal_in_comm(lp_id, "_l", f"L_{key}")
									com_met = community.metabolites.get_by_id(f"com_{key}_e")
									if y_met and com_met:
									r1_id = f"Y_to_com_{key}"
									if r1_id in community.reactions:
									community.remove_reactions([community.reactions.get_by_id(r1_id)])
									r1 = Reaction(r1_id); r1.add_metabolites({y_met:-1.0, com_met:1.0}); r1.lower_bound=0.0; r1.upper_bound=1000.0
									community.add_reactions([r1])
									if lp_met and com_met:
									r2_id = f"com_to_L_{key}"
									if r2_id in community.reactions:
									community.remove_reactions([community.reactions.get_by_id(r2_id)])
									r2 = Reaction(r2_id); r2.add_metabolites({com_met:-1.0, lp_met:1.0}); r2.lower_bound=0.0; r2.upper_bound=1000.0
									community.add_reactions([r2])
									
									def set_ex_env(key, lower_bound):
									pid = f"com_{key}_e"
									exid = f"EX_{pid}"
									if exid not in community.reactions:
									met = community.metabolites.get_by_id(pid)
									exr = Reaction(exid); exr.add_metabolites({met:-1.0}); exr.lower_bound = lower_bound; exr.upper_bound=1000.0
									community.add_reactions([exr])
									else:
									community.reactions.get_by_id(exid).lower_bound = lower_bound
									
									for k in pool_keys:
									set_ex_env(k, 0.0)
									
									def safe_set_bounds(rxn, new_lb=None, new_ub=None, verbose=False):
									cur_lb, cur_ub = rxn.lower_bound, rxn.upper_bound
									lb = cur_lb if new_lb is None else float(new_lb)
									ub = cur_ub if new_ub is None else float(new_ub)
									if lb > ub:
									ub = lb
									if verbose:
									print(f"[safe_set_bounds] raised ub to {ub} to satisfy lb<=ub for {rxn.id}") rxn.upper_bound=ub rxn.lower_bound=lb def
										show_comm_exchanges(model): rows=[] for rxn in model.reactions: if rxn.id.startswith("EX_com_"):
										rows.append((rxn.id, rxn.lower_bound, rxn.upper_bound)) for rid, lb, ub in sorted(rows): print(f"{rid:28s}
										lb={lb:8g} ub={ub:8g}") def set_comm_medium_from_dict(model, pool_supply_dict, anaerobic=True,
										close_member_direct=False, close_member_keys=None, verbose=True): for key in pool_keys: set_ex_env(key, 0.0) for
										key, rate in pool_supply_dict.items(): if key not in pool_keys: if verbose: print(f"[skip] '{key}' 不在
										pool_keys，忽略。可选: {pool_keys}") continue set_ex_env(key, -abs(rate)) if verbose: print(f"ENV supply: {key:<12s} ->
										EX_com_{key}_e.lb = {-abs(rate)}")
										if anaerobic:
										for rid in ("EX_o2_e_y", "EX_o2_e_l", "EX_o2_e"):
										if rid in model.reactions:
										r = model.reactions.get_by_id(rid)
										safe_set_bounds(r, new_lb=0.0, new_ub=max(0.0, r.upper_bound), verbose=verbose)
										if verbose:
										print(f"Set {rid} lb = 0.0 (anaerobic) [ub was adjusted if needed]")
										if close_member_direct:
										keys_to_close = close_member_keys if close_member_keys is not None else pool_keys
										for rxn in model.reactions:
										rid = rxn.id
										if not (rid.startswith("EX_") and (rid.endswith("_y") or rid.endswith("_l"))):
										continue
										rid_low = rid.lower()
										should_close = False
										for k in keys_to_close:
										if k.lower() in rid_low or f"_{k.lower()}_" in rid_low or f"_{k.lower()}__" in rid_low:
										should_close = True
										break
										if should_close and rxn.lower_bound < 0: safe_set_bounds(rxn, new_lb=0.0, new_ub=max(0.0, rxn.upper_bound),
											verbose=verbose) if verbose: print(f"Close member direct exchange: {rid} (lb -> 0.0, ub adjusted if needed)")
									
											gut_like_pool = {"galactose":0.10, "lactate":0.2, "acetate":0.3}
											print("\n--- Set environment: gut-like medium (via community pool) ---")
											set_comm_medium_from_dict(community, gut_like_pool, anaerobic=True, close_member_direct=False, verbose=True)
											print("\nCommunity pool exchanges after applying medium:")
											show_comm_exchanges(community)
									
											# ---------- fix UDP-gal mapping (if needed) ----------
											lactogal_orig = None
											if "galactose" in lacto_map:
											lactogal_orig = lacto_map.get("galactose")
											if lactogal_orig and "udp" in (lactogal_orig or "").lower():
											udpgal_comm_id = f"{lactogal_orig}_l"
											gal_l_internal_id = "gal_L_internal_c"
											if gal_l_internal_id not in [m.id for m in community.metabolites]:
											gal_l_internal = Metabolite(gal_l_internal_id, name="Lactococcus free gal internal", compartment="c")
											community.add_metabolites([gal_l_internal])
											else:
											gal_l_internal = community.metabolites.get_by_id(gal_l_internal_id)
											conv_id = "L_convert_udpgal_to_gal"
											if conv_id in community.reactions:
											community.remove_reactions([community.reactions.get_by_id(conv_id)])
											if udpgal_comm_id in community.metabolites:
											conv = Reaction(conv_id)
											conv.add_metabolites({community.metabolites.get_by_id(udpgal_comm_id):-1.0, gal_l_internal:1.0})
											conv.lower_bound = 0.0; conv.upper_bound = 1000.0
											community.add_reactions([conv])
											r_old = "L_to_com_galactose"
											com_gal = community.metabolites.get_by_id("com_galactose_e")
											if r_old in community.reactions:
											community.remove_reactions([community.reactions.get_by_id(r_old)])
											rnew = Reaction("L_to_com_galactose")
											rnew.add_metabolites({gal_l_internal:-1.0, com_gal:1.0})
											rnew.lower_bound = 0.0; rnew.upper_bound=1000.0
											community.add_reactions([rnew])
											print("Added conversion L_convert_udpgal_to_gal and recreated L_to_com_galactose using virtual gal internal.")
											else:
											print("Warning: expected udpgal internal metabolite not found in community; conversion not added.")
									
											# ---------- biomass ids ----------
											y_biomass_orig = find_biomass_id(yeast_orig)
											l_biomass_orig = find_biomass_id(lacto_orig)
											y_biomass_comm = f"{y_biomass_orig}_y" if y_biomass_orig else None
											l_biomass_comm = f"{l_biomass_orig}_l" if l_biomass_orig else None
											print("\nDetected biomass reactions (orig -> community):")
											print(" yeast:", y_biomass_orig, "->", y_biomass_comm)
											print(" lacto:", l_biomass_orig, "->", l_biomass_comm)
									
											# ---------- Step 1: maximize community biomass (baseline) ----------
											print("\n--- Step 1: maximize community biomass (baseline) ---")
											if not (l_biomass_comm and y_biomass_comm and l_biomass_comm in community.reactions and y_biomass_comm in
											community.reactions):
											raise RuntimeError("找不到 biomass reactions，请确认模型及映射。")
											biomass_expr = community.reactions.get_by_id(l_biomass_comm).flux_expression +
											community.reactions.get_by_id(y_biomass_comm).flux_expression
											community.objective = biomass_expr
											sol_growth = community.optimize()
											print("Status (growth-only):", sol_growth.status)
											if sol_growth.status == 'optimal':
											print(" Growth-only community objective:", sol_growth.objective_value)
											print(" Lacto growth:", sol_growth.fluxes.get(l_biomass_comm, None))
											print(" Yeast growth:", sol_growth.fluxes.get(y_biomass_comm, None))
									
											# ---------- Step 2: FVA on exchange reactions (to pick coupling set) ----------
											print("\n--- Step 2: FVA on exchange reactions (early) ---")
											exchange_rxns = []
											for key in ["galactose","lactate","acetate","amino","riboflavin","co2","rf"]:
											for prefix in ("L_to_com_","com_to_Y_","Y_to_com_","com_to_L_"):
											rid = prefix + key
											if rid in community.reactions:
											exchange_rxns.append(rid)
											exchange_rxns = list(dict.fromkeys(exchange_rxns))
									
											fva = None
											if exchange_rxns:
											try:
											fva = flux_variability_analysis(community, reaction_list=exchange_rxns, fraction_of_optimum=0.99)
											print("FVA results (exchanges):")
											print(fva)
											except Exception as e:
											print("FVA failed:", e)
											fva = None
											else:
											print("No exchange reactions to FVA.")
									
											coupled_exchanges = []
											if fva is not None:
											for rid, row in fva.iterrows():
											if row["maximum"] > 1e-9:
											coupled_exchanges.append(rid)
											if not coupled_exchanges:
											for rid in ["com_to_Y_lactate", "com_to_Y_galactose", "Y_to_com_amino", "com_to_L_amino", "com_to_Y_acetate"]:
											if rid in community.reactions:
											coupled_exchanges.append(rid)
											print("\nCoupled exchanges:", coupled_exchanges if coupled_exchanges else "[]")
									
											# ---------- Step 3: add linear coupling constraints v_ATPM >= max(min_lb, base_lb*(1 - k*sum|v|)) ----------
											print("\n--- Step 3: add linear ATPM–exchange coupling constraints ---")
											ATPM_Y_ID, _ = detect_key_reactions(community, lacto_orig, yeast_orig)
											if ATPM_Y_ID is None or ATPM_Y_ID not in community.reactions:
											raise RuntimeError("Yeast ATPM reaction not found in community. 请检查 detect_key_reactions 输出并手动指定 ATPM reaction
											id。")
											r_atpm = community.reactions.get_by_id(ATPM_Y_ID)
											v_atpm = r_atpm.flux_expression
											base_lb = float(r_atpm.lower_bound)
											min_fraction = 0.15
											min_lb = base_lb * min_fraction
									
											# linearize |v_i|
											def safe_name(s):
											return ("abs_" + s).replace("-", "_").replace(".", "_")
									
											abs_vars = {}
											for rid in coupled_exchanges:
											rxn = community.reactions.get_by_id(rid)
											v_i = rxn.flux_expression
											tname = safe_name(rid)
											if tname in community.solver.variables:
											t_i = community.solver.variables[tname]
											else:
											t_i = Variable(name=tname, lb=0.0, ub=1000.0)
											community.solver.add(t_i)
											c1_name = tname + "_ge_pos"
											c2_name = tname + "_ge_neg"
											if c1_name not in community.solver.constraints:
											community.solver.add(Constraint(t_i - v_i, lb=0.0, ub=None, name=c1_name))
											if c2_name not in community.solver.constraints:
											community.solver.add(Constraint(t_i + v_i, lb=0.0, ub=None, name=c2_name))
											abs_vars[rid] = t_i
									
											sum_t = None
											for t in abs_vars.values():
											sum_t = t if sum_t is None else (sum_t + t)
									
											COUPLING_K = 0.5
									
											# --- NEW: relax the ATPM reaction lower bound to min_lb so coupling constraint can be active ---
											print("\n--- Relaxing ATPM lower bound so coupling can take effect ---")
											print(" original base_lb:", base_lb, " computed min_lb:", min_lb)
											# ensure ub >= min_lb
											if r_atpm.upper_bound < min_lb: r_atpm.upper_bound=min_lb r_atpm.lower_bound=float(min_lb) print(" r_atpm bounds
												now=> lb:", r_atpm.lower_bound, " ub:", r_atpm.upper_bound)
									
												# Constraint: v_ATPM >= min_lb (still keep for safety) - use unique name
												if "ATPM_min_lb" not in community.solver.constraints:
												community.solver.add(Constraint(v_atpm, lb=min_lb, ub=None, name="ATPM_min_lb"))
									
												# Constraint: v_ATPM >= base_lb*(1 - k*sum_t)
												rhs_expr = base_lb * (1.0 - float(COUPLING_K) * (sum_t if sum_t is not None else 0))
												if "ATPM_coupled_lb" not in community.solver.constraints:
												community.solver.add(Constraint(v_atpm - rhs_expr, lb=0.0, ub=None, name="ATPM_coupled_lb"))
									
												print(f"Coupling in place: v_ATPM >= max({min_lb:.4g}, {base_lb:.4g}*(1 - {COUPLING_K}*sum|v_exch|))")
									
												# DIAGNOSTIC checks before optimizing
												print("\n--- DIAGNOSTIC (pre-opt) ---")
												print("coupled_exchanges:", coupled_exchanges)
												print("abs_var count:", len(abs_vars))
												print("solver constraints include ATPM_min_lb, ATPM_coupled_lb?:", "ATPM_min_lb" in
												community.solver.constraints, "ATPM_coupled_lb" in community.solver.constraints)
									
												# ---------- Step 4: optimize biomass with coupling active ----------
												print("\n--- Step 4: optimize biomass under coupled ATPM ---")
												community.objective = community.reactions.get_by_id(y_biomass_comm).flux_expression +
												community.reactions.get_by_id(l_biomass_comm).flux_expression
												sol_final = community.optimize()
												print("Status:", sol_final.status)
												summary_row = {
												"baseline_obj": float(sol_growth.objective_value) if sol_growth.status == "optimal" else np.nan,
												"baseline_yeast": float(sol_growth.fluxes.get(y_biomass_comm, np.nan)) if sol_growth.status == "optimal"
												else np.nan,
												"baseline_lacto": float(sol_growth.fluxes.get(l_biomass_comm, np.nan)) if sol_growth.status == "optimal"
												else np.nan,
												"final_obj": np.nan,
												"final_yeast": np.nan,
												"final_lacto": np.nan,
												"sum_abs_exch": np.nan,
												"implied_atpm_lb": np.nan
												}
									
												if sol_final.status == "optimal":
												y_growth = float(sol_final.fluxes.get(y_biomass_comm, np.nan))
												l_growth = float(sol_final.fluxes.get(l_biomass_comm, np.nan))
												summary_row["final_obj"] = float(sol_final.objective_value)
												summary_row["final_yeast"] = y_growth
												summary_row["final_lacto"] = l_growth
												print(f" Yeast growth: {y_growth:.6g}")
												print(f" Lacto growth: {l_growth:.6g}")
									
												exch_vals = []
												for rid in coupled_exchanges:
												val = float(sol_final.fluxes.get(rid, 0.0)) if rid in sol_final.fluxes else 0.0
												exch_vals.append(abs(val))
												print(f" {rid:25s} = {val:.6g}")
												sum_abs = sum(exch_vals)
												implied_lb = max(min_lb, base_lb*(1 - COUPLING_K*sum_abs))
												summary_row["sum_abs_exch"] = sum_abs
												summary_row["implied_atpm_lb"] = implied_lb
												print(f" sum|v_exch| = {sum_abs:.6g} ⇒ implied ATPM lb ≥ {implied_lb:.6g} (base={base_lb:.6g})")
												else:
												print("Final optimization not optimal; no final metrics recorded.")
									
												# save community SBML (optional)
												try:
												io.write_sbml_model(community, OUTPUT_COMMUNITY_SBML)
												print("\nSaved community SBML to:", OUTPUT_COMMUNITY_SBML)
												except Exception as e:
												print("Saving community SBML failed:", e)
									
												# save CSV summary
												try:
												df = pd.DataFrame([summary_row])
												df.to_csv(OUTPUT_SUMMARY_CSV, index=False)
												print("Saved summary CSV to:", OUTPUT_SUMMARY_CSV)
												except Exception as e:
												print("Saving summary CSV failed:", e)
									
												print("\nScript finished.")

Reference

1. Belenguer, A., Duncan, S. H., Holtrop, G., Anderson, S. E., Lobley, G. E., & Flint, H. J. (2006). Effect of pH on lactate utilization by human colonic bacteria. Applied and Environmental Microbiology, 72(5), 3138–3144.
Wang SP, Rubio LA, Duncan SH, Donachie GE, Holtrop G, Lo G, Farquharson FM, Wagner J, Parkhill J, Louis P, Walker AW, Flint HJ. 2020. Pivotal Roles for pH, Lactate, and Lactate-Utilizing Bacteria in the Stability of a Human Colonic Microbial Ecosystem. mSystems 5:10.1128/msystems.00645-20.

Part5

PET Target Collection

Selection of IBD-Related Targets

Analysis of Common Potential Targets Between PET Targets and IBD Targets

Enrichment Analysis

Pathway Enrichment Analysis of Core Targets

GEO Database - Obtaining Chip Data

limma Differential Analysis

WGCNA (Weighted Gene Co-expression Network Analysis)

Screening of Core Targets by Machine Learning Algorithms

Molecular Docking: Referring to the previous docking content, molecular docking of PET with the core target proteins of IBD

Molecular Dynamics Simulation

Construction of the Adverse Outcome Pathway (AOP) System

Reference

1 PET Target Collection

(1)Searched the PubChem database for "Bis(2-hydroxyethyl) terephthalate" to identify the structure and SMILES notation of PET.

(2)Retrieved potential PET targets from the ChEMBL database (Keyword: "Bis(2-hydroxyethyl) terephthalate"), filtering for "Homo sapiens".

(3)Submitted PET's SMILES code to the STITCH database for further mining of potential targets.

(4)Standardized target names using the UniProt database

(5)Constructed a target database by merging and validating targets.

Click to expand ▼


# 1. 设置文件路径
file1 <- "chembl gene.txt"
file2 <- "STITCH gene.txt"

# 2. 读入基因列表（每行一个基因符号）
genes1 <- readLines(file1, encoding = "UTF-8")
genes2 <- readLines(file2, encoding = "UTF-8")

# 去除空行和重复
genes1 <- unique(na.omit(trimws(genes1)))
genes2 <- unique(na.omit(trimws(genes2)))

# 3. 计算交集和并集
common_genes <- intersect(genes1, genes2)
compound_genes <- union(genes1, genes2)

# 4. 差集：仅在各自列表中的基因
only_in_chembl <- setdiff(genes1, genes2)
only_in_STITCH <- setdiff(genes2, genes1)

# 5. 保存结果到文件
writeLines(compound_genes,    "compound_genes_union.txt")


# 6. 控制台打印各集合大小
cat("chembl 列表基因数：",       length(genes1),      "\n")
cat("STITCH 列表基因数：",       length(genes2),      "\n")
cat("交集（common）数：",        length(common_genes),"\n")
cat("并集（compound）数：",      length(compound_genes),"\n")
cat("仅 chembl 特有：",         length(only_in_chembl),"\n")
cat("仅 STITCH 特有：",         length(only_in_STITCH),"\n")

2 Selection of IBD-Related Targets

(1) Utilizing the GeneCards and OMIM databases, IBD-related targets were searched. Genes with "score" values restricted to the median level or higher were screened, and those with high "score" values were selected to establish an IBD target gene library.

Click to expand ▼


	# 1. 设置文件路径
	file1 <- "genecard.txt"
	file2 <- "OMIM靶基因.txt"
	
	# 2. 读入基因列表（每行一个基因符号）
	genes1 <- readLines(file1)
	genes2 <- readLines(file2)
	
	# 去除空行和重复
	genes1 <- unique(na.omit(trimws(genes1)))
	genes2 <- unique(na.omit(trimws(genes2)))
	
	# 3. 计算交集和并集
	common_genes <- intersect(genes1, genes2)
	all_genes    <- union(genes1, genes2)
	
	# 4. 计算差集
	only_in_1 <- setdiff(genes1, genes2)
	only_in_2 <- setdiff(genes2, genes1)
	
	# 5. 将结果保存到磁盘
	writeLines(all_genes,    "compound_genes_union.txt")
	
	# 6. 简要输出各集合大小
	cat("genecard 列表基因数：", length(genes1), "\n")
	cat("OMIM 列表基因数：",    length(genes2), "\n")
	cat("交集（common）数：",    length(common_genes), "\n")
	cat("并集（union）数：",     length(all_genes), "\n")
	cat("仅 genecard 特有数：", length(only_in_1), "\n")
	cat("仅 OMIM 特有数：",      length(only_in_2), "\n")

3 Analysis of Common Potential Targets Between PET Targets and IBD Targets

Identify the overlapping region as a potential therapeutic target for PET imaging, with specific relevance to inflammatory bowel disease (IBD).

Click to expand ▼


# 1. 定义文件路径（请根据实际路径修改）
disease_file  <- "compound_genes.txt"          # 疾病相关基因列表
compound_file <- "compound_genes化合物.txt"   # 化合物相关基因列表

# 2. 读入基因列表
genes_disease  <- readLines(disease_file,  encoding = "UTF-8")
genes_compound <- readLines(compound_file, encoding = "UTF-8")

# 3. 清洗：去除空行、首尾空白和重复
genes_disease  <- unique(na.omit(trimws(genes_disease)))
genes_compound <- unique(na.omit(trimws(genes_compound)))

# 4. 计算交集
common_genes <- intersect(genes_disease, genes_compound)

# 5. 将交集写出到文件
writeLines(common_genes, "intersection_genes.txt")

# 6. 控制台打印交集大小
cat("疾病列表基因数：", length(genes_disease),  "\n")
cat("化合物列表基因数：", length(genes_compound), "\n")
cat("交集基因数：",       length(common_genes), "\n")

4 Construction of Protein Interaction Network (PPI) and Screening of Core Targets

Utilizing the STRING database and restricting the species to "Homo sapiens", the "minimum required interaction score" was set to "medium confidence >0.4" for analysis. The results generated by STRING were imported into network biology visualization analysis, and Cytoscape was used to calculate the parameters of each node in the network diagram to display molecular connections. The CytoHubba_MCC algorithm was used to visually analyze the network and select core targets. Through module analysis with the MCODE plugin, it was found that 10 key genes were located in the most critical module.

（1）PPI network visualization

(2) Cytoscape visualization

The MCODE (Molecular Complex Detection) algorithm is used to identify functional modules or protein complexes from PPI networks, which typically represent clusters of proteins that work together in cells. Utilizing the STRING database, we meticulously constructed a comprehensive protein-protein interaction (PPI) network. This network contains numerous nodes and edges, revealing the complex interaction relationships among proteins. Subsequently, the Cytoscape software was employed to conduct a detailed analysis of this network, further revealing the topological features of each node and generating an optimized and intuitive visualization graph . These graphs not only display the interactions among the potential targets identified in the study but also visually convey the importance of each node in the network through the size and color of the nodes. Larger nodes and darker shades indicate higher degree values, highlighting their crucial roles in the network.

Through further analysis of this network, we identified several key proteins:

IL2: It has strong interactions with multiple proteins and may play a significant role in immune system or cell proliferation-related pathways.

BDNF: It is associated with neuroprotection and neurogenesis, and may play a key role in the nervous system.

BCL2: It plays a crucial role in regulating apoptosis and may be related to cell survival signals. JAK2: It is involved in cell signaling, particularly in the JAK-STAT pathway, and may be involved in immune responses and cell growth.

TGFBR2(Transforming Growth Factor-β Receptor Type 2): TGFBR2 is a key receptor in the TGF-β signaling pathway. After binding with TGF-β, it initiates downstream signaling, regulating cell proliferation, differentiation, migration, and apoptosis. The activation of TGFBR2 plays an important role in tumor suppression, immune responses, and tissue repair.

TGFB2 (Transforming Growth Factor-β2): TGFB2 is a member of the TGF-β family and has similar functions to TGF-β. By binding with TGFBR2 and other receptors, it regulates cell behavior. TGFB2 plays a crucial role in the immune system, development processes, and tissue repair. It also has an important role in diseases such as cancer and fibrosis.

They play important roles in cell survival, apoptosis, and immune responses. We particularly highlighted the interactions among these core nodes, and through focused network graphs clearly demonstrated their functional relationships and interdependencies in complex biological processes. Notably, IL2 and BCL2 have significant centrality in the network, as indicated by their larger sizes and deep red markings, highlighting their core roles in the interaction network.

From the figures, the key nodes can be observed:

5 Enrichment Analysis

(1) GO enrichment analysis of intersection genes

functions of PET particles in the context of IBD. GO analysis was conducted using R, including a comprehensive assessment of biological processes (BP), cellular components (CC), and molecular functions (MF).

① The bubble chart shows the enrichment status, with the x-axis representing the gene proportion, indicating the enrichment degree of a certain GO term in a specific gene set.

·BP (Biological Process): It shows that the intersection genes related to PET microplastics and IBD are enriched in "inflammatory response", "immune response", and "cellular stress response". This suggests that PET may interfere with the proliferation of immune cells, exacerbating IBD and affecting the intestinal mucosal repair of IBD.

·CC (Cellular Component): It reveals the location of the intersection genes in the cell, suggesting the possible physical impact of PET microplastics on cells. It reflects that PET may damage the integrity of the intestinal barrier and disrupt ion balance. ·MF (Molecular Function): This module may reveal the molecular functions involved in these intersection genes, possibly indicating that PET disrupts lipid metabolism and promotes inflammation through DNA damage.

② The bar chart's x-axis represents the enrichment factor, which is the ratio of the proportion of related genes in a certain GO term to the overall proportion of the gene set. Here, it can be seen that PET exposure may lead to intestinal nervous system disorders, aggravating abdominal pain, or may affect central glial cells through the vagus nerve, forming a bidirectional regulatory disorder of the "brain-gut axis". The enrichment of secondary lysosomes may reflect the endocytic clearance mechanism of microplastic particles.

Chord Diagram Analysis:

Lipid Metabolism-Inflammation Axis:

PLA2G5 is strongly associated with the phosphoric acid metabolic process (deep red string, -log10(P) = 9.2). The phospholipase A2 it encodes is the rate-limiting enzyme in prostaglandin synthesis and may mediate microplastic-induced intestinal inflammation.

TGF (transforming growth factor) simultaneously links the positive regulation of cell proliferation (orange string) and the stress response (blue string), suggesting its dual role in intestinal epithelial repair and fibrosis.

Ion Channel-Barrier Disruption:

CLCN2 (chloride channel) co-occurs with cation transmembrane transport (purple string) and cell death (black string), and may cause apoptosis (such as loss of goblet cells) by disrupting the ion gradient of intestinal epithelial cells.

KCN2 (potassium channel) is associated with the response to external stimuli (green string, -log10(P) = 8.5), or is involved in the toxicity sensing mechanism of microplastic particles.

(2) KEGG pathway enrichment analysis

To improve the identification of relevant targets and enhance the mechanism of action of PET for therapeutic and potential toxic pathways.

Exposure to microplastics (PET) drives intestinal barrier disruption through the PI3K-Akt-FoxO axis, and in concert with Jak-STAT-mediated immune imbalance and LOX/COX2 metabolic disorder, jointly exacerbates the progression of IBD.

Click to expand ▼


# —— 1. 读入交集基因并去重 —— 
> genes <- readLines("intersection_genes.txt", encoding = "UTF-8")
> genes <- unique(na.omit(stri_trim_both(genes)))
> 
> # —— 2. SYMBOL 转 EntrezID —— 
> gene_df <- bitr(genes,
				+                 fromType = "SYMBOL",
				+                 toType   = "ENTREZID",
				+                 OrgDb    = org.Hs.eg.db)
'select()' returned 1:1 mapping between keys and columns
> entrez_ids <- gene_df$ENTREZID
> 
> # —— 3. enrichGO 分析 —— 
> ego <- enrichGO(gene          = entrez_ids,
					+                 OrgDb         = org.Hs.eg.db,
					+                 ont           = c("BP","CC","MF"),  # 一次算三类
					+                 pAdjustMethod = "BH",
					+                 qvalueCutoff  = 0.05,
					+                 readable      = TRUE)
错误于match.arg(ont, c("BP", "MF", "CC", "ALL")): 'arg'的长度必需为一
> # —— 加载包 —— 
> library(stringi)
> library(dplyr)
> library(tidyr)
> library(forcats)
> library(clusterProfiler)
> library(org.Hs.eg.db)
> library(ggplot2)
> 
> # —— 1. 读入基因列表 —— 
> genes <- readLines("intersection_genes.txt", encoding = "UTF-8") %>%
+     stri_trim_both() %>%
+     na.omit() %>%
+     unique()
> 
> # —— 2. SYMBOL 转 EntrezID —— 
> gene_df   <- bitr(genes,
					+                   fromType = "SYMBOL",
					+                   toType   = "ENTREZID",
					+                   OrgDb    = org.Hs.eg.db)
'select()' returned 1:1 mapping between keys and columns
> entrez_ids <- gene_df$ENTREZID
> 
> # —— 3. 分本体跑 enrichGO 并收集结果 —— 
> onts <- c("BP","CC","MF")
> all_res <- lapply(onts, function(ont){
+     message("EnrichGO for ", ont)
+     ego <- enrichGO(gene          = entrez_ids,
						+                     OrgDb         = org.Hs.eg.db,
						+                     ont           = ont,            # 单一本体
						+                     pAdjustMethod = "BH",
						+                     qvalueCutoff  = 0.05,
						+                     readable      = TRUE)
+     # 加一列标记本体
	+     df <- ego@result
	+     df$ONTOLOGY <- ont
	+     df
	+ })
EnrichGO for BP
EnrichGO for CC
EnrichGO for MF
> go_result <- bind_rows(all_res)
> 
> # —— 4. 拆分 GeneRatio 并取每类前 10 —— 
> go_df <- go_result %>%
+     separate(col   = GeneRatio,
				+              into  = c("GR1","GR2"),
				+              sep   = "/",
				+              convert = TRUE) %>%
+     mutate(GeneRatio = GR1 / GR2) %>%
+     group_by(ONTOLOGY) %>%
+     slice_head(n = 10) %>%
+     ungroup()
> 
> # —— 5. 绘制分面气泡图 —— 
> p <- ggplot(go_df,
				+             aes(x = GeneRatio,
								+                 y = fct_reorder(Description, GeneRatio))) +
+     geom_point(aes(size = Count,
						+                    fill = p.adjust),
					+                shape = 21, color = "black") +
+     facet_grid(ONTOLOGY ~ .,
					+                scales = "free_y",
					+                space  = "free_y") +
+     scale_fill_gradient(low  = "#E27371",
							+                         high = "#5D82A7",
							+                         name = "Adj. P") +
+     labs(title = "GO Enrichment",
			+          x     = "GeneRatio",
			+          y     = "GO term") +
+     guides(fill = guide_colorbar(reverse = TRUE, order = 1),
				+            size = guide_legend(order = 2)) +
+     theme_bw(base_size = 12) +
+     theme(
	+         plot.title   = element_text(face = "bold", hjust = 0.5),
	+         axis.text.y  = element_text(size = 10),
	+         axis.text.x  = element_text(size = 10),
	+         axis.title   = element_text(face = "bold", size = 12),
	+         strip.text.y = element_text(face = "bold", size = 11),
	+         legend.title = element_text(size = 11),
	+         legend.text  = element_text(size = 10)
	+     )
> 
> print(p)
> ggsave("GO_Enrichment_Bubble.pdf", p, width = 7, height = 8)

Click to expand ▼


# —— 1. 读入交集基因 & 转 EntrezID —— 
> genes <- readLines("intersection_genes.txt", encoding = "UTF-8") %>%
	+     stri_trim_both() %>%
	+     na.omit() %>%
	+     unique()
> gene_df   <- bitr(genes, fromType="SYMBOL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
'select()' returned 1:1 mapping between keys and columns
> entrez_ids <- gene_df$ENTREZID
> 
	> # —— 2. 分本体跑 enrichGO 并合并结果 —— 
	> onts <- c("BP","CC","MF")
> all_res <- lapply(onts, function(ont){
	+     ego <- enrichGO(gene          = entrez_ids,
						+                     OrgDb         = org.Hs.eg.db,
						+                     ont           = ont,
						+                     pAdjustMethod = "BH",
						+                     qvalueCutoff  = 0.05,
						+                     readable      = TRUE)
	+     df <- ego@result
	+     df$ONTOLOGY <- ont
	+     df
	+ })
> go_result <- bind_rows(all_res)
> 
	> # —— 3. 拆 Ratio、计算 Enrichment_Factor，取每类前 10 —— 
	> go_df <- go_result %>%
	+     # 拆分 GeneRatio 和 BgRatio
	+     separate(col = GeneRatio, into = c("GR1","GR2"), sep = "/", convert = TRUE) %>%
	+     separate(col = BgRatio,   into = c("BR1","BR2"), sep = "/", convert = TRUE) %>%
	+     # 计算数值
	+     mutate(
	+         GeneRatio         = GR1 / GR2,
	+         BgRatio           = BR1 / BR2,
	+         Enrichment_Factor = GeneRatio / BgRatio
	+     ) %>%
	+     # 每个本体取前 10（按 p.adjust 或 Enrichment_Factor 都可以）
	+     group_by(ONTOLOGY) %>%
	+     slice_head(n = 10) %>%
	+     ungroup()
> 
	> # —— 4. 绘制分面柱状图 —— 
	> p_bar <- ggplot(go_df,
					+                 aes(x = Enrichment_Factor,
											+                     y = fct_reorder(Description, Enrichment_Factor))) +
	+     geom_col(aes(fill = p.adjust)) +
	+     facet_grid(ONTOLOGY ~ .,
					+                scales = "free_y",
					+                space  = "free_y") +
	+     scale_fill_gradient(low  = "#E27371",
							+                         high = "#5D82A7",
							+                         name = "Adj. P") +
	+     labs(title = "GO Enrichment",
				+          x     = "Enrichment Factor",
				+          y     = "GO term") +
	+     guides(fill = guide_colorbar(reverse = TRUE)) +
	+     theme_bw(base_size = 12) +
	+     theme(
	+         plot.title   = element_text(face = "bold", hjust = 0.5),
	+         axis.text.y  = element_text(size = 10),
	+         axis.text.x  = element_text(size = 10),
	+         axis.title   = element_text(face = "bold", size = 12),
	+         strip.text.y = element_text(face = "bold", size = 11),
	+         legend.title = element_text(size = 11),
	+         legend.text  = element_text(size = 10)
	+     )
> 
	> # 显示并保存
	> print(p_bar)
> ggsave("GO_Enrichment_Barplot.pdf", p_bar, width = 7, height = 8)

6 Pathway Enrichment Analysis of Core Targets

The 10 core targets were input into the DAVID database to conduct classic pathway analysis including KEGG enrichment analysis, and the top 20 enriched pathways with the lowest FDR values were visually displayed. ：

According to the -log10(p.adjust) values on the left bar chart, the core pathways are ranked by significance as follows:

Colorectal cancer (-log10(p.adj)=4.0)

2. AGE-RAGE Signaling Pathway in Diabetic Complications (3.8)

3. C-type lectin receptor signaling pathway(3.5)

4. PI3K-Akt signaling pathway(3.0)

5. Ras Signaling Pathway(2.5)

Microplastics (PET) may affect the progression of IBD through the following cascade reactions:

1. Oxidative stress: Activation of the AGE-RAGE pathway → NF-κB-mediated release of pro-inflammatory factors (TNF-α, IL-6).

2. Immune dysregulation: C-type lectin receptor signaling → NLRP3 inflammasome activation → secretion of IL-1β/IL-18.

3. Uncontrolled Proliferation: The PI3K-Akt/Ras pathway works in concert to inhibit apoptosis (BCL2↑) and drive proliferation (Cyclin D1↑).

4. Cancer Tendency: Inactivation of APC/TP53 and KRAS mutation → Progression of adenoma-carcinoma

sequence.

Click to expand ▼


# —— 0. 安装并加载必要的包 —— 
# install.packages(c("clusterProfiler","org.Hs.eg.db","enrichplot",
#                    "ggplot2","dplyr","pheatmap","gridExtra"))
# BiocManager::install("clusterProfiler")
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(ggplot2)
library(dplyr)
library(pheatmap)
library(gridExtra)

# —— 1. 读取交集基因列表 —— 
genes <- scan("intersection_genes.txt", what = character(), sep = "\n")
# 引用用户提供的基因列表文件 :contentReference[oaicite:0]{index=0}:contentReference[oaicite:1]{index=1}

# —— 2. SYMBOL 转 ENTREZID —— 
gene_df <- bitr(genes, fromType="SYMBOL", toType="ENTREZID", OrgDb=org.Hs.eg.db) %>%
	distinct(ENTREZID, .keep_all=TRUE)

# —— 3. KEGG 富集分析 —— 
kk <- enrichKEGG(gene=gene_df$ENTREZID, organism="hsa", pvalueCutoff=0.05)
kk <- setReadable(kk, OrgDb=org.Hs.eg.db, keyType="ENTREZID")

# —— 4. 取前 N 条显著通路 —— 
N <- 10
df_kk <- as.data.frame(kk@result, stringsAsFactors=FALSE)
df_kk$p.adjust    <- as.numeric(df_kk$p.adjust)
df_kk$Description <- as.character(df_kk$Description)
df_kk$geneID      <- as.character(df_kk$geneID)
df_kk <- df_kk[order(df_kk$p.adjust), ]
top_df <- df_kk[1:N, ]
top_df$Term <- factor(top_df$Description, levels=rev(top_df$Description))

# —— 5. 左侧柱状图：–log10(p.adjust)，宽度加大且从深到浅渐变 —— 
barplot_df <- transform(top_df, minusLogP = -log10(p.adjust))

p1 <- ggplot(barplot_df, aes(x=Term, y=minusLogP, fill=minusLogP)) +
	geom_col(width = 0.6) +   # 宽度由 0.6 改为 0.8
	scale_fill_gradientn(
	colours = c("#d7ddf0","#d0d5ec","#b7c1e3","#a9b6dd"),  # 深到浅
	name    = expression(-log[10](p.adjust))
	) +
	coord_flip() +
	labs(x=NULL, title=paste0("Top ", N, " KEGG Pathways")) +
	theme_bw(base_size=12, base_family="Times") +
	theme(
	panel.grid.major = element_line(color="grey90", size=0.2),
	panel.grid.minor = element_blank(),
	axis.line        = element_line(color="black"),
	axis.text.y      = element_text(size=10),
	plot.title       = element_text(face="bold", hjust=0.5)
	)

# —— 6. 右侧热图：通路–基因二值矩阵，指定新配色 —— 
gene_lists <- strsplit(top_df$geneID, "/")
all_genes  <- unique(unlist(gene_lists))
mat <- sapply(gene_lists, function(gl) as.integer(all_genes %in% gl))
mat <- t(mat)
rownames(mat) <- levels(top_df$Term)
colnames(mat) <- all_genes

# 从深到浅：#0f98b0 → #87cbd8 → #eaf5f8
heat_cols <- colorRampPalette(c("#b6b4d9","#e1e3f2"))(50)

p2 <- pheatmap(
	mat,
	cluster_rows    = FALSE,
	cluster_cols    = FALSE,
	show_colnames   = TRUE,
	show_rownames   = TRUE,
	fontsize_col    = 6,
	fontsize_row    = 8,
	color           = heat_cols,
	legend          = FALSE,
	border_color    = NA,
	main            = "Pathway–Gene Membership"
)

# —— 7. 拼合并保存 —— 
# 高度可调，例如 height=6, width=10
# tiff("KEGG_bar_heatmap.tiff", width=1000, height=600, res=150)
grid.arrange(p1, p2$gtable, ncol=2, widths=c(4,3))

7 GEO Database - Obtaining Chip Data

Chip data GSE206171 was obtained from the GEO database.

8 limma Differential Analysis

limma: Originally designed for microarray data, it has since been extended to RNA-Seq data analysis. It uses linear models for differential analysis and can easily handle multiple factors and covariates. The advantage of limma lies in its speed and flexibility, allowing it to handle a large number of genes and samples.

Compare the differences in transcriptomes to reveal changes in gene expression.

Click to expand ▼


library(tidyverse)
library(limma)
train_data <- read.csv('juzhen.csv')
group <- read.csv('group.csv')
table(group$group)
group$group <- factor(group$group, levels = c("disease","control"))
# 分组矩阵design
design <- cbind(control = ifelse(group$group == "control", 1, 0), 
				disease = ifelse(group$group == "disease", 1, 0))
contrast.matrix <- makeContrasts(contrasts = 'disease-control', levels = design)
fit <- lmFit(train_data, design) # 构建拟合模型，用来描述train_data中的数据是如何与design中的变量相关联的
fit2 <- contrasts.fit(fit, contrast.matrix) # 拟合模型的对比。对比方式是之前构建的对比矩阵：contrast.matrix（即：比较拟合模型中不同分组条件下基因表达是否存在差异）
fit2 <- eBayes(fit2 , trend = TRUE) # 是一种贝叶斯统计方法，为了让模型的参数估计更加准确，使它们更接近真实值
tempOutput <- topTable(fit2, coef = 1, n = nrow(fit2), adjust="fdr")  # 将差异结果转换成数据框
tempOutput <- na.omit(tempOutput) ## 去掉数据中有NA的行或列
DEG <- tempOutput
DEG$change <- as.factor(ifelse(DEG$P.Value < 0.05 & abs(DEG$logFC) >= 0.5, 
								ifelse(DEG$logFC > 0.5, 'Up', 'Down'), 'Not'))
DEG_write <- cbind(symbol = rownames(DEG), DEG)
sig_diff <- subset(DEG, DEG$P.Value < 0.05 & abs(DEG$logFC) >= 0.5)
sig_diff_write <- cbind(symbol = rownames(sig_diff), sig_diff)
write.csv(DEG_write, file = paste0('DEG_all.csv'))
write.csv(sig_diff_write, file = paste0('DEG_sig.csv'))

9 WGCNA (Weighted Gene Co-expression Network Analysis)

First, the expression matrix to be analyzed was imported and the data was processed. Then, we prepared a grouping information table for subsequent correlation analysis.

Build the sample clustering tree

Analyze the outlier samples by plotting with the plotDendroAndColors function.

Calculate the soft threshold.

Further construct the WGCNA network and draw the dynamic gene shearing tree.

Calculate the correlation between the modules and the traits, and draw a heat map of the correlation between the modules and the traits.

Red indicates a positive correlation, blue indicates a negative correlation, and the depth of the color represents the correlation coefficient.

Click to expand ▼


library(WGCNA)  
library(tidyverse)
library(dplyr)
library(ggplot2)
# 定义一个函数来清理并读取数据
dat <- read.csv('juzhen.csv', row.names = 1)
dat <- t(dat) %>% data.frame()

temp1 <- goodSamplesGenes(dat, verbose = 3) # 对文件做检测，检测是否有缺失值，即NA
temp1$allOK # 如果temp1$allOK的结果为TRUE，证明没有缺失值，可以直接下一步

## 分组信息表
group <- read.csv('group.csv',  check.names = F)
group$disease <- ifelse(group$group == 'disease', 1, 0)
group$control <- ifelse(group$group == 'control', 1, 0)
group <- dplyr::select(group, -group)
group <- plyr::rename(group, 
						c(Sample = 'id'))

group$id
datTraits <- data.frame(row.names = group$id, group = group[, 2:3])
traitColors <- numbers2colors(datTraits, signed = FALSE)
sampleTree = hclust(dist(dat, method = 'euclidean'), method = "complete") # 构建聚类树，看是否有离群样本
plotDendroAndColors(sampleTree, 
					traitColors,
					main = "Sample clustering to detect outliers", 
					sub="", 
					xlab="", 
					cex.lab = 2, # cex.lab指定轴标签的大小
					cex.axis = 1.5, # cex.axis指定刻度线标签的大小
					cex.main = 2, 
					groupLabels = colnames(datTraits),
					dendroLabels = FALSE
)

datExpr <- dat
nGenes = ncol(datExpr) # 列是基因名
nSamples = nrow(datExpr) # 行是样本名
## 计算软阈值
powers <- c(c(1:10), seq(from = 12, to=30, by=2))  # 设置网络构建参数选择范围，计算无尺度分布拓扑矩阵
sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

par(mfrow = c(1, 2)); # 一张画布上显示两个图
cex1 = 0.9;
plot(sft$fitIndices[, 1], 
		-sign(sft$fitIndices[, 3])*sft$fitIndices[, 2],
		xlab = "Soft Threshold (power)",
		ylab = "Scale Free Topology Model Fit,signed R^2",
		type = "n",
		main = paste("Scale independence"));
text(sft$fitIndices[, 1], 
		-sign(sft$fitIndices[, 3])*sft$fitIndices[, 2],
		labels = powers,
		cex = cex1,
		col = "red");
abline(h=0.85,col="red") 
plot(sft$fitIndices[, 1], 
		sft$fitIndices[, 5],
		xlab = "Soft Threshold (power)",
		ylab = "Mean Connectivity", 
		type = "n",
		main = paste("Mean connectivity"))
text(sft$fitIndices[, 1], 
		sft$fitIndices[, 5], 
		labels = powers, 
		cex = cex1,
		col = "red")

net <- blockwiseModules(datExpr, 
						power = 22, # 7是刚才picksoftThreshold的软阈值
						maxBlockSize = nGenes, # 默认maxBlockSize是5000，表示仅会分析5000个基因
						TOMType = "unsigned", # 计算TOM矩阵时，是否考虑正负相关性；默认为"signed",可选"unsigned"
						minModuleSize = 100, # minModuleSize：模块中最少的基因数，设置为50，即模块中只有大于50个基因才会被统计
						reassignThreshold = 0, 
						# deepSplit = 2,
						mergeCutHeight = 0.3, # mergeCutHeight ：模块合并阈值，阈值越大，模块越少（重要）
						numericLabels = TRUE,  # 返回数字而不是颜色作为模块的名字，后面可以再转换为颜色
						pamRespectsDendro = FALSE,
						saveTOMs = TRUE, # 询问是否保存TOM文件
						saveTOMFileBase = "gene_TOM", # 保存的TOM文件命名
						corType = "pearson", # 计算相关性的方法；可选pearson(默认)，bicor。后者更能考虑离群点的影响
						loadTOMs = TRUE,
						verbose = 3)

table(net$colors) 
mergedColors <- labels2colors(net$colors) # 给每一个模块分别给个颜色（彩虹色）
sizeGrWindow(12, 9)
plotDendroAndColors(net$dendrograms[[1]], 
					mergedColors[net$blockGenes[[1]]],
					"Module colors",
					dendroLabels = FALSE, 
					hang = 0.03,
					addGuide = TRUE, 
					guideHang = 0.05)

moduleLabels <- net$colors # # 显示了数据集中的每个基因被分配到什么模块中，不同的模块用不同的数字表示
moduleColors <- labels2colors(net$colors) # label_to_colors，将每个模块对应的数字转成颜色
table(moduleLabels)
table(moduleColors)

## Recalculate MEs with color labels
MEs0 <- moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs <- orderMEs(MEs0)

datTraits <- plyr::rename(datTraits, 
							c(group.disease = 'disease', 
							group.control = 'control'))
## 计算基因集和模块之间的相关性
MEs <- dplyr::select(MEs, -MEgrey) ## 剔除灰色模块
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)

## Will display correlations and their p-values,
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
					signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) <- dim(moduleTraitCor)

## 绘图保存
par(mar = c(4, 9, 2, 2))
labeledHeatmap(Matrix = moduleTraitCor,
				xLabels = names(datTraits),
				yLabels = names(MEs),
				ySymbols = names(MEs),
				colorLabels = FALSE,
				colors = blueWhiteRed(50),
				textMatrix = textMatrix,
				setStdMargins = FALSE,
				cex.text = 0.7,
				zlim = c(-1,1),
				main = paste("Module-trait relationships"))

#计算MM,GS
module <- c('yellow')
probes <- colnames(datExpr) # 导入的fpkm文件中全部的基因名
inModule <- (moduleColors == module) # 模块是red中的基因
modProbes <- probes[inModule] # 筛选出关键基因
modGenes <- as.data.frame(modProbes)
colnames(modGenes) <- 'modgene'

## 模块与基因间的相关性
geneModuleMembership <- as.data.frame(cor(datExpr, MEs, use = "p"))
MMPvalue <- as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples))

geneTraitCor = as.data.frame(cor(datExpr, datTraits, use = "p"))
geneTraitP = as.data.frame(corPvalueStudent(as.matrix(geneTraitCor), nSamples))

modNames = substring(colnames(MEs), 3)

## 获取关注的列
pheno = "disease"
module_column = match(module, modNames)
pheno_column = match(pheno, colnames(datTraits))

## 获取模块内的基因
moduleGenes <- (moduleColors == module)

MM <- abs(geneModuleMembership[moduleGenes, module_column])
GS <- abs(geneTraitCor[moduleGenes, 1])
#提权关键基因
c <- as.data.frame(cbind(MM, GS))
rownames(c) <- modGenes$modgene

hub_gene <- subset(c, c$MM > 0.8 & c$GS > 0.6)
write.csv(hub_gene, file = paste0(module, '_screen_gene_GS_MM.csv'))

10 Screening of Core Targets by Machine Learning Algorithms

Employ multiple machine learning algorithms: LASSO regression, SVM-RFE, Boruta, XGBoost.Three core target genes, namely JAK2, IL2, and TGFBR2, were ultimately selected.

Click to expand ▼


library(tidyverse)
library(glmnet)
train_data <- read.csv("juzhen.csv", row.names = 1, check.names = F)
group <- read.csv("group.csv") # 为每个样本的分组信息（tumor和normal）
colnames(group) <- c('sample', 'group')
hub_gene <- data.frame(symbol = gene <- c('BCL2', 'IL2', 'JAK2', 'BDNF', 'TGFBR2', 'NOTCH2', 'TGFB2', 'MDM2'))
colnames(hub_gene) <- "symbol"

dat <- train_data[rownames(train_data) %in% hub_gene$symbol, ] %>%
	t() %>%
	as.data.frame() # 整理后行为样本名，列为基因名
dat$sample <- rownames(dat)
dat <- merge(dat, group, var = "sample") # 筛选后的表达矩阵与分组信息表合并
dat <- column_to_rownames(dat, var = "sample") %>% as.data.frame() # 列名转为行名

table(dat$group)
dat$group <- factor(dat$group, levels = c('disease', 'control'))
#LASSO
set.seed(381) # 设置种子
res.lasso <- cv.glmnet(x = as.matrix(dat[-ncol(dat)]), 
						y = dat$group, 
						family = "binomial", 
						type.measure = "default",
						nfolds = 10)
coef.min <- coef(res.lasso, s = "lambda.min")  ## lambda.min & lambda.1se 取一个

# 找出那些回归系数没有被惩罚为0的
active.min <- which(coef.min@i != 0)
# 提取基因名称
lasso_geneids <- coef.min@Dimnames[[1]][coef.min@i + 1]
lasso_geneids <- lasso_geneids[-1] %>% as.data.frame()
colnames(lasso_geneids) <- 'symbol'

write.csv(lasso_geneids, file = 'lasso_gene.csv')

#可视化
# lasso结果简单可视化
par(mfrow = c(1, 2))
plot(res.lasso$glmnet.fit, 
		xvar = 'lambda', 
		label = TRUE, 
		las = 1)
abline(v=log(c(res.lasso$lambda.min, res.lasso$lambda.1se)), lty="dashed")
plot(res.lasso, 
		las =1)

Click to expand ▼


library(tidyverse)
library(caret)
library(ggplot2)
library(cowplot)
library(ggplotify)
train_data <- read.csv("juzhen.csv", row.names = 1, check.names = F)
group <- read.csv("group.csv") # 为每个样本的分组信息（tumor和normal）
colnames(group) <- c('sample', 'group')
hub_gene <- data.frame(symbol = gene <- c('BCL2', 'IL2', 'JAK2', 'BDNF', 'TGFBR2', 'NOTCH2', 'TGFB2', 'MDM2'))
colnames(hub_gene) <- "symbol"

dat <- train_data[rownames(train_data) %in% hub_gene$symbol, ] %>%
	t() %>%
	as.data.frame() # 整理后行为样本名，列为基因名
dat$sample <- rownames(dat)
dat <- merge(dat, group, var = "sample") # 筛选后的表达矩阵与分组信息表合并
dat <- column_to_rownames(dat, var = "sample") %>% as.data.frame() # 列名转为行名

table(dat$group)
dat$group <- factor(dat$group, levels = c('disease', 'control'))

set.seed(21) # 设置种子
control <- rfeControl(functions = caretFuncs, method = "cv", number = 10) # cv 交叉验证次数10
# 执行SVM-RFE算法
num <- ncol(dat)-1
results <- rfe(x = dat[, 1:num], # 除去最后一列，其余列均为预测变量（也就是hubgene的表达量）
				y = dat$group, # 分组信息
				sizes = c(1:num), 
				rfeControl = control,
				method = "svmRadial"
)
## 结果分析
svmrfe_result <- data.frame(symbol = predictors(results)) ## 7个基因

write.csv(svmrfe_result, file = 'svm_rfe_gene.csv')
#可视化
# SVM-RFE结果简单可视化
p1 <- plot(results, type=c("o"),
			xgap.axis = 1)
p1 <- as.ggplot(plot_grid(p1))+
	labs(title="SVM_RFE_analyse", x="", y = "",size=25) +
	# theme_bw()+
	theme(plot.title = element_text(hjust =0.5,colour="black",face="bold",size=25),
		axis.text.x = element_blank(), 
		axis.text.y = element_blank(), 
		axis.title.x = element_blank(),
		axis.title.y = element_blank(),
		legend.text = element_blank(),
		legend.title = element_blank(),
		legend.position = "none",
		panel.grid.major = element_blank(),
		panel.grid.minor = element_blank())
p1

Click to expand ▼


library(tidyverse)
library(Boruta)
library(caret)
train_data <- read.csv("juzhen.csv", row.names = 1, check.names = F)
group <- read.csv("group.csv") # 为每个样本的分组信息（tumor和normal）
colnames(group) <- c('sample', 'group')
hub_gene <- data.frame(symbol = gene <- c('BCL2', 'IL2', 'JAK2', 'BDNF', 'TGFBR2', 'NOTCH2', 'TGFB2', 'MDM2'))
colnames(hub_gene) <- "symbol"

dat <- train_data[rownames(train_data) %in% hub_gene$symbol, ] %>%
	t() %>%
	as.data.frame() # 整理后行为样本名，列为基因名
dat$sample <- rownames(dat)
dat <- merge(dat, group, var = "sample") # 筛选后的表达矩阵与分组信息表合并
dat <- column_to_rownames(dat, var = "sample") %>% as.data.frame() # 列名转为行名

table(dat$group)
dat$group <- factor(dat$group, levels = c('disease', 'control'))

set.seed(123)
boruta.train <- Boruta(group~., data = dat, doTrace = 2, maxRuns = 500)
#修正
final.boruta <- TentativeRoughFix(boruta.train)
## 结果
boruta_gene <- data.frame(symbol = getSelectedAttributes(final.boruta, withTentative = T))
write.csv(boruta_gene, file = 'boruta_gene.csv')



#可视化
library(plotly)
library(dplyr)
library(RColorBrewer)

## User Configuration
# Extended color scheme with 11 colors (maintaining similar aesthetic)
colors <- c("#d36a87", "#ea9979", "#83b6b5", "#619cf5", "#a596ee",
			"#7fb069", "#f4a261", "#e76f51", "#2a9d8f", "#e9c46a", 
			"#264653")
fill_colors <- c("#e9b4c3", "#f4ccbc", "#c1dada", "#b0cdfa", "#d2caf6",
					"#bfd7b4", "#f9c9a0", "#f3b3a8", "#94d3c7", "#f4ddb4",
					"#93a6a9")

# Extract data from Boruta results
extract_boruta_data <- function(boruta_result) {
	# Extract importance history data
	imp_history <- boruta_result$ImpHistory
	
	# Remove infinite values and NAs
	clean_data <- lapply(1:ncol(imp_history), function(i) {
	imp_history[is.finite(imp_history[, i]), i]
	})
	names(clean_data) <- colnames(imp_history)
	
	# Sort by median values
	medians <- sapply(clean_data, median, na.rm = TRUE)
	sorted_names <- names(sort(medians))
	
	return(list(
	data = clean_data[sorted_names],
	feature_names = sorted_names
	))
}

# Create violin + box combination plot
create_boruta_viz <- function(boruta_result, title = "Boruta Feature Importance Analysis") {
	
	# Extract data
	boruta_data <- extract_boruta_data(boruta_result)
	clean_data <- boruta_data$data
	feature_names <- boruta_data$feature_names
	
	# Create plotly figure
	fig <- plot_ly()
	
	# Add violin and box layers for each feature
	for (i in seq_along(feature_names)) {
	feature_name <- feature_names[i]
	y_values <- clean_data[[feature_name]]
	
	# Select colors (cycling through available colors)
	color_idx <- ((i - 1) %% length(colors)) + 1
	main_color <- colors[color_idx]
	fill_color <- fill_colors[color_idx]
	
	# Add violin layer
	fig <- fig %>%
		add_trace(
		type = "violin",
		y = y_values,
		x = rep(feature_name, length(y_values)),
		name = feature_name,
		fillcolor = main_color,
		line = list(color = main_color),
		opacity = 0.255,
		side = "positive",
		width = 0.9,
		points = FALSE,
		showlegend = FALSE
		)
	
	# Add box layer
	fig <- fig %>%
		add_trace(
		type = "box",
		y = y_values,
		x = rep(feature_name, length(y_values)),
		name = feature_name,
		width = 0.2,
		fillcolor = fill_color,
		line = list(color = main_color, width = 4),
		marker = list(
			color = fill_color,
			size = 8,
			line = list(color = main_color, width = 2)
		),
		boxpoints = "all",
		jitter = 1,
		pointpos = -2.0,
		whiskerwidth = 0.4,
		boxmean = "sd",
		notched = TRUE,
		showlegend = FALSE
		)
	}
	
	# Set layout
	fig <- fig %>%
	layout(
		title = list(
		text = title,
		font = list(size = 16)
		),
		xaxis = list(
		title = "Feature Names",
		tickangle = -45,
		tickfont = list(size = 10)
		),
		yaxis = list(
		title = "Importance Score"
		),
		width = 1200,
		height = 750,
		showlegend = FALSE,
		plot_bgcolor = "rgba(0,0,0,0)",
		paper_bgcolor = "rgba(0,0,0,0)"
	)
	
	return(fig)
}

# Example usage (assuming you already have final.boruta object)
# fig <- create_boruta_viz(final.boruta, "Boruta Feature Selection Results")
# fig

# Simplified function call:
plot_boruta_styled <- function(boruta_result, title = "Boruta Feature Importance Analysis") {
	fig <- create_boruta_viz(boruta_result, title)
	return(fig)
}

fig <- plot_boruta_styled(final.boruta)
fig  # 直接输入变量名，图会显示在RStudio的Viewer窗格中

Click to expand ▼


library(tidyverse)
library(Matrix)
library(xgboost) 
library(ggplot2)
train_data <- read.csv("juzhen.csv", row.names = 1, check.names = F)
group <- read.csv("group.csv") # 为每个样本的分组信息（tumor和normal）
colnames(group) <- c('sample', 'group')
hub_gene <- data.frame(symbol = gene <- c('BCL2', 'IL2', 'JAK2', 'BDNF', 'TGFBR2', 'NOTCH2', 'TGFB2', 'MDM2'))
colnames(hub_gene) <- "symbol"

dat <- train_data[rownames(train_data) %in% hub_gene$symbol, ] %>%
t() %>%
as.data.frame() # 整理后行为样本名，列为基因名
dat$sample <- rownames(dat)
dat <- merge(dat, group, var = "sample") # 筛选后的表达矩阵与分组信息表合并
dat <- column_to_rownames(dat, var = "sample") %>% as.data.frame() # 列名转为行名

table(dat$group)
dat$group <- factor(dat$group, levels = c('disease', 'control'))

# 将trainset的1-8列（自变量）转换为矩阵
traindata1 <- data.matrix(dat[, -9])
# 利用Matrix函数，将sparse参数设置为TRUE，转化为稀疏矩阵
traindata2 <- Matrix(traindata1, sparse = T)
# 将因变量转换为numeric类型，-1是为了从0开始计数
train_y <- as.numeric(dat[, 9])-1
# 将自变量和因变量拼接为list
traindata <- list(data = traindata2, label = train_y)
dtrain <- xgb.DMatrix(data = traindata$data, label = traindata$label)
res.xgb <- xgboost(data = dtrain, max_depth = 5, eta=0.3, objective='binary:logistic', nround = 25)
xgb_importance <- xgb.importance(traindata2@Dimnames[[2]], model = res.xgb)    ##特征重要度
## 结果
hub_gene <- xgb_importance[c(1 : 5), ]
hub_gene$Feature <- gsub('.','-',hub_gene$Feature,fixed = T )
write.csv(hub_gene, '01.hub_gene.csv')
write.csv(xgb_importance, '02.xgb_importance.csv')
# xgboost结果简单可视化（ggplot2函数）
ggplot(xgb_importance, aes(x= reorder( Feature,Gain), y=Gain,fill=Feature)) +
geom_bar(stat="identity") +
theme_classic() +
guides(fill=FALSE)+
#theme(legend.position = )+
scale_fill_manual(values=c("#8DD3C7","#FFFFB3","#BEBADA","#FB8072","#80B1D3","#FDB462","#B3DE69","#FCCDE5","#D9D9D9","#BC80BD","#CCEBC5","#FFED6F","#377EB8","#100EB2","#FDB999"))+
coord_flip()+
theme_bw()+
ggtitle('XGBoost')+
theme(plot.title = element_text(size=24,color='black', face = "bold",family='Times'),
	axis.title.x =element_text(size=18,color='black', face = "bold",family='Times'),
	axis.text.x =element_text(size=16, color='black', face = "bold",family='Times'),
	axis.title.y =element_blank(),
	axis.text.y=element_text(size=16,   color='black',face = "bold",family='Times'),
	legend.title=element_text(size=20, color='black', face = "bold",family='Times'),
	legend.text=element_text(size=18, color='black', face = "bold",family='Times'),
	title=element_text(size=20, color='black', face = "bold",family='Times'),
	strip.text = element_text(size = 14,family = "Times", face = "bold"))+
theme(panel.grid.major=element_blank(),panel.grid.minor=element_blank())+
labs(x="gene",y="Gain",fill="")

11 Molecular Docking: Referring to the previous docking content, molecular docking of PET with the core target proteins of IBD

Molecular docking was conducted using Autodock to explore the interactions between PET and the four key genes, BCL2, JAK2, TGFBR2, and TGFB2. All binding energies were less than 0, highlighting the spontaneous and strong binding ability of IBD with these key proteins, emphasizing its high affinity and crucial role in the molecular mechanism influencing IBD. To further clarify the complex binding configurations, we utilized PyMOL for visualization, presenting the lowest energy binding conformations.

JAK2（-0.7549）

TGFBR2（-0.6397）

IL2（-0.5693）

12 Molecular Dynamics Simulation

Although the molecular docking analysis adopted a semi-flexible docking approach, it was unable to account for the flexibility of the protein structure, temperature, pressure, and solvent effects. Therefore, we conducted molecular dynamics (MD) simulations on the JAK2-PET complex to further explore the stability of protein-ligand interactions.

12.1 Preparation of Protein Topology

Download the CHARMM36 force field and the "cgenff_charmm2gmx.py" conversion script from the MacKerell laboratory website. Use pdb2gmx to write the protein topology:

gmx pdb2gmx -f protein.pdb -o protein_processed.gro -ter -ignh

12.2 Prepare Ligand Topology

Generate the PET topology using the CGENFF server and then execute the sort_mol2_bonds.pl script to construct the correct topology with matching coordinates.

perl sort_mol2_bonds.pl pet.mol2 pet_fix.mol2

python cgenff_charmm2gmx.py PET pet_fix.mol2 pet.str charmm36-jul2022.ff

Use editconf to convert this .pdb file to .gro format:

gmx editconf -f pet_ini.pdb -o pet.gro

12.3 Define the crystal and add the solvent.

gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0
	
gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro

12.4 Adding Ions

Run energy minimization using grompp and pass the .tpr file to genion:

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral

12.5 Energy Minimization

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr

gmx mdrun -v -deffnm em

12.6 Balance

To constrain the ligand, we need to generate a position restraint topology for it. First, create an index group for PET that only includes its non-hydrogen atoms:

gmx make_ndx -f pet.gro -o index_pet.ndx
...
	> 0 & ! a H*
	> q

Then, execute the genrestr module and select this newly created index group:

gmx genrestr -f pet.gro -n index_pet.ndx -o posre_pet.itp -fc 1000 1000 1000

Temperature coupling

Click to expand ▼


gmx make_ndx -f em.gro -o index.ndx
...
	0 System              : 33506 atoms
	1 Protein             :  2614 atoms
	2 Protein-H           :  1301 atoms
	3 C-alpha             :   163 atoms
	4 Backbone            :   489 atoms
	5 MainChain           :   651 atoms
	6 MainChain+Cb        :   803 atoms
	7 MainChain+H         :   813 atoms
	8 SideChain           :  1801 atoms
	9 SideChain-H         :   650 atoms
	10 Prot-Masses         :  2614 atoms
	11 non-Protein         : 30892 atoms
	12 Other               :    22 atoms
	13 pet                 :    22 atoms
	14 CL                  :     6 atoms
	15 Water               : 30864 atoms
	16 SOL                 : 30864 atoms
	17 non-Water           :  2642 atoms
	18 Ion                 :     6 atoms
	19 pet                 :    22 atoms
	20 CL                  :     6 atoms
	21 Water_and_ions      : 30870 atoms
	> 1 | 13
> q

All the XVG files below were visualized using the DulvyTools script. Continue with NVT equilibration using the .mdp file:


gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt
输入：16 0，来选择温度  
dit xvg_show -f temperature.xvg

Continue with the NPT:


gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt
输入：18 0，来选择压力  
dit xvg_show -f pressure.xvg

12.7 MD Simulation

After completing two equilibration stages, the system is now well equilibrated at the desired temperature and pressure. Next, we conducted a 10-ns MD simulation:


gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_10.tpr、、
gmx mdrun -deffnm md_0_10 -nb -gpu

12.8 Analysis

Use the gmx_trjconv command to center the entire system in the box and visualize it through VMD: gmx trjconv -s md_0_10.tpr -f md_0_10.xtc -o md_0_10_noPBC.xtc -pbc mol -center Select protein, 1; 1 gmx trjconv -s md_0_10.tpr -f md_0_10_noPBC.xtc -o center.pdb -dt 1000 Select protein, 1

Calculate RMSD, RMSF, and gyration radius.


gmx rms -s md_0_10.tpr -f fit.xtc -o rmsd.xvg -tu ns  
选择蛋白骨架，4，随后选择蛋白，1  
dit xvg_show -f rmsd.xvg


计算RMSF  
gmx rmsf -s md_0_10.tpr -f fit.xtc -o rmsf.xvg  
dit xvg_show -f rmsf.xvg


gmx gyrate -s md_0_10.tpr -f fit.xtc -o gyrate.xvg  
选择蛋白，1  
dit xvg_show -f gyrate.xvg

Free Energy Landscape Mapping via PCA

Covariance calculation:


协方差计算（这里不太懂）：  
gmx covar -s md_0_10tpr -f fit.xtc -o eigenvalues.xvg -v eigenvectors.trr -xpma covapic.xpm  
python xpm2png.py -f covapic.xpm -o covapic.png -ip yes

Next, perform PCA analysis, draw the morphology diagram and visualize it:

Click to expand ▼


##生产主成分数据  
gmx anaeig -s md_0_10.tpr -f fit.xtc -v eigenvectors.trr -first 1 -last 1 -proj pc1.xvg  
gmx anaeig -s md_0_10.tpr -f fit.xtc -v eigenvectors.trr -first 2 -last 2 -proj pc2.xvg
gmx sham -tsham 310 -nlevels 100 -f pc_output.xvg -ls pca12_gibbs.xpm -g pca12_gibbs.log -lsh pca12_enthalpy.xpm -lss pca12_entropy.xpm  
python xpm2png.py -f pca12_gibbs.xpm -o pca12_gibbs.png -ip yes

13 Construction of the Adverse Outcome Pathway (AOP) System

Finally, by constructing a PET-inflammatory bowel disease adverse outcome pathway (AOP) framework, we linked PET microplastic exposure to the pathogenesis of IBD and identified three potential molecular initiating events (MIEs). Our research results indicate that JAK2, IL2, and TGFBR2 are closely related to IBD and play a key role in the pathogenesis of IBD by regulating the PI3K-Akt signaling pathway, JAK-STAT signaling transduction pathway, and TGF-β signaling cascade. ![[exported_image (2) 1.png]] Specifically, PET microplastics affect the progression of IBD through the following mechanisms:

1. Oxidative stress pathway: Activated through the AGE-RAGE signaling pathway, it leads to the massive release of pro-inflammatory factors (TNF-α, IL-6) mediated by NF-κB.

2. Immune dysregulation mechanism: Activation of C-type lectin receptor signaling promotes the activation of the NLRP3 inflammasome, leading to the secretion of inflammatory factors such as IL-1β/IL-18.

3. Uncontrolled cell proliferation: Through the synergistic action of the PI3K-Akt/Ras pathway, it inhibits apoptosis (upregulation of BCL2) and drives abnormal proliferation (upregulation of Cyclin D1).

4. Intestinal Barrier Disruption: Interfering with ion channel functions (CLCN2, KCN2), it disrupts the integrity of the intestinal epithelium and exacerbates intestinal inflammatory responses.

Reference

1. Xie, L., Zhao, Y., Pan, Y., Shang, Y., Kong, Y., Gu, Y., Chen, T., & Mao, W. (2025). The impact of polyethylene terephthalate microplastics on the pathogenesis of atherosclerosis: Focusing on network toxicology and target gene detection. Ecotoxicology and Environmental Safety, 302, 118692. https://doi.org/10.1016/j.ecoenv.2025.118692

2. Zhao, W., Yang, S., Hu, S., Feng, Y., & Bai, B. (2025). Polyethylene terephthalate microplastics promote pulmonary fibrosis via AKT1, PIK3CD, and PIM1: A network toxicology and multi-omics analysis. Ecotoxicology and Environmental Safety, 303, 118954. https://doi.org/10.1016/j.ecoenv.2025.118954

3. Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C. jvenn: an interactive Venn diagram viewer. BMC Bioinformatics. 2014;15:293. doi:10.1186/1471-2105-15-293

4. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research. 2023;51(D1):D638–D646.

5. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13(11):2498–2504. doi:10.1101/gr.1239303

6. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 3.0. Schrödinger, LLC.

7. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry. 2010;31(2):455–461. doi:10.1002/jcc.21334

8. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25. doi:10.1016/j.softx.2015.06.001

9. Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation. 2008;4(3):435–447. doi:10.1021/ct700301q

10. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research. 2023;51(D1):D523–D531. doi:10.1093/nar/gkac1052

11. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio Cancer Genomics Portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery. 2012;2(5):401–404. doi:10.1158/2159-8290.CD-12-0095

12. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science Signaling. 2013;6(269):pl1. doi:10.1126/scisignal.2004088

Part6

Identification of the SIN3-binding motif in MXI1

Construction of a PPI Network

Proof of Enrichment Regions of the Complex

Correlation analysis

Molecular docking

1. Identification of the SIN3-binding motif in MXI1

(1) he complete amino acid sequence (228 amino acids) of the human MXl1 protein was extracted from the UniProt-provided FASTA sequence.

(2) Motif scanning analysis

The SIN3-binding signature was Scanned using the regular expression pattern: [LIVFMWY][^P][^P][LIVFMWY][LIVFMWY].

Two potential SIN3-binding motifs were identified in the N-terminal region:

Motif 1: Position 9-13, sequence "MINVQ"

Motif 2: Position 90-94, sequence "LKVLI"

2. Construction of a PPI Network

To systematically visualize and validate the interactions among core components of the transcriptional complex, a protein-protein interaction (PPI) network was constructed.

This network comprises five key proteins: MXI1, SIN3A, SIN3B, HDAC1, and HDAC2. Each node represents a protein, and its unique color and internal structure diagram facilitate visual distinction. The numerous connection edges (lines) between all nodes reveal a dense and highly interconnected network structure, indicating strong and multi-faceted interactions within the complex. Notably, the MXI1 node (red) has direct connections with both SIN3A (yellow) and SIN3B (green), providing computational evidence for our initial hypothesis and ChIP-seq findings: MXI1 acts as a key recruiter by directly binding to the scaffold proteins SIN3A and SIN3B. Additionally, Sin3 proteins exhibit direct interactions with histone deacetylases HDAC1 (light blue) and HDAC2 (blue), which is consistent with their known role in forming the core of the repressive complex. The presence of HDAC1 and HDAC2 in this network directly links our complex to the enzymatic activities responsible for chromatin condensation and transcriptional silencing.

3. Proof of Enrichment Regions of the Complex

Genome-wide mapping of the MXI1-Sin3a/b-HDAC1/2 complex revealed significant enrichment on gene regulatory elements. Annotation of the co-binding sites indicated that 92.3% of the co-binding peaks were located within promoter regions, with the vast majority (92.3%) within 1 kb upstream of the transcription start site (TSS). This significant promoter preference is highly consistent with the established role of Sin3-HDAC complexes as direct regulators of transcription initiation.

To investigate the functional state of chromatin at these binding sites, we evaluated the enrichment of the active enhancer mark histone H3 lysine 27 acetylation (H3K27ac). Notably, we found that the H3K27ac signal at MXI1-Sin3B-HDAC1 co-binding sites was 29.6 times higher than that at randomly selected genomic regions.

This difference was statistically highly significant (Wilcoxon rank sum test, p < 0.05). This finding demonstrates that a complex known for its repressive function is associated with an active chromatin mark - suggesting a complex model of gene regulation. It strongly implies that the MXI1-Sin3B-HDAC1 complex is not recruited to silent loci but is targeted to already active promoters to fine-tune or repress transcriptional output.

4. Correlation analysis

To validate the practical relevance of our molecular findings in a clinical context, we utilized cBioPortal to analyze the co-expression patterns of the components of the MXI1-Sin3-HDAC complex in a large number of human cancer samples from the TCGA database. Our analysis revealed a consistent and statistically significant positive correlation between MXI1 and its key partners. Specifically, the mRNA expression of MXI1 showed a strong positive correlation with SIN3A (Spearman's ρ = 0.39, p < 1.63e-16), and moderate positive correlations with HDAC1 (ρ=0.25, p=2.70e-7) and HDAC2 (ρ=0.26, p=8.46e-8).

Molecular docking

In the protein screening, we conducted molecular docking between the expected HFBI protein to be screened and AGA1. The results of the docking provided support for the next step of the experiment. The specific results are as follows: