"Without theory, facts are but blind."

— Immanuel Kant

The description of the system using ordinary differential equations.

Our Lacbutler is designed to reside long-term and stably within the colon or small intestine of individuals with lactose intolerance, aiding in the breakdown of excess lactose to address this condition. To achieve this, we have implemented three core systems: a lactose metabolism system, an adhesion system, and a population density maintenance system. Detailed information on each system can be found in the description or on their respective dedicated pages. Our current objective is to establish a mathematical description of the operational processes of these systems within the engineered bacteria, to better enable Lacbutler to achieve the desired therapeutic outcomes.

Lactose Metabolism System

In this section, we aim to model the diffusion of lactose within the intestine, the transport of lactose by LacY, and the metabolic process mediated by lactase, with the goal of identifying key parameters that limit our metabolic efficiency.

Prior to the analysis, and to facilitate discussion by avoiding certain extreme special cases, we make the following assumptions:

1、The micro-environment at the bacterial cell surface is spatially discontinuous with the bulk intestinal lumen.

2、Each micro-environment is well-mixed, allowing us to neglect spatial gradients within it.

3、We assume the transport rate of LacY is linearly dependent on the proton gradient during the transport process.

4、For lactase, we assume that the adjustment of enzymatic activity in response to changes in substrate concentration is instantaneous, and that a basal degradation pathway for intracellular lactose exists.

5、To better analyze the key parameters of the engineered bacteria's metabolism, we employ a mean-field approximation to ignore lactose uptake rate variations caused by population effects. That is, the population behavior is described by an average density N, and the change in micro-environment concentration is proportional to both the population density and the per-capita uptake rate.

Rate of change of lactose concentration in the bacterial surface micro-environment

Description:

This equation quantifies the instantaneous rate of change of lactose concentration in the micro-environment at the bacterial cell surface. The change is governed by two primary terms:

The first term, adhering to Fick's First Law, represents the net influx of lactose from the bulk intestinal lumen into the micro-environment via passive diffusion.

The second term represents the total consumption rate of lactose by the entire bacterial population, driven by both basal transport and proton motive force-driven symport.

Rate of Change of Intracellular Lactose Concentration

Description:

This equation defines the dynamic balance of lactose concentration within the cytoplasm. The net accumulation rate equals the transmembrane uptake rate minus the intracellular consumption rate. The uptake rate is the sum of basal transport and active transport mediated by the LacY protein, which is dependent on the proton gradient. The consumption rate includes the hydrolysis reaction catalyzed by lactase BgaB (following Michaelis-Menten kinetics) and degradation caused by non-enzymatic or bypass metabolic pathways.

Rate of change of the transmembrane proton gradient

Description:

This equation describes the dynamic balance of the transmembrane proton gradient that drives secondary active transport. Its rate of change is governed by three key processes: the first term represents the gradient generation rate driven by energy derived from lactose catabolism, which is proportional to the hydrolytic flux and drives proton pumping; the second term represents gradient decay due to passive processes such as membrane proton leakage; the third term represents the gradient consumption rate caused by the LacY protein consuming the proton motive force during lactose-H⁺ symport activity.

Rate of change of bacterial population density

Description:

This equation describes the exponential growth kinetics of the bacterial population driven by energy derived from lactose metabolism. The specific growth rate of the population is proportional to the hydrolytic flux of BgaB, with the proportionality constant being the macro yield coefficient that converts substrate flux into biomass synthesis.

variable	symbol	unit	Biological Significance
L_surf	Surface lactose concentration	mM	Lactose concentration in the microenvironment on the bacterial surface
L_in	Internal lactose concentration	mM	Lactose concentration within the bacterial cells
Ψ	proton gradient	dimensionless	Relative Intensity of the Transmembrane Proton Gradient (0-1)
N	Bacterial Density	OD₆₀₀	Bacterial Population Density

parameter	symbol	unit	Biological Significance
k_diff	Diffusion rate constant	min⁻¹	The diffusion rate of lactose from the intestinal lumen to the microenvironment
L_bulk	Intestinal lactose concentration	mM	Lactose concentration in the body of the intestine
r	Microenvironmental volume factor	(OD₆₀₀·mM/min)⁻¹	The coefficient that converts the uptake of the bacterial body into a change in concentration
V_other	Basal transport rate	mM/min	Transport rates that do not depend on the proton gradient
k_H	P proton coupling transport constant	mM/min	Transportation rate constants associated with proton gradients
K_t	Transportation of the Mie constant	mM	The semi-saturation concentration of lactose for LacY
V_e	Maximum enzyme rate	mM/min	Maximum catalytic rate of lactase
K_e	Enzyme Miller constant	mM	The semi-saturation concentration of lactase to lactose
d_l	Lactose degradation constant	min⁻¹	Non-enzymatic degradation rate of endogenous lactose
k_resp	Breathing rate constant	min⁻¹	The efficiency with which the metabolic proton gradient is produced
k_leak	Neutron leakage constant	min⁻¹	The base leakage rate of the proton gradient
β	Transportation coefficient	dimensionless	The proportion of the proton gradient consumed by LacY transport
k_μ	Growth yield coefficient	OD₆₀₀/mM	The efficiency of lactose breakdown into growth

Adhesion System

During our efforts to model this process, we noted that the specific degradation pathways of the BreR transcription factor and the molecular routes for the uptake and metabolism of deoxycholic acid by bifidobacteria are not yet fully elucidated. Furthermore, constrained by experimental timelines and necessary approval procedures, we are currently unable to investigate these issues in depth. However, to proceed with an initial attempt at this modeling approach, we hereby make the following assumption for this section: the intracellular concentration of the BreR transcription factor, expressed under the control of the strong promoter pgap, is constant, assuming no other influencing factors beyond the deoxycholic acid concentration.

Rate of Change of BreR-Deoxycholic Acid Complex Concentration

For BreR (R) and deoxycholic acid (D), we have:

R + D → RD

Description:

This equation describes the kinetic process by which the BreR transcription factor binds to deoxycholic acid to form a complex. The association rate is proportional to the concentrations of free BreR [R] and deoxycholic acid [D], while the dissociation rate is proportional to the concentration of the complex [RD]. The model also accounts for the degradation of the RD complex.

Rate of change of MucBP adhesion protein concentration

Description:

This equation refines the expression kinetics of the MucBP protein. It incorporates the basal inhibition of the MucBP gene by free BreR [R] using an inhibitory Hill function, and the activation of the MucBP gene by the BreR-deoxycholic acid complex [RD] using an activatory Hill function. The parameter α represents the maximum induction multiple of DCA.

Growth equation for the correction of mycelium colonization efficiency

Description:

Considering the role of colonization capacity in intestinal microbiota growth and its contribution to lactose intolerance resolution in Lacbutler, we adjusted the population density changes of Bifidobacterium based on the adhesion system: We assumed that f(M) is a monotonically increasing function with a range of [f_min, 1], which describes the contribution of adhesion capacity to the actual growth rate.

among f_min: is the minimum colonization efficiency. When M = 0, i.e., there are no adhesion proteins present, f(M) =f_min. This means that the bacterial cells can only be present in a very low proportion (f_min) Grow by using the energy obtained from their metabolism.

K_M: Is the half-maximum efficiency constant. It reflects the sensitivity of MucBP protein concentration to colonization efficiency. When M = K_MAt this time, the difference between the maximum and minimum values of the planting efficiency f (M) is half. The original metabolic growth term $K_{H} \frac{V_{L,m}}{K_{c}+L_{m}}$ represents the potential growth capacity of the bacteria under ideal colonization conditions. After multiplying by f(M), the actual growth capacity is obtained.

variable	symbol	unit	Biological Significance
M	MucBP potency	µM	The concentration of adhesion protein in the bacterial body determines the colonization ability
R	dissociate BreR	µM	Concentration of transcription factors not bound to deoxycholic acid
RD	BrReR-D complex	µM	Concentration of active complexes between BRRe and deoxycholic acid

parameter	symbol	unit	Biological Significance
k_on	association rate constant	µM^-1·min^-1	The binding rate of BrRe with deoxycholic acid
k_off	dissociation rate constant	min^-1	Decomposition rate of BrRe-DA complex
d_RD	Composite degradation constant	min^-1	Degradation rate of BrRe-DOC complexes
k_M	Maximum synthesis rate	µM·min^-1	Maximum expression rate of MucBP
K_D	inhibition constant	µM	BrReR half-inhibitory concentration of promoter
K_RD	Activation constant	µM	Half-activation concentration of RD complex
n, m	Hill coefficient	dimensionless	Coherence in regulation
α	Maximum activation factor	dimensionless	RD induced the strongest increase in expression
f_min	Minimum planting efficiency	dimensionless	Relative growth rate without adhesion
K_M	Constant of semi-efficiency of planting	µM	MucBP to the half maximum concentration of colonization

Population density control system

As before, we assume that:

1、Considering that the corresponding mechanism of AI-2 has only been demonstrated to exist in Bifidobacterium, specific transport and decomposition mechanisms require further investigation. Additionally, we employed a strong promoter to activate it, thereby minimizing interference from other bacterial groups adopting the AI-2 strategy. This approach allows us to describe related processes using a constant parameter.

2、Considering that the promoter plsrA driving ccdB expression has been reported to have a threshold property for AI-2, we used a step function to approximate the description of its process

Internal AI-2 concentration change rate

Description:

This equation quantifies the dynamic equilibrium of intracellular AI-2 signaling molecules. The rate of change is determined by four processes: sustained synthesis, where the LuxS enzyme maintains a constant rate kluxGeneration of AI-2, active uptake, i.e. the Mis kinetics describes the uptake of AI-2 from the environment, and active discharge, i.e. the first-order kinetics rate kexportTransport to the extracellular and intracellular degradation is given by the rate constant dinphysiolysis.

External AI-2 Kinetics Equation

Description:

The equation describes the population accumulation dynamics of AI-2 signals in the environment. The first term represents the total amount of AI-2 discharged by all bacterial cells, establishing a quantitative relationship between population density and environmental signals. The contribution of individual bacteria is proportional to their intracellular AI-2 concentration and discharge rate.

The second item describes the reduction of environmental signals, which includes two mechanisms: physical dilution, caused by intestinal peristalsis and fluid flow, and chemical dilution, caused by the natural decomposition rate of AI-2 molecules in the environment.

CcdB toxicity protein kinetics equation

Description:

The equation describes the expression regulation of the CcdB toxic protein in a suicide system. It employs a continuous threshold response model, using a Sigmoid function to simulate the switching characteristics of the plsrA promoter. The maximum expression rate is denoted by k_c, which characterizes the synthesis intensity of CcdB. Additionally, the degradation of the CcdB protein is described by the term d_c⋅C, representing the first-order kinetic rate of degradation.

variable	symbol	unit	Biological Significance
A_in	Internal AI-2 concentration	µM	Concentration of AI-2 signaling molecules in bacteria
A_out	External AI-2 concentration	µM	Concentration of AI-2 signaling molecules in the environment
C	CcdB potency	µM	Intracellular concentration of toxic protein CcdB
N	Bacterial density	OD₆₀₀	Bacterial population density

parameter	symbol	unit	Biological Significance
k_lux	AI-2 synthesis rate	µM·min⁻¹	The rate constant for LuxS to express AI-2
k_export	AI-2 discharge rate	min⁻¹	Rate constant of intracellular efflux of AI-2
d_in	Internal degradation constant	min⁻¹	Decay rate constant of AI-2 in cells
d_out	Total out rate constant	min⁻¹	Environmental AI-2 degradation/dilution rate constant
k_C	CcdB expression rate	µM·min⁻¹	Expression rate of CcdB when plsrA is activated
A_th	AI-2 threshold concentration	µM	Minimum AI-2 concentration required for plsrA promoter activation
d_C	CcdB degradation constant	min⁻¹	Decay rate constant of ccdh protein
δ	Death rate constant	min⁻¹·µM⁻¹	The rate of death caused by CcdB
κ	Switch steepness parameter	µM⁻¹	Control the steepness of the transition function response
d_flow	Physical dilution rate constant	min⁻¹	Represents the dilution rate caused by intestinal peristalsis and content renewal
d_deg	Chemical degradation rate constant	min⁻¹	Represents the spontaneous or enzymatic decomposition rate of the AI-2 molecule in the environment
V_imp	Maximum internalization rate	µM·min⁻¹	Indicates the maximum rate at which a single bacterial body absorbs AI-2 when the concentration of AI-2 is saturated in the environment
K_imp	Internalization half saturation constant	µM	Indicates the concentration of environmental AI-2 required to reach half of the maximum internalization rate

The final system of equations.

And that gives us the final description of population density:

By considering the lactose content in the gut and assuming that the increase of lactose mainly comes from diet and other consumption items are described as a first-order kinetic function related to lactose concentration, we can derive the differential equation about the lactose content in the gut

The lactose input term utilizes methods from digestive dynamics, employing an exponential decay model for the initial maximum rate to simulate the dynamic process of lactose entering the intestinal tract from external sources. The engineered bacterial uptake term quantifies the total lactose intake rate by Bifidobacterium populations, while the natural consumption term represents lactose degradation caused by non-bacterial factors. Compare this formula with

as well as

By combining, the theoretical description of lactose content changes in the intestine can be described.

                        Implementation Code
                        Python
                    

                        import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.integrate import solve_ivp
from scipy.signal import find_peaks


def bacterial_oscillation_simulation():
    # Parameters for lactose metabolism
    L_bulk = 50.0
    k_diff = 0.2
    r = 0.02
    V_other = 0.02
    k_H = 0.7
    K_t = 2.0
    V_e = 1.2
    K_e = 1.5
    d_l = 0.03
    k_resp = 0.3
    k_leak = 0.15
    beta = 0.2
    
    # Quorum sensing parameters
    k_lux = 0.15
    k_export = 0.7
    V_imp = 0.3
    K_imp = 3.0
    d_in = 0.1
    d_flow = 0.2
    d_deg = 0.03
    k_C = 1.0
    A_th = 2.0
    kappa = 25
    d_C = 0.2
    
    # Population dynamics parameters
    k_mu = 0.2
    f_min = 0.1
    K_M = 5.0
    delta = 2.0
    d_N = 0.02
    
    # Adhesion system parameters
    D = 2.0
    k_M_muc = 0.5
    d_M = 0.1
    
    # Gut lactose parameters
    S_0 = 1.5
    k_s = 0.004
    d_gut = 0.04
    
    # Time range
    t_span = (0, 1200)
    t_eval = np.linspace(0, 1200, 3000)
    
    # Initial conditions
    y0 = [L_bulk, 0.01, 0.5, 0.001, 0.001, 0.001, 0.1, 0.05, L_bulk]
    
    # ODE system
    def ode_system(t, y):
        L_surf, L_in, Psi, A_in, A_out, C, M, N, L_gut = y
        
        # Surface lactose dynamics
        T_lac = V_other + k_H * Psi * L_surf / (K_t + L_surf)
        dL_surf_dt = k_diff * (L_bulk - L_surf) - r * N * T_lac
        
        # Internal lactose dynamics
        E_lac = V_e * L_in / (K_e + L_in)
        dL_in_dt = T_lac - E_lac - d_l * L_in
        
        # Proton gradient dynamics
        dPsi_dt = k_resp * E_lac - k_leak * Psi - beta * N * (k_H * Psi * L_surf / (K_t + L_surf))
        
        # Internal AI-2 dynamics
        dA_in_dt = k_lux + (V_imp * A_out / (K_imp + A_out)) - k_export * A_in - d_in * A_in
        
        # External AI-2 dynamics
        dA_out_dt = N * k_export * A_in - (d_flow + d_deg) * A_out
        
        # ccdB toxicity protein dynamics
        Theta = 1 / (1 + np.exp(-kappa * (A_in - A_th)))
        dC_dt = k_C * Theta - d_C * C
        
        # MucBP dynamics
        dM_dt = k_M_muc * (0.2 + 0.8 * D / (1 + D)) - d_M * M
        
        # Population density dynamics
        f_M = f_min + (1 - f_min) * M / (K_M + M)
        mu_eff = k_mu * E_lac * f_M
        delta_total = delta * C + d_N
        dN_dt = (mu_eff - delta_total) * N
        
        # Gut lactose dynamics
        V_input = S_0 * np.exp(-k_s * t)
        Bacterial_uptake = N * T_lac
        dL_gut_dt = V_input - Bacterial_uptake - d_gut * L_gut
        
        return [dL_surf_dt, dL_in_dt, dPsi_dt, dA_in_dt, dA_out_dt, dC_dt, dM_dt, dN_dt, dL_gut_dt]
    
    # Solve ODE
    sol = solve_ivp(ode_system, t_span, y0, t_eval=t_eval, method='RK45')
    
    # Extract results
    t = sol.t
    N = sol.y[7]  # Population density
    L_gut = sol.y[8]  # Gut lactose content
    
    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
    
    # Population density
    ax1.plot(t, N, 'b-', linewidth=2)
    ax1.set_title('Bacterial Population Density')
    ax1.set_xlabel('Time (min)')
    ax1.set_ylabel('OD')
    ax1.grid(True)
    
    # Gut lactose content
    ax2.plot(t, L_gut, 'g-', linewidth=2)
    ax2.set_title('Gut Lactose Content')
    ax2.set_xlabel('Time (min)')
    ax2.set_ylabel('Concentration (mM)')
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    return {
        'time': t,
        'N': N,
        'L_gut': L_gut
    }

# Run simulation
if __name__ == "__main__":
    results = bacterial_oscillation_simulation()
df.to_csv('bacterial_simulation_results.csv', index=False)
print("结果已保存到 bacterial_simulation_results.csv")
                    

Given the current limitations in understanding critical components planned for our application, many parameters can only be estimated through empirical methods. However, all parameters can be validated through experimental or in vivo testing. Therefore, we consider these findings as references for anticipated outcomes in wet-lab environments and as attempts to mathematically model such components. The methodological value of this approach should far outweigh its practical data interpretation.

It is evident that lactase BgaB plays a crucial role in lactose decomposition. Given the significant industrial and domestic importance of lactase, we plan to conduct directed evolution simulations targeting BgaB to identify lactase variants with enhanced enzymatic activity.

Attempts to engineer lactase.

While exploring this field of continuous learning and experimentation, we recognized the substantial technical barriers and learning curve involved. To empower future researchers and learners, we aim to share practical insights that help more people acquire essential synthetic biology skills. Drawing inspiration from the guidance provided by numerous teams in our journey, we now offer demonstrative instructions and code examples to clarify fundamental principles.

Exploration of the mechanism of lactase BgaB

Access to molecular structures

Clearly, we need to first understand the mechanisms of action between BgaB and lactose. Therefore, we plan to use molecular dynamics simulations to elucidate their specific interaction mechanisms. The primary task involves determining the three-dimensional structures of both the substrate ——lactose and the BgaB protein.

We obtained the molecular structure information of lactose (PubChem CID: 5913) from PubChem, a widely used chemical information database, and provided the molecular structure data. The data was converted into PDB format using Avogadro to facilitate future simulations. Additionally, for better visualization, we recommend using PyMOL, a molecular visualization tool, to examine the obtained molecular structures.

As for lactase BgaB, we did not find its structural data in the existing database. Therefore, we found the FASTA sequence of the lactase we planned to use on NCBI and used AlphaFold3, which is one of the most accurate protein structure prediction algorithms at present, to generate its 3D structure.

It can be noted that the structure we generated has ipTM = -pTM = 0.97, so we believe that it has high credibility.

The preparation of the molecular dynamics system.

One way we need to understand how BgaB interacts with lactose is through molecular dynamics simulations, and we will do that next.

When performing molecular dynamics simulations, a force field parameter file is essential to define atomic types and various interactions. We selected CHARMM36, available from the MacKerell Labs website, as a widely-used force field for modeling protein-small molecule interactions that has been rigorously tested through numerous experiments. Alternatively, other force fields provided by CGenFF could be considered. After preparing the force field parameter file, we utilized

gmx pdb2gmx -f input_file.pdb -o output_file.gro -ter

To convert the BgaB structure into a GROMACS - readable .gro file, you can select the force - field parameters and water model during the process.

When generating CHARMM36 parameters for BgaB, we encountered issues. The BgaB structure from AlphaFold3 had significant differences in atomic type definitions compared to CHARMM36 parameters. Inconsistent atomic type definitions are unacceptable in simulations.

After discussion, we decided to use CHARMM36 force - field parameters uniformly. We manually modified the atomic definitions in the BgaB’s pdb file and created the force - field parameter file. After verification, it was used in subsequent steps.

For small molecules like lactose, file format conversion is also required to enable GROMACS calculations. Since the CHARMM36 force field cannot recognize lactose molecules, we need a method to generate parameters for lactose within this framework. Here, we opt to use CGenFF for parameter generation (incidentally, quantum mechanical methods could also be employed to derive consistent solutions for original force field parameters of this small molecule, though this might require some computational effort). Therefore, obtaining .mol2 files compatible with CGenFF becomes essential.

Since the CHARMM force field uses a full-atom approach, hydrogen atoms are explicitly represented. However, our lactose crystal structure lacks hydrogen coordinates, requiring the use of Avogadro software to add hydrogen atoms and generate .mol2 files. We observed that some programs fail to list bond indices in ascending order when creating .mol2 files, which may lead to issues when constructing accurate topology files with matching coordinates. We recommend using Justins sort_mol2_bonds.pl script to resolve this issue. Consider the following:

perl sort_mol2_bonds.pl input.mol2 output.mol2

                        sort_mol2_bonds.pl
                        Perl
                    

                        #!/usr/bin/perl

use strict;

# sort_mol2_bonds.pl - a script to reorder the listing in a .mol2 @BOND
# section so that the following conventions are preserved:
#   1. Atoms on each line are in increasing order (e.g. 1 2 not 2 1)
#   2. The bonds appear in order of ascending atom number
#   3. For bonds involving the same atom in the first position, the bonds appear
#       in order of ascending second atom
#
# Written by: Justin Lemkul (jalemkul@vt.edu)
#
# Distributed under the GPL-3.0 license

unless (scalar(@ARGV)==2)
{
    die "Usage: perl sort_mol2_bonds.pl input.mol2 output.mol2\n";
}

my $input = $ARGV[0];
my $output = $ARGV[1];

open(IN, "<$input") || die "Cannot open $input: $!\n";
my @in = ;
close(IN);

# test for header lines that some scripts produce
unless($in[0] =~ /TRIPOS/)
{
    die "Nonstandard header found: $in[0]. Please delete header lines until the TRIPOS molecule definition.\n"; 
}

open(OUT, ">$output") || die "Cannot open $output: $!\n";

# get number of atoms and number of bonds from mol2 file
my @tmp = split(" ", $in[2]);
my $natom = $tmp[0];
my $nbond = $tmp[1];

# check
print "Found $natom atoms in the molecule, with $nbond bonds.\n";

# print out everything up until the bond section
my $i=0;
while (!($in[$i] =~ /BOND/))
{
    print OUT $in[$i];
    $i++;
}

# print the bond section header line to output
print OUT $in[$i];
$i++;

# read in the bonds and sort them
my $bondfmt = "%6d%6d%6d%5s\n";
my @tmparray; 

# sort the bonds - e.g. the one that has the
# lowest atom number in the first position and then the
# lowest atom number in the second position (swap if necessary)
for (my $j=0; $j<$nbond; $j++)
{
    my @tmp = split(" ", $in[$i+$j]);
    # parse atom numbers
    my $ai = $tmp[1];
    my $aj = $tmp[2];
    # reorder if second atom number < first
    if ($aj < $ai)
    {
        $ai = $tmp[2];
        $aj = $tmp[1];
    }
    # store new lines in a temporary array
    $tmparray[$j] = sprintf($bondfmt, $tmp[0], $ai, $aj, $tmp[3]); 
}

# loop over tmparray to find each atom number
my $nbond = 0;
for (my $x=1; $x<=$natom; $x++)
{
    my @bondarray;
    my $ntmp = scalar(@tmparray);
    for (my $b=0; $b<$ntmp; $b++)
    {
        my @tmp = split(" ", $tmparray[$b]);
        if ($tmp[1] == $x)
        {
            push(@bondarray, $tmparray[$b]);
            splice(@tmparray, $b, 1);
            $ntmp--;
            $b--;
        }
    }

    if (scalar(@bondarray) > 0) # some atoms will only appear in $aj, not $ai
    {
        my $nbondarray = scalar(@bondarray);
        if ($nbondarray > 1)
        {
            # loop over all bonds, find the one with lowest $aj
            # and then print it
            for (my $y=0; $y<$nbondarray; $y++)
            {
                my @tmp2 = split(" ", $bondarray[$y]);
                my $tmpatom = $tmp[2];
                my $lowindex = 0;
                if ($tmp2[2] < $tmpatom)
                {
                    $lowindex = $y; 
                }
                my $keep = splice(@bondarray, $lowindex, 1);
                $y--;
                $nbondarray--;
                my @sorted = split(" ", $keep);
                $nbond++;
                printf OUT $bondfmt, $nbond, $sorted[1], $sorted[2], $sorted[3]; 
            }
        }
        else
        {
            $nbond++;
            my @tmp2 = split(" ", $bondarray[0]);
            printf OUT $bondfmt, $nbond, $tmp2[1], $tmp2[2], $tmp2[3];
        }
    }
}

close(OUT);

exit;
                    

Next, we upload and generate the .str file about lactose on CGenFFs official website. It should be mentioned here that the obtained .str file contains CGenFFs score for charge distribution and dihedral angle parameters, which is generally recommended to check that it is less than 10, and manual assignment is recommended for those greater than 50

However, the generated .str file cannot be used directly by GROMACS, so we use the auxiliary python script provided by CHARMM36 to help us convert it into a format recognizable by GROMACS.

python cgenff_charmm2gmx_py3_nx2.py input_fileinput_file .mol2 input_file2.str charmm36-jul2022.ff

It should be noted here that this script requires the NetworkX 1.11 version package and is compatible with Network 2.Version 0 is not compatible. You can download it from the NetworkX website or use

pip install networkx==1.11

After this command, you will get a .mol2 file with the same name as the original .mol2 file. Use this command to convert it into a .gro file

gmx editconf -f input.pdb -o output .gro

Next, we need to merge our two research objects ( the .gro file of BgaB and the lactose molecules) into a single gro file for subsequent operations. Given that GROMACS built-in commands make this task challenging, we recommend performing the manual merging process instead.

Create a new .gro file (here we call it complex.gro)

Copy Protein.gro (here BgaB.gro) to complex.gro.

Copy the coordinates from Compound.gro (this is the coordinate part in the (Lac.gro) file to the complex.gro file, and paste it between the long vectors of the protein nucleus box.

According to the total number of atoms in the Protein.groand Compound.grocomplex.grofile, the number of atoms is usually on the second line

In the previous gmx pdb2gmx command, we set up a topol.topfile for the protein, and we only need to

Add it after the "Protein" molecular type section in the topol.top file #include " compound.itp".

Given that the ligand molecule contains new dihedral angle parameters in the compound.prm file, it is necessary to insert a parameter at the top of the topol.top file.

#include " compound.prm"

Finally, we need to modify the [molecules]. We need to add the compound molecules to the topology file


[ molecules ]
; Compound        #mols
Protein_chain_A     1
Compound         1

Note that the inclusion of the #include statement is very strict and must be added before the [moleculetype] because, according to the calculation logic, all atomic types must be known before the bonding parameters.

Therefore, parameters must be included before constructing any molecule. They must also appear after the #include declaration that contains the main force field.

Next we use:

gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0

To complex.gro create a box for a regular dodecahedron, and set the distance between the box and the molecule

gmx solvate -cp newbox.gro -cs TIP3P.gro -p topol.top -o solv.gro

This command is used to help generate the newbox.gro to generate a solvent environment according to the TIP3P water model.

Heres the deal: If youre paying attention, you’ll notice the protein carries a -17 charge in the previous pdb2gmx output. For those who missed it in the pdb2gmx output, check the [atoms] section of the topol.itp file. Youll spot "qtot-17" right at the end of the last atom line. Lets tackle this issue head-on.

We first obtain a .tpr file using Grompp from the mdp file. For computational efficiency, we recommend using the energy-minimized mdp file (you can find details about its generation in the next section). Run:

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr

Then pass the obtained .tpr file to genion:

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral

The ion names defined by pname and-nname are standard names in GROMACS 4.5. Finally, your [molecules] section should look like this:


[ molecules ]
; Compound        #mols
Protein_chain_A     1
Lac               1     
SOL         212
NA               17

Preprocessing-energy minimization

Now that the system is set up, we will use grompp to generate input information. The input file and mdp file need to be designed by yourself.Here we present our mdp file as a reference:

                        
cat > em.mdp << EOF
; em.mdp - Energy Minimization
define = -DPOSRES         ; Enable position restraints
integrator = steep        ; Use steepest descent method
nsteps = 5000             ; Maximum number of steps
emtol = 10.0              ; Force tolerance (kJ/mol/nm)
emstep = 0.01             ; Initial step size (nm)

; Output control
nstxout = 100             ; Output trajectory every 100 steps
nstlog = 100              ; Output log every 100 steps
nstenergy = 100           ; Output energy every 100 steps

; Interaction settings
cutoff-scheme = Verlet    ; Use Verlet cutoff scheme
ns-type = grid            ; Neighbor search method
coulombtype = PME         ; Electrostatics calculation method
rcoulomb = 1.0            ; Electrostatic cutoff radius (nm)
rvdw = 1.0                ; Van der Waals cutoff radius (nm)
pbc = xyz                 ; 3D periodic boundary conditions
EOF
                        
                    

gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr

Then call mdrun for energy minimization (EM):

gmx mdrun -v -deffnm em

Preprocessing-Equilibrating

First of all, in order to constrain the ligand molecule, it is necessary to generate the position constraint topology file of the ligand.


gmx make_ndx -f jz4.gro -o index_jz4.ndx
 > 0 & ! a H*

Create an index group for the compound that contains all atoms except hydrogen: Then, execute the genrestr module and select the index group obtained earlier

gmx genrestr -f compound.gro -n index_jz4.ndx -o posre_compound.itp -fc 1000 1000 1000

The obtained information is then incorporated into the topology file. Depending on environmental conditions, there are several different methods for adding these components. Here we recommend separately constraining proteins and ligands, using different #ifdef statements to control the inclusion of the ligand constraint file. Here we provide an example:


; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include ligand topology
#include "compound.itp"

; Ligand position restraints
#ifdef POSRES_LIG
#include "posre_compound.itp"
#endif

; Include water topology
#include "./charmm36-mar2019.ff/tip3p.itp"

In this case, to constrain both the protein and the ligand simultaneously, you need to specify define = -DPOSRES-DPOSRES_LIG in the mdp file.

To avoid the problem that the temperature coupling function lacks sufficient stability in controlling the kinetic fluctuations of a small atomic system, we consider the protein and complex, ions and solvents as two wholes for temperature coupling. Therefore, we need to create an index group

gmx make_ndx -f em.gro -o index.ndx

Then perform NVT balance, the .mdp file is here.

                        
; nvt.mdp - NVT Equilibration
define = -DPOSRES         ; Position restraints (protein and ligand)
integrator = md           ; Use leap-frog algorithm
dt = 0.00181              ; Time step (ps)
nsteps = 50000            ; Number of steps (100 ps)

; Output control
nstxout = 500             ; Output trajectory every 1 ps
nstvout = 500             ; Output velocities every 1 ps
nstenergy = 500           ; Output energy every 1 ps
nstlog = 500              ; Output log every 1 ps

; Temperature control
tcoupl = V-rescale        ; Use V-rescale temperature coupling
tc-grps = Protein Non-Protein ; Two temperature coupling groups
tau_t = 0.1 0.1           ; Coupling time constant (ps)
ref_t = 300 300           ; Reference temperature (K)

; Pressure control (No pressure coupling in NVT)
pcoupl = no

; Interaction settings
cutoff-scheme = Verlet
ns-type = grid
coulombtype = PME
rlist = 1.2
rcoulomb = 1.2
rvdw = 1.2
fourierspacing = 0.12
pbc = xyz
EOF      
                        
                    


gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt

After NVT balance, perform NPT balance ,the corresponding .mdp file is here:

                        
; nvt.mdp - NVT Equilibration
define = -DPOSRES         ; Position restraints (protein and ligand)
integrator = md           ; Use leap-frog algorithm
dt = 0.00181              ; Time step (ps)
nsteps = 50000            ; Number of steps (100 ps)

; Output control
nstxout = 500             ; Output trajectory every 1 ps
nstvout = 500             ; Output velocities every 1 ps
nstenergy = 500           ; Output energy every 1 ps
nstlog = 500              ; Output log every 1 ps

; Temperature control
tcoupl = V-rescale        ; Use V-rescale temperature coupling
tc-grps = Protein Non-Protein ; Two temperature coupling groups
tau_t = 0.1 0.1           ; Coupling time constant (ps)
ref_t = 300 300           ; Reference temperature (K)

; Pressure control (No pressure coupling in NVT)
pcoupl = no

; Interaction settings
cutoff-scheme = Verlet
ns-type = grid
coulombtype = PME
rlist = 1.2
rcoulomb = 1.2
rvdw = 1.2
fourierspacing = 0.12
pbc = xyz
EOF      
                        
                    


gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt

PProduction Simulation

We performed a MD simulation here. Here we did a 5-ns MD simulation, and the corresponding. mdp file is here.

                        
; md.mdp - Production Simulation
integrator = md
dt = 0.002
nsteps = 2500000           ; 5 ns (2500000 * 0.002 = 5000 ps)

; Output control
nstxout = 0                ; Do not output trajectory (use trr output)
nstvout = 0
nstfout = 0
nstlog = 5000              ; Output log every 10 ps
nstxout-compressed = 5000  ; Output compressed trajectory every 10 ps (for analysis)
compressed-x-grps = System ; Output compressed trajectory of the entire system

; Temperature control
tcoupl = V-rescale
tc-grps = Protein Non-Protein
tau_t = 0.1 0.1
ref_t = 300 300

; Pressure control
pcoupl = Parrinello-Rahman
pcoupltype = isotropic
tau_p = 2.0
ref_p = 1.0
compressibility = 4.5e-5

; Interaction settings
cutoff-scheme = Verlet
ns-type = grid
coulombtype = PME
rcoulomb = 1.0
rvdw = 1.0
pbc = xyz

; Continuation settings
continuation = yes         ; Continue from equilibration phase
constraints = h-bonds      ; Constrain all bonds
constraint-algorithm = LINCS
lincs-order = 4
lincs-iter = 1
EOF      
                        
                    


gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md.tpr
gmx mdrun -deffnm md

After searching the relevant literature, we found that the active catalytic site of BgaB was at Glu303 and Glu148, so we adopted


gmx distance -s md.tpr -f md.xtc -select resname "Lac" and name OAB plus resid 303 and name OE1 -oall

To calculate the distance between lactoses hydroxyl group and the carbonyl oxygen atom of Glu303, our obtained average data is 0.30±0.04 nm. After analyzing the amino acid residues around the catalytic site, we concluded that hydrogen bonding is the primary mechanism driving the catalysis. Through referencing relevant literature and conducting visualization analysis using pymol, we confirmed our findings:

It is evident that the Glu148 and Glu303 sites play primary catalytic roles, with additional contributions from Trp311, His354, Glu351, Arg109, and Asn147. Our molecular dynamics simulations revealed each residues average energy contribution, with key residues highlighted in red circles. These critical sites will be the focus of our subsequent analysis.

Protein remodeling attempt

Engineering Site Combination Selection

Now that we have a better understanding of the catalytic mechanism of BgaB for lactose, we aim to identify amino acid sites with high evolutionary variability that enhance the enzymes catalytic activity. Using Hotspot Wizard, we conducted an analysis to identify additional potential sites. Our research has newly identified cysteine (Cys) at position 113 and asparagine (Asn) at position 147 as key contributors.

After identifying mutation-prone sites, we observed an enormous number of potential combinations. To streamline our approach, we first conducted preliminary stability screening to eliminate combinations that negatively impacted BgaB stability. For stability analysis, we utilized PyroSetta, a tool that can only handle single-point mutations. You may reference our reference scripts, which significantly reduce workload while offering strong adaptability for other tasks.

After conducting a comprehensive analysis of the structure and referencing relevant literature, we identified BgaB as an industrial-grade lactase requiring high thermal stability. Our research revealed its capacity to produce galactooligosaccharides with significant probiotic benefits, though at relatively low yields. To enhance these advantages, we plan to engineer mutations at four key sites: Y272A, E351R, F341T, and R109W. For site 272, we propose converting the benzene ring and hydroxyl group to methyl groups to reduce the catalytic cavitys compatibility with galactose hydrolysis products, thereby decreasing galactose competition at the active site. At site 351, we aim to stabilize lactose binding by modifying its electrical properties. Regarding site 341, we observed that the benzene ring may exhibit stronger stabilizing effects on smaller molecules, suggesting replacement with a smaller amino acid could diminish its galactose-supporting capacity. Finally, for site 109, we consider introducing an indole ring to boost hydrophobic interactions and strengthen hydrogen bonding efficiency.

Evaluation of the Engineering Results

Similarly, we aim to conduct molecular dynamics simulations on our modified BgaB molecule again and analyze its properties in detail to validate the modification effects. Here, we will not repeat the molecular dynamics simulation process. Using the molecular structure obtained from AlphaFold3, we observe that the generated structure shows an ipTM = -pTM = 0.97 value, indicating high reliability.

We start with


gmx rms -s em.tpr -f md.xtc -n index.ndx -tu ns -o rmsd.xvg

And use xmgrace to visualize the results, but you can also use python, you can refer to this


xmgrace rmsd.xvg
dit xvg_show -f rmsd.xvg
xmgrace rmsd.xvg

We noticed that the fluctuation was between 4.8 and 5.0 nm. This indicates that the structure of the enzyme remained relatively stable during the simulation, without large conformational changes, and was relatively stable.

We further analyze the variation of the radius of gyration of the modified BgaB

gmx gyrate -s md.tpr -f md.xtc -o gyrate.xvg

The rotation radius can be seen to range from 2.6 to 2.7 nm, which reflects the flexibility and dynamic changes of the protein during the simulation. Smaller fluctuations indicate that the core structure of the protein is stable.

Finally, we tried to quantify this binding strength as an alternative way to evaluate our enzyme modification

gmx_MMPBSA -f md.xtc -s md.tpr -n index.ndx -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat

We determined the binding energy to be-231.06 kJ/mol. By comparing this with the original BgaB-lactose system (binding energy of-212.85 kJ/mol), our modification shows significant improvement in binding stability and accuracy. However, the enhancement remains modest and unlikely to cause excessive product release, suggesting potential enzyme activity enhancement. Further validation across additional parameters is required to confirm these findings.

To evaluate our modification effects, we employed Auto Dock Vina for molecular docking. Since the software cannot directly process PDB files, we utilized AutoDockTools to convert lactoses PDB file into a PDBQT format. Given that proteins exist in complex conformations, removing water molecules from the original file significantly reduced computational complexity. Additionally, adding hydrogen atoms—a crucial factor in determining protein bioactivity—was deemed essential. This approach ultimately yielded the final PDBQT file.

When performing Vina docking, we need to create a config.txtfile manually to define input and output files, determine the binding center coordinates, and specify the docking box dimensions. You can either use pymol to calculate the binding center coordinates or rely on data from the PDB file for this purpose. The docking box size must be larger than your target molecules dimensions. After obtaining the parameters:

vina --config config.txt --receptor protein.pdbqt --ligand compound.pdbqt --out output.pdbqt

We are showing here the affinity scores for each combination of BgaB, lactose and galactose, Glu148 and Glu303 before and after the modification to comprehensively evaluate the effect of our modification

1、After modification, BGA B and lactose are at the Glu148 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-6.34	0	0
2	-6.267	1.088	6.72
3	-6.056	11.06	14.87
4	-5.982	10.82	14.95
5	-5.94	1.516	5.928
6	-5.911	1.936	3.044
7	-5.881	11.11	13.74
8	-5.869	1.965	7.129
9	-5.853	2.348	6.563
10	-5.838	11.3	15.03

2、The original BgaB and lactose are at the Glu148 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-5.881	0	0
2	-5.782	1.329	4.034
3	-5.672	1.417	3.737
4	-5.523	1.253	2.089
5	-5.515	2.292	3.557
6	-5.5	20.65	21.95
7	-5.457	1.097	2.796
8	-5.418	1.456	3.659
9	-5.404	1.298	3.58
10	-5.093	1.455	3.068

3、After modification, BGA B and lactose are at the Glu303 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-6.351	0	0
2	-6.204	2.181	4.575
3	-6.195	2.821	4.956
4	-6.11	2.977	5.162
5	-6.035	2.848	5.605
6	-6.031	20.39	22.61
7	-5.936	2.517	6.073
8	-5.895	3.637	5.986
9	-5.893	2.557	5.664
10	-5.771	2.225	4.331

4、The pre-modified BgaB and lactose at the Glu303 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-5.902	0	0
2	-5.746	1.197	4.205
3	-5.466	11.42	12.57
4	-5.395	6.016	7.626
5	-5.316	12.16	13.76
6	-5.23	1.001	2.717
7	-5.198	1.5	3.646
8	-5.169	1.765	4.465
9	-4.968	12	13.65
10	-4.953	11.9	13.1

Clearly, both the Glu303 and Glu148 site modifications in the modified BgaB strain demonstrated improved activity scores. Lower affinity generally correlates with enhanced catalytic efficiency, indicating our modifications have achieved notable progress. However, we aim to investigate its binding affinity with galactose, as this interaction may not only influence catalytic performance but also interact with galactooligosaccharides—a probiotic component that plays a significant role in gut health.

1、The pre-modified BgaB and galactose at the Glu303 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-6.353	0	0
2	-6.197	2.168	4.579
3	-6.196	2.825	4.967
4	-6.086	2.589	5.606
5	-6.053	3.589	6.011
6	-6.03	20.41	22.62
7	-5.937	2.257	5.972
8	-5.912	3.551	5.002
9	-5.893	2.569	5.682
10	-5.877	19.47	21.12

3、The pre-modified BgaB and galactose at the Glu148 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-5.9	0	0
2	-5.768	1.203	4.2
3	-5.472	11.43	12.57
4	-5.308	12.13	13.71
5	-5.196	1.496	3.646
6	-5.158	1.78	4.474
7	-5.052	11.91	13.47
8	-4.967	11.98	12.7
9	-4.959	11.91	13.2
10	-4.895	12.08	13.95

4、The modified BgaB and galactose are at the Glu148 site

mode	affinity (kcal/mol)	dist from rmsd l.b.	best mode rmsd u.b.
1	-5.868	0	0
2	-5.841	10.02	12.27
3	-5.724	1.823	3.442
4	-5.718	6.533	8.15
5	-5.669	1.436	3.536
6	-5.541	1.222	4.046
7	-5.54	5.15	6.957
8	-5.501	1.781	2.88
9	-5.434	4.417	6.717
10	-5.333	10.18	12.34

It can be observed that the affinity of galactose at the Glu303 site increases, while the Glu148 site remains essentially unchanged. This suggests that galactose has a more difficult time binding to the lactose active site, which may make oligogalactose formation easier. This indicates that our Lacbutler exhibits stronger probiotic effects than conventional Bifidobacterium beyond its basic functions.

Next, we will try to measure the modified BgaB in the wet laboratory and detect its relevant data as the final verification.

[1]Placier GWatzlawick HRabiller C, Mattes R.2009.Evolved β-Galactosidases from Geobacillus stearothermophilus with Improved Transgalactosylation Yield for Galacto-Oligosaccharide Production. Appl Environ Microbiol75:.https://doi.org/10.1128/AEM.00714-09

[2]Dong, Y. N., Wang, L., Gu, Q., et al. (2013). Optimizing lactose hydrolysis by computer-guided modification of the catalytic site of a wild-type enzyme. Molecular Diversity, 17(2), 371-382. https://doi.org/10.1007/s11030-013-9437-y

[3]Dong, Y.-N., Chen, H.-Q., Sun, Y.-H., Zhang, H., & Chen, W. (2015). A differentially conserved residue (Ile42) of GH42 β-galactosidase from Geobacillus stearothermophilus BgaB is involved in both catalysis and thermostability. Journal of Dairy Science, 98(4), 2268-2276. https://doi.org/10.3168/jds.2014-9117

Keith, Jeanette MD1; Hullett, Sandral MD, MPH2. Lactose Intolerance Diagnosis and Solutions (LIDS) Survey Identifies Challenges in the Clinical Management of Lactose Intolerance: 348. American Journal of Gastroenterology 108():p S104, October 2013.

[5] Johnson, A. O., Semenya, J. G., Buchowski, M. S., Enwonwu, C. O., & Scrimshaw, N. S. (1993). Correlation of lactose maldigestion, lactose intolerance, and milk intolerance. The American Journal of Clinical Nutrition, 57(3), 399-401. https://doi.org/10.1093/ajcn/57.3.399

[6] Storhaug, C. L., Fosse, S. K., & Fadnes, L. T. (2017). RETRACTED: Country, regional, and global estimates for lactose malabsorption in adults: a systematic review and meta-analysis. The Lancet Gastroenterology &Hepatology, 2(10), 738-746. https://doi.org/10.1016/S2468-1253(17)30154-1

[7] Dahlqvist, A., Hammond, J. B., Crane, R. K., Dunphy, J. V., & Littman, A. (1963). Intestinal Lactase Deficiency and Lactose Intolerance in Adults: Preliminary Report. Gastroenterology, 45(4), 488-491. https://doi.org/10.1016/S0016-5085(19)34844-9

[8] Savaiano, D. (2011). Lactose Intolerance: An Unnecessary Risk for Low Bone Density. In R. A. Clemens, O. Hernell, & K. M. Michaelsen (Eds.), Milk and Milk Products in Human Nutrition (Vol. 67, pp. 161-171). Nestlé Nutrition Institute Workshop Series. Basel: S. Karger AG. https://doi.org/10.1016/S0016-5085(19)34844-9