Attempts to engineer lactase.
While exploring this field of continuous learning and experimentation, we recognized the substantial technical barriers and learning curve involved. To empower future researchers and learners, we aim to share practical insights that help more people acquire essential synthetic biology skills. Drawing inspiration from the guidance provided by numerous teams in our journey, we now offer demonstrative instructions and code examples to clarify fundamental principles.
Exploration of the mechanism of lactase BgaB
Access to molecular structures
Clearly, we need to first understand the mechanisms of action between BgaB and lactose. Therefore, we plan to use molecular dynamics simulations to elucidate their specific interaction mechanisms. The primary task involves determining the three-dimensional structures of both the substrate ——lactose and the BgaB protein.
We obtained the molecular structure information of lactose (PubChem CID: 5913) from PubChem, a widely used chemical information database, and provided the molecular structure data. The data was converted into PDB format using Avogadro to facilitate future simulations. Additionally, for better visualization, we recommend using PyMOL, a molecular visualization tool, to examine the obtained molecular structures.
As for lactase BgaB, we did not find its structural data in the existing database. Therefore, we found the FASTA sequence of the lactase we planned to use on NCBI and used AlphaFold3, which is one of the most accurate protein structure prediction algorithms at present, to generate its 3D structure.
It can be noted that the structure we generated has ipTM = -pTM = 0.97, so we believe that it has high credibility.
The preparation of the molecular dynamics system.
One way we need to understand how BgaB interacts with lactose is through molecular dynamics simulations, and we will do that next.
When performing molecular dynamics simulations, a force field parameter file is essential to define atomic types and various interactions. We selected CHARMM36, available from the MacKerell Labs website, as a widely-used force field for modeling protein-small molecule interactions that has been rigorously tested through numerous experiments. Alternatively, other force fields provided by CGenFF could be considered. After preparing the force field parameter file, we utilized
gmx pdb2gmx -f input_file.pdb -o output_file.gro -ter
To convert the BgaB structure into a GROMACS - readable .gro file, you can select the force - field parameters and water model during the process.
When generating CHARMM36 parameters for BgaB, we encountered issues. The BgaB structure from AlphaFold3 had significant differences in atomic type definitions compared to CHARMM36 parameters. Inconsistent atomic type definitions are unacceptable in simulations.
After discussion, we decided to use CHARMM36 force - field parameters uniformly. We manually modified the atomic definitions in the BgaB’s pdb file and created the force - field parameter file. After verification, it was used in subsequent steps.
For small molecules like lactose, file format conversion is also required to enable GROMACS calculations. Since the CHARMM36 force field cannot recognize lactose molecules, we need a method to generate parameters for lactose within this framework. Here, we opt to use CGenFF for parameter generation (incidentally, quantum mechanical methods could also be employed to derive consistent solutions for original force field parameters of this small molecule, though this might require some computational effort). Therefore, obtaining .mol2 files compatible with CGenFF becomes essential.
Since the CHARMM force field uses a full-atom approach, hydrogen atoms are explicitly represented. However, our lactose crystal structure lacks hydrogen coordinates, requiring the use of Avogadro software to add hydrogen atoms and generate .mol2 files. We observed that some programs fail to list bond indices in ascending order when creating .mol2 files, which may lead to issues when constructing accurate topology files with matching coordinates. We recommend using Justins sort_mol2_bonds.pl script to resolve this issue. Consider the following:
perl sort_mol2_bonds.pl input.mol2 output.mol2
Next, we upload and generate the .str file about lactose on CGenFFs official website. It should be mentioned here that the obtained .str file contains CGenFFs score for charge distribution and dihedral angle parameters, which is generally recommended to check that it is less than 10, and manual assignment is recommended for those greater than 50
However, the generated .str file cannot be used directly by GROMACS, so we use the auxiliary python script provided by CHARMM36 to help us convert it into a format recognizable by GROMACS.
python cgenff_charmm2gmx_py3_nx2.py input_fileinput_file .mol2 input_file2.str charmm36-jul2022.ff
It should be noted here that this script requires the NetworkX 1.11 version package and is compatible with Network 2.Version 0 is not compatible. You can download it from the NetworkX website or use
pip install networkx==1.11
After this command, you will get a .mol2 file with the same name as the original .mol2 file. Use this command to convert it into a .gro file
gmx editconf -f input.pdb -o output .gro
Next, we need to merge our two research objects ( the .gro file of BgaB and the lactose molecules) into a single gro file for subsequent operations. Given that GROMACS built-in commands make this task challenging, we recommend performing the manual merging process instead.
Create a new .gro file (here we call it complex.gro)
Copy Protein.gro (here BgaB.gro) to complex.gro.
Copy the coordinates from Compound.gro (this is the coordinate part in the (Lac.gro) file to the complex.gro file, and paste it between the long vectors of the protein nucleus box.
According to the total number of atoms in the Protein.groand Compound.grocomplex.grofile, the number of atoms is usually on the second line
In the previous gmx pdb2gmx command, we set up a topol.topfile for the protein, and we only need to
Add it after the "Protein" molecular type section in the topol.top file #include " compound.itp".
Given that the ligand molecule contains new dihedral angle parameters in the compound.prm file, it is necessary to insert a parameter at the top of the topol.top file.
#include " compound.prm"
Finally, we need to modify the [molecules]. We need to add the compound molecules to the topology file
[ molecules ]
; Compound #mols
Protein_chain_A 1
Compound 1
Note that the inclusion of the #include statement is very strict and must be added before the [moleculetype] because, according to the calculation logic, all atomic types must be known before the bonding parameters.
Therefore, parameters must be included before constructing any molecule. They must also appear after the #include declaration that contains the main force field.
Next we use:
gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0
To complex.gro create a box for a regular dodecahedron, and set the distance between the box and the molecule
gmx solvate -cp newbox.gro -cs TIP3P.gro -p topol.top -o solv.gro
This command is used to help generate the newbox.gro to generate a solvent environment according to the TIP3P water model.
Heres the deal: If youre paying attention, you’ll notice the protein carries a -17 charge in the previous pdb2gmx output. For those who missed it in the pdb2gmx output, check the [atoms] section of the topol.itp file. Youll spot "qtot-17" right at the end of the last atom line. Lets tackle this issue head-on.
We first obtain a .tpr file using Grompp from the mdp file. For computational efficiency, we recommend using the energy-minimized mdp file (you can find details about its generation in the next section). Run:
gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr
Then pass the obtained .tpr file to genion:
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral
The ion names defined by pname and-nname are standard names in GROMACS 4.5. Finally, your [molecules] section should look like this:
[ molecules ]
; Compound #mols
Protein_chain_A 1
Lac 1
SOL 212
NA 17
Preprocessing-energy minimization
Now that the system is set up, we will use grompp to generate input information. The input file and mdp file need to be designed by yourself.Here we present our mdp file as a reference:
gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
Then call mdrun for energy minimization (EM):
gmx mdrun -v -deffnm em
Preprocessing-Equilibrating
First of all, in order to constrain the ligand molecule, it is necessary to generate the position constraint topology file of the ligand.
gmx make_ndx -f jz4.gro -o index_jz4.ndx
> 0 & ! a H*
Create an index group for the compound that contains all atoms except hydrogen: Then, execute the genrestr module and select the index group obtained earlier
gmx genrestr -f compound.gro -n index_jz4.ndx -o posre_compound.itp -fc 1000 1000 1000
The obtained information is then incorporated into the topology file. Depending on environmental conditions, there are several different methods for adding these components. Here we recommend separately constraining proteins and ligands, using different #ifdef statements to control the inclusion of the ligand constraint file. Here we provide an example:
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
; Include ligand topology
#include "compound.itp"
; Ligand position restraints
#ifdef POSRES_LIG
#include "posre_compound.itp"
#endif
; Include water topology
#include "./charmm36-mar2019.ff/tip3p.itp"
In this case, to constrain both the protein and the ligand simultaneously, you need to specify define = -DPOSRES-DPOSRES_LIG in the mdp file.
To avoid the problem that the temperature coupling function lacks sufficient stability in controlling the kinetic fluctuations of a small atomic system, we consider the protein and complex, ions and solvents as two wholes for temperature coupling. Therefore, we need to create an index group
gmx make_ndx -f em.gro -o index.ndx
Then perform NVT balance, the .mdp file is here.
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt
After NVT balance, perform NPT balance ,the corresponding .mdp file is here:
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt
PProduction Simulation
We performed a MD simulation here. Here we did a 5-ns MD simulation, and the corresponding. mdp file is here.
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md.tpr
gmx mdrun -deffnm md
After searching the relevant literature, we found that the active catalytic site of BgaB was at Glu303 and Glu148, so we adopted
gmx distance -s md.tpr -f md.xtc -select resname "Lac" and name OAB plus resid 303 and name OE1 -oall
To calculate the distance between lactoses hydroxyl group and the carbonyl oxygen atom of Glu303, our obtained average data is 0.30±0.04 nm. After analyzing the amino acid residues around the catalytic site, we concluded that hydrogen bonding is the primary mechanism driving the catalysis. Through referencing relevant literature and conducting visualization analysis using pymol, we confirmed our findings:
It is evident that the Glu148 and Glu303 sites play primary catalytic roles, with additional contributions from Trp311, His354, Glu351, Arg109, and Asn147. Our molecular dynamics simulations revealed each residues average energy contribution, with key residues highlighted in red circles. These critical sites will be the focus of our subsequent analysis.
>
Protein remodeling attempt
Engineering Site Combination Selection
Now that we have a better understanding of the catalytic mechanism of BgaB for lactose, we aim to identify amino acid sites with high evolutionary variability that enhance the enzymes catalytic activity. Using Hotspot Wizard, we conducted an analysis to identify additional potential sites. Our research has newly identified cysteine (Cys) at position 113 and asparagine (Asn) at position 147 as key contributors.
After identifying mutation-prone sites, we observed an enormous number of potential combinations. To streamline our approach, we first conducted preliminary stability screening to eliminate combinations that negatively impacted BgaB stability. For stability analysis, we utilized PyroSetta, a tool that can only handle single-point mutations. You may reference our reference scripts, which significantly reduce workload while offering strong adaptability for other tasks.
After conducting a comprehensive analysis of the structure and referencing relevant literature, we identified BgaB as an industrial-grade lactase requiring high thermal stability. Our research revealed its capacity to produce galactooligosaccharides with significant probiotic benefits, though at relatively low yields. To enhance these advantages, we plan to engineer mutations at four key sites: Y272A, E351R, F341T, and R109W. For site 272, we propose converting the benzene ring and hydroxyl group to methyl groups to reduce the catalytic cavitys compatibility with galactose hydrolysis products, thereby decreasing galactose competition at the active site. At site 351, we aim to stabilize lactose binding by modifying its electrical properties. Regarding site 341, we observed that the benzene ring may exhibit stronger stabilizing effects on smaller molecules, suggesting replacement with a smaller amino acid could diminish its galactose-supporting capacity. Finally, for site 109, we consider introducing an indole ring to boost hydrophobic interactions and strengthen hydrogen bonding efficiency.
Evaluation of the Engineering Results
Similarly, we aim to conduct molecular dynamics simulations on our modified BgaB molecule again and analyze its properties in detail to validate the modification effects. Here, we will not repeat the molecular dynamics simulation process. Using the molecular structure obtained from AlphaFold3, we observe that the generated structure shows an ipTM = -pTM = 0.97 value, indicating high reliability.
We start with
gmx rms -s em.tpr -f md.xtc -n index.ndx -tu ns -o rmsd.xvg
And use xmgrace to visualize the results, but you can also use python, you can refer to this
xmgrace rmsd.xvg
dit xvg_show -f rmsd.xvg
xmgrace rmsd.xvg
We noticed that the fluctuation was between 4.8 and 5.0 nm. This indicates that the structure of the enzyme remained relatively stable during the simulation, without large conformational changes, and was relatively stable.
We further analyze the variation of the radius of gyration of the modified BgaB
gmx gyrate -s md.tpr -f md.xtc -o gyrate.xvg
The rotation radius can be seen to range from 2.6 to 2.7 nm, which reflects the flexibility and dynamic changes of the protein during the simulation. Smaller fluctuations indicate that the core structure of the protein is stable.
Finally, we tried to quantify this binding strength as an alternative way to evaluate our enzyme modification
gmx_MMPBSA -f md.xtc -s md.tpr -n index.ndx -i mmpbsa.in -o FINAL_RESULTS_MMPBSA.dat
We determined the binding energy to be-231.06 kJ/mol. By comparing this with the original BgaB-lactose system (binding energy of-212.85 kJ/mol), our modification shows significant improvement in binding stability and accuracy. However, the enhancement remains modest and unlikely to cause excessive product release, suggesting potential enzyme activity enhancement. Further validation across additional parameters is required to confirm these findings.
To evaluate our modification effects, we employed Auto Dock Vina for molecular docking. Since the software cannot directly process PDB files, we utilized AutoDockTools to convert lactoses PDB file into a PDBQT format. Given that proteins exist in complex conformations, removing water molecules from the original file significantly reduced computational complexity. Additionally, adding hydrogen atoms—a crucial factor in determining protein bioactivity—was deemed essential. This approach ultimately yielded the final PDBQT file.
When performing Vina docking, we need to create a config.txtfile manually to define input and output files, determine the binding center coordinates, and specify the docking box dimensions. You can either use pymol to calculate the binding center coordinates or rely on data from the PDB file for this purpose. The docking box size must be larger than your target molecules dimensions. After obtaining the parameters:
vina --config config.txt --receptor protein.pdbqt --ligand compound.pdbqt --out output.pdbqt
We are showing here the affinity scores for each combination of BgaB, lactose and galactose, Glu148 and Glu303 before and after the modification to comprehensively evaluate the effect of our modification
1、After modification, BGA B and lactose are at the Glu148 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -6.34 | 0 | 0 |
| 2 | -6.267 | 1.088 | 6.72 |
| 3 | -6.056 | 11.06 | 14.87 |
| 4 | -5.982 | 10.82 | 14.95 |
| 5 | -5.94 | 1.516 | 5.928 |
| 6 | -5.911 | 1.936 | 3.044 |
| 7 | -5.881 | 11.11 | 13.74 |
| 8 | -5.869 | 1.965 | 7.129 |
| 9 | -5.853 | 2.348 | 6.563 |
| 10 | -5.838 | 11.3 | 15.03 |
2、The original BgaB and lactose are at the Glu148 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -5.881 | 0 | 0 |
| 2 | -5.782 | 1.329 | 4.034 |
| 3 | -5.672 | 1.417 | 3.737 |
| 4 | -5.523 | 1.253 | 2.089 |
| 5 | -5.515 | 2.292 | 3.557 |
| 6 | -5.5 | 20.65 | 21.95 |
| 7 | -5.457 | 1.097 | 2.796 |
| 8 | -5.418 | 1.456 | 3.659 |
| 9 | -5.404 | 1.298 | 3.58 |
| 10 | -5.093 | 1.455 | 3.068 |
3、After modification, BGA B and lactose are at the Glu303 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -6.351 | 0 | 0 |
| 2 | -6.204 | 2.181 | 4.575 |
| 3 | -6.195 | 2.821 | 4.956 |
| 4 | -6.11 | 2.977 | 5.162 |
| 5 | -6.035 | 2.848 | 5.605 |
| 6 | -6.031 | 20.39 | 22.61 |
| 7 | -5.936 | 2.517 | 6.073 |
| 8 | -5.895 | 3.637 | 5.986 |
| 9 | -5.893 | 2.557 | 5.664 |
| 10 | -5.771 | 2.225 | 4.331 |
4、The pre-modified BgaB and lactose at the Glu303 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -5.902 | 0 | 0 |
| 2 | -5.746 | 1.197 | 4.205 |
| 3 | -5.466 | 11.42 | 12.57 |
| 4 | -5.395 | 6.016 | 7.626 |
| 5 | -5.316 | 12.16 | 13.76 |
| 6 | -5.23 | 1.001 | 2.717 |
| 7 | -5.198 | 1.5 | 3.646 |
| 8 | -5.169 | 1.765 | 4.465 |
| 9 | -4.968 | 12 | 13.65 |
| 10 | -4.953 | 11.9 | 13.1 |
Clearly, both the Glu303 and Glu148 site modifications in the modified BgaB strain demonstrated improved activity scores. Lower affinity generally correlates with enhanced catalytic efficiency, indicating our modifications have achieved notable progress. However, we aim to investigate its binding affinity with galactose, as this interaction may not only influence catalytic performance but also interact with galactooligosaccharides—a probiotic component that plays a significant role in gut health.
1、The pre-modified BgaB and galactose at the Glu303 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -6.353 | 0 | 0 |
| 2 | -6.197 | 2.168 | 4.579 |
| 3 | -6.196 | 2.825 | 4.967 |
| 4 | -6.086 | 2.589 | 5.606 |
| 5 | -6.053 | 3.589 | 6.011 |
| 6 | -6.03 | 20.41 | 22.62 |
| 7 | -5.937 | 2.257 | 5.972 |
| 8 | -5.912 | 3.551 | 5.002 |
| 9 | -5.893 | 2.569 | 5.682 |
| 10 | -5.877 | 19.47 | 21.12 |
3、The pre-modified BgaB and galactose at the Glu148 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -5.9 | 0 | 0 |
| 2 | -5.768 | 1.203 | 4.2 |
| 3 | -5.472 | 11.43 | 12.57 |
| 4 | -5.308 | 12.13 | 13.71 |
| 5 | -5.196 | 1.496 | 3.646 |
| 6 | -5.158 | 1.78 | 4.474 |
| 7 | -5.052 | 11.91 | 13.47 |
| 8 | -4.967 | 11.98 | 12.7 |
| 9 | -4.959 | 11.91 | 13.2 |
| 10 | -4.895 | 12.08 | 13.95 |
4、The modified BgaB and galactose are at the Glu148 site
| mode | affinity (kcal/mol) |
dist from rmsd l.b. |
best mode rmsd u.b. |
|---|---|---|---|
| 1 | -5.868 | 0 | 0 |
| 2 | -5.841 | 10.02 | 12.27 |
| 3 | -5.724 | 1.823 | 3.442 |
| 4 | -5.718 | 6.533 | 8.15 |
| 5 | -5.669 | 1.436 | 3.536 |
| 6 | -5.541 | 1.222 | 4.046 |
| 7 | -5.54 | 5.15 | 6.957 |
| 8 | -5.501 | 1.781 | 2.88 |
| 9 | -5.434 | 4.417 | 6.717 |
| 10 | -5.333 | 10.18 | 12.34 |
It can be observed that the affinity of galactose at the Glu303 site increases, while the Glu148 site remains essentially unchanged. This suggests that galactose has a more difficult time binding to the lactose active site, which may make oligogalactose formation easier. This indicates that our Lacbutler exhibits stronger probiotic effects than conventional Bifidobacterium beyond its basic functions.
Next, we will try to measure the modified BgaB in the wet laboratory and detect its relevant data as the final verification.
[1]Placier GWatzlawick HRabiller C, Mattes R.2009.Evolved β-Galactosidases from Geobacillus stearothermophilus with Improved Transgalactosylation Yield for Galacto-Oligosaccharide Production. Appl Environ Microbiol75:.https://doi.org/10.1128/AEM.00714-09
[2]Dong, Y. N., Wang, L., Gu, Q., et al. (2013). Optimizing lactose hydrolysis by computer-guided modification of the catalytic site of a wild-type enzyme. Molecular Diversity, 17(2), 371-382. https://doi.org/10.1007/s11030-013-9437-y
[3]Dong, Y.-N., Chen, H.-Q., Sun, Y.-H., Zhang, H., & Chen, W. (2015). A differentially conserved residue (Ile42) of GH42 β-galactosidase from Geobacillus stearothermophilus BgaB is involved in both catalysis and thermostability. Journal of Dairy Science, 98(4), 2268-2276. https://doi.org/10.3168/jds.2014-9117
Keith, Jeanette MD1; Hullett, Sandral MD, MPH2. Lactose Intolerance Diagnosis and Solutions (LIDS) Survey Identifies Challenges in the Clinical Management of Lactose Intolerance: 348. American Journal of Gastroenterology 108():p S104, October 2013.
[5] Johnson, A. O., Semenya, J. G., Buchowski, M. S., Enwonwu, C. O., & Scrimshaw, N. S. (1993). Correlation of lactose maldigestion, lactose intolerance, and milk intolerance. The American Journal of Clinical Nutrition, 57(3), 399-401. https://doi.org/10.1093/ajcn/57.3.399
[6] Storhaug, C. L., Fosse, S. K., & Fadnes, L. T. (2017). RETRACTED: Country, regional, and global estimates for lactose malabsorption in adults: a systematic review and meta-analysis. The Lancet Gastroenterology &Hepatology, 2(10), 738-746. https://doi.org/10.1016/S2468-1253(17)30154-1
[7] Dahlqvist, A., Hammond, J. B., Crane, R. K., Dunphy, J. V., & Littman, A. (1963). Intestinal Lactase Deficiency and Lactose Intolerance in Adults: Preliminary Report. Gastroenterology, 45(4), 488-491. https://doi.org/10.1016/S0016-5085(19)34844-9
[8] Savaiano, D. (2011). Lactose Intolerance: An Unnecessary Risk for Low Bone Density. In R. A. Clemens, O. Hernell, & K. M. Michaelsen (Eds.), Milk and Milk Products in Human Nutrition (Vol. 67, pp. 161-171). Nestlé Nutrition Institute Workshop Series. Basel: S. Karger AG. https://doi.org/10.1016/S0016-5085(19)34844-9