Modelling

To assist Wet Lab in selecting candidates for surface display, AlphaFold was used to generate 3D models of the fusion proteins. Structural alignment in PyMOL allowed for the investigation of conformational changes. Finally, the stability of the proteins was analyzed through molecular dynamics simulations with GROMACS.

Introduction

The Wet Lab team of meduCA intended to use a surface display mechanism to display the constructed fusion proteins, each consisting of a surface protein and a specific carbonic anhydrase, on Caulobacter crescentus, Escherichia coli and Cyanobacteria (UTEX).

Surface protein used:

Background

Carbonic anhydrases (CAs) are enzymes that catalyze the reversible hydration of carbon dioxide [1]. Some of the CA candidates investigated for surface display were of interest to Wet Lab, while others were chosen as a result of Bioinformatics. When a surface protein is fused with a CA, the enzymatic part of the resulting fusion protein could exhibit conformational changes. This is a critical consideration for Wet Lab because CA activity is required for microbially induced calcium carbonate precipitation (MICP) when forming biocement. Conformational changes could impact enzyme activity, which influences the rate of calcium carbonate precipitation and the strength and durability of the resulting biocement. Therefore, results from this project will aid Wet Lab in deciding which surface protein and CA to include in the fusion protein for the overall success of MICP.

Cell surface display is a technique in which functional proteins are anchored to the outer membrane and presented directly on the cell surface, giving them better access to the external environment for calcification [2]. Heterologous proteins can be expressed on the cell surface, allowing direct interaction with the environment and facilitating functional assays without cell lysis [3]. The N-terminal domain of INPN and VCBS domains from certain Gram-negative bacteria can be used to anchor enzymes to the outer membrane, enabling surface display [4].

Construct of surface protein

Prediction of protein structures was done using AlphaFold [5], a deep-learning-based tool developed by Google DeepMind that predicts the 3D structure of proteins from their amino acid sequences. The model was trained on the Protein Data Bank (PDB), a database of experimentally determined, publicly available macromolecular structures. Traditionally, determining protein structures required experimental methods such as X-ray crystallography or cryo-EM, which are extremely costly [6]. AlphaFold instead uses neural networks to infer how residues interact and fold, producing highly accurate models from protein sequences. The quality of a predicted structure is evaluated by the predicted Local Distance Difference Test (pLDDT) score, which estimates the confidence of each predicted residue position on a scale of 0 to 100 [7].

Structural alignment was conducted and evaluated on the predicted structures using the Python Molecular Visualizer (PyMOL) [8]. PyMOL is a molecular visualization tool available both as standalone software and as a Python library, allowing researchers to explore structures interactively or script analyses. Alignment can therefore be run manually or programmatically, which allows consistent re-runs on different input structures. In PyMOL, structural alignment compares two or more protein structures to assess their similarity by minimizing the root mean square deviation (RMSD). PyMOL superimposes one structure onto another, calculates the best alignment, and reports the RMSD between corresponding backbone atoms in Ångströms (Å). A lower RMSD indicates higher structural similarity, and values below 2 Å are generally considered high quality for backbone alignment [9]. This process is especially useful for validating AlphaFold predictions against experimentally determined structures and analyzing conformational changes.

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\hat{x}_i)^2}$$

where:

$x_i$ = Backbone atom coordinates of predicted protein structure

$\hat{x}_i$ = Backbone atom coordinates of reference protein structure
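The formula can be written out in a few lines of plain Python. This is a minimal sketch, not the PyMOL implementation: it assumes the two structures are already superimposed, that atoms are paired in the same order, and that N is the number of atom pairs. The function name `rmsd` and the toy coordinates are illustrative only.

```python
import math

def rmsd(predicted, reference):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, r in zip(predicted, reference):
        # Squared Euclidean distance between one pair of atoms.
        total += sum((pc - rc) ** 2 for pc, rc in zip(p, r))
    return math.sqrt(total / len(predicted))

# Toy data: three atoms, each displaced by exactly 1 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(predicted, reference))  # 1.0
```

Because every atom in the toy data is shifted by the same 1 Å, the RMSD equals that shift, which makes the example easy to check by hand.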

These results will be useful for Wet Lab to consider when selecting candidates for surface display, as RMSD highlights which fusion proteins are predicted to fold with lower conformational change, potentially indicating a better preservation of enzymatic activity.

GROningen MAchine for Chemical Simulations (GROMACS) [10] is a versatile package for performing molecular dynamics (MD) simulations. GROMACS numerically integrates Newton’s equations of motion for systems of up to millions of particles. This project used MD simulations in GROMACS to investigate how the structural stability of the fusion proteins responds to different pH values (4, 6, 7, 9), analyzing structural fluctuations and compactness.

The two metrics recorded for each fusion protein are RMSD, which tracks structural fluctuation, and radius of gyration, which tracks compactness.
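For reference, the compactness measure used in MD analysis, radius of gyration, is the mass-weighted root mean square distance of the atoms from their centre of mass. The sketch below writes that standard definition in plain Python. It is not taken from the pipeline: the function name and the two-atom toy system are illustrative, and equal masses are assumed when none are given.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a list of (x, y, z) coordinates."""
    if not coords:
        raise ValueError("need at least one atom")
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses when none are supplied
    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)

# Two equal-mass atoms 2 units apart: each sits 1 unit from the centre of mass.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```

A structure that unfolds spreads its atoms further from the centre of mass, so its radius of gyration rises; a trace that stays flat suggests the fold is staying compact.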

Aim

This project aims to develop two pipelines that are reusable for other teams.

Methods

Beth inspired the approach to investigate cell surface display. Her previous work modelling fusion proteins using AlphaFold was consulted before beginning this project. She also inspired the methodology for using PyMOL for complex visualization, graphical display and assessment of RMSD values, clarified concepts, and answered questions about protein fold rate.


Beth Davenport

PhD Student in Synthetic Microbiology for Environmental Bioremediation Applications

The aim of this pipeline is to be reusable for future teams, to run on high-performance computing (HPC) systems, and to automate batch AlphaFold prediction and structural alignment.

A high-performance computing cluster is a system made up of many powerful computers working together. Tasks can be scheduled to use specific computing resources, which is important for running simulations that are too demanding for a regular personal computer. By distributing the workload across multiple computers, HPC makes it possible to obtain results more efficiently while working with larger datasets.

Graphics processing unit (GPU) acceleration is important because GPUs are optimized for parallel processing, which makes them particularly effective for the calculations used in molecular dynamics. Offloading intensive tasks to GPUs significantly reduces simulation time and increases overall performance, enabling longer simulations.

Preliminary AlphaFold Run

The AlphaFold architecture involves querying the input sequence against sequence databases and evaluating large neural networks, both of which require massive computing resources. We wanted to perform an initial run of AlphaFold and examine the quality of the predicted structures, so we chose Google DeepMind’s AlphaFold web server, which allows users to upload .fasta files containing the protein sequence and run AlphaFold prediction in the cloud without managing the computational workflow themselves. We tried both DNA sequences and amino acid sequences, both of which are acceptable input formats.

The predicted CA structure had mostly high-confidence pLDDT scores, while the fusion protein had regions with lower scores. The predicted CA also agreed with the experimentally solved structure in the Protein Data Bank, based on the RMSD from structural alignment. This is expected, since AlphaFold was trained on the experimentally solved structures in the Protein Data Bank.

Interestingly, we also found that the quality scores differ depending on whether the input is in DNA or amino acid form. Since protein folding prediction models are optimized for amino acid sequences, and using them avoids translation-related inconsistencies, amino acid sequences were used as input from this point on to ensure consistent and reliable results.

Design

We wanted to perform a preliminary run of AlphaFold and examine the quality of the predicted structures.

Build

We used Google DeepMind’s AlphaFold web server to perform structural prediction.

Test

Several fusion protein and CA sequences were tested, and the results were imported into PyMOL to examine the quality of folding.

Learn

The predicted CA structures had mostly high confidence (pLDDT scores) and agreed with the experimentally solved CA structure, whereas the fusion protein structures tended to have regions with lower confidence.

Figure 1: Result of Google DeepMind’s AlphaFold prediction of SazCA, colored by pLDDT scores.

Batch AlphaFold Structure Prediction Pipeline

While Google DeepMind’s AlphaFold web server allows simple and accessible AlphaFold runs, it can only process one protein sequence at a time. Therefore, to perform batch protein structure prediction, it is necessary to create a pipeline that uses HPC resources and allows multiple AlphaFold runs simultaneously. We decided to use Nextflow, a workflow management system commonly used in bioinformatics. Initially, we wanted to adapt an existing AlphaFold pipeline, nf-core/proteinfold, from nf-core, a community-curated collection of Nextflow pipelines. However, to run AlphaFold on a high-performance computing cluster, the databases need to be laid out in a specific manner that depends on the version of AlphaFold. Proteinfold expects a different database layout, which we could not set up on the HPC cluster since the source files were not provided. To ensure maximum compatibility and reproducibility, we chose to use the more well-defined AlphaFold2 directly.

For a single sequence, AlphaFold produces five predicted structures. They are always named ranked_0.pdb, ranked_1.pdb, …, ranked_4.pdb, in decreasing order of confidence. We only examined ranked_0.pdb in downstream analysis since it has the highest confidence.
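Selecting the top-ranked model from each run is easy to automate. The following is a sketch under stated assumptions rather than the pipeline's own code: it assumes each input sequence has its own output subdirectory containing ranked_0.pdb, and the function name `collect_top_models` and the `<run name>.pdb` naming of the copies are hypothetical.

```python
from pathlib import Path
import shutil

def collect_top_models(output_root, dest_dir):
    """Copy each run's ranked_0.pdb into dest_dir as <run name>.pdb.

    Runs without a ranked_0.pdb (for example failed jobs) are skipped.
    Returns the list of copied files in sorted order of run name.
    """
    output_root, dest_dir = Path(output_root), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for run_dir in sorted(p for p in output_root.iterdir() if p.is_dir()):
        top_model = run_dir / "ranked_0.pdb"
        if not top_model.is_file():
            continue
        target = dest_dir / f"{run_dir.name}.pdb"
        shutil.copyfile(top_model, target)
        copied.append(target)
    return copied
```

Called as `collect_top_models("alphafold_out", "top_models")` (both paths hypothetical), this leaves one file per fusion protein that can be passed to the alignment step.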

Design

We wanted to build an AlphaFold pipeline that allows batch protein structure prediction.

Build

We created a Nextflow pipeline that allows batch AlphaFold2 runs.

Test

We performed batch AlphaFold prediction of the fusion protein sequences with the pipeline, and adjusted the computing resources requested based on the protein sequence lengths and the number of inputs, ensuring that sufficient memory was allocated.

Learn

We took the predicted structure with the highest confidence for each input and organized them to analyze conformational changes in downstream analysis.

Structural Alignment Pipeline

To examine whether the CA in each predicted fusion protein structure exhibits conformational changes that could affect its enzymatic activity, we wanted to perform structural alignment and evaluate the results. Originally, we used PyMOL’s graphical user interface to test a few predicted structures. We then built a Nextflow pipeline that executes multiple structural alignments given a list of fusion protein structures, CA structures from the PDB, and optionally the surface protein structures, and writes the RMSD output to a text file. We executed this pipeline locally rather than on HPC, as structural alignment requires far less compute than structural prediction. The outputs are summarized in the Results and Discussion section.
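The core of such a batch alignment can be sketched with PyMOL's Python API. This is an illustration under assumptions, not the team's Nextflow process: it assumes PyMOL is installed as a Python module, that each fusion directory holds ranked_0.pdb next to its reference .pdb files, and that the references are aligned onto the fusion structure (the original direction is not stated). The helper names are hypothetical. `cmd.align` returns a sequence whose first element is the RMSD after refinement.

```python
from pathlib import Path

def format_rmsd_line(reference_name, rmsd):
    """One report line in the same shape as the pipeline's text output."""
    return f"RMSD (fusion vs {reference_name}): {rmsd}"

def align_fusion(fusion_pdb, reference_pdbs):
    """Align each reference onto the fusion model; return {file name: RMSD in Å}."""
    from pymol import cmd  # imported here so the formatting helper works without PyMOL
    cmd.reinitialize()
    cmd.load(str(fusion_pdb), "fusion")
    results = {}
    for index, reference in enumerate(reference_pdbs):
        object_name = f"ref_{index}"
        cmd.load(str(reference), object_name)
        # First element of the returned sequence is the RMSD after outlier rejection.
        results[Path(reference).name] = cmd.align(object_name, "fusion")[0]
    return results

def write_report(fusion_dir, out_path):
    """Align every reference in fusion_dir against ranked_0.pdb and save the RMSDs."""
    fusion_dir = Path(fusion_dir)
    references = sorted(p for p in fusion_dir.glob("*.pdb") if p.name != "ranked_0.pdb")
    rmsds = align_fusion(fusion_dir / "ranked_0.pdb", references)
    lines = [format_rmsd_line(name, value) for name, value in rmsds.items()]
    Path(out_path).write_text("\n".join(lines) + "\n")
```

With default settings `cmd.align` discards poorly fitting atom pairs over several refinement cycles, so the reported RMSD covers only the atoms that remain aligned. That is worth keeping in mind when comparing values across fusion proteins.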

Design

We wanted to build a pipeline that allows batch structural alignment using PyMOL.

Build

We created a Nextflow workflow that takes the predicted structures and reference structures and writes the alignment output (RMSD values) to text files.

Test

The AlphaFold-predicted fusion protein structures were passed through the pipeline to examine their conformational changes.

Learn

The RMSD values from the output were cross-compared to select surface display candidates.

Figure 2: Visual representation of structural alignment result in PyMOL. The larger structure is the fusion protein BtCAII_VCBS, colored by pLDDT score (red = high confidence, blue = low confidence). The cyan structure is the experimentally solved BtCAII from PDB. RMSD = 0.571 (1627 atoms aligned).

Molecular Dynamics Simulation Pipeline

AlphaFold is an excellent tool for predicting the static structure of proteins, but it does not provide information on how protein stability or activity may vary under different environmental conditions. Because the bacteria displaying the CAs on their surface are embedded in a biobrick that contains Martian regolith, and the local pH is unknown, we need to consider the effects of pH on enzymatic activity. Changes in pH can alter the protonation states of key active-site residues, such as histidines and acidic side chains, potentially affecting hydrogen bonding networks and electrostatic interactions, which in turn may influence enzymatic activity. We decided to use molecular dynamics (MD) simulations with the software GROMACS to investigate the effects of pH on the structural stability of the CAs.
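To give a feel for how strongly pH shifts protonation, the Henderson-Hasselbalch relation can be evaluated at the four pH values studied. This is a back-of-the-envelope sketch only: the pKa of 6.0 is an assumed textbook-style value for a free histidine side chain, and pKa values inside a folded protein can differ substantially, which is why a dedicated tool is used in the pipeline.

```python
def fraction_protonated(ph, pka):
    """Fraction of a titratable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

MODEL_HIS_PKA = 6.0  # assumption: illustrative value, not taken from the simulated proteins

for ph in (4, 6, 7, 9):
    print(f"pH {ph}: {fraction_protonated(ph, MODEL_HIS_PKA):.3f} protonated")
# pH 4: 0.990 protonated
# pH 6: 0.500 protonated
# pH 7: 0.091 protonated
# pH 9: 0.001 protonated
```

Under this simple model a histidine goes from almost fully protonated at pH 4 to almost fully deprotonated at pH 9, so the charge pattern around an active site can differ markedly across the range studied.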

GROMACS doesn’t natively support pH-dependent MD simulation, but this can be approximated by protonating ionizable residues before simulation to match a specific pH. We found a web-based tool, H++, that can generate .pdb files of proteins at a specific pH [11]. However, the .pdb files generated by H++ are designed to be compatible with Assisted Model Building with Energy Refinement (AMBER), another software package commonly used for MD simulation. We therefore used tleap from AmberTools to solvate and ionize the protein system and generate AMBER topology (.prmtop) and coordinate (.inpcrd) files [12]. These were then converted into GROMACS-compatible formats (.gro and .top) using the Python library ParmEd. We selected GROMACS over AMBER because it is highly optimized for efficiency and scalability, making it an ideal choice for large-scale simulations [10].
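The AMBER-to-GROMACS conversion step can be sketched with ParmEd's `load_file` and `save` calls, which infer the output format from the file extension. This is a minimal sketch assuming ParmEd is installed. The function names and the choice to write the outputs next to the input topology are illustrative, not the pipeline's actual layout.

```python
from pathlib import Path

def gromacs_targets(prmtop_path):
    """Derive the .top and .gro output paths that sit next to the AMBER topology."""
    prmtop_path = Path(prmtop_path)
    return prmtop_path.with_suffix(".top"), prmtop_path.with_suffix(".gro")

def amber_to_gromacs(prmtop_path, inpcrd_path):
    """Convert an AMBER topology/coordinate pair into GROMACS .top and .gro files."""
    import parmed as pmd  # imported here so the path helper works without ParmEd
    top_path, gro_path = gromacs_targets(prmtop_path)
    # Load topology and coordinates together so both outputs describe the same system.
    system = pmd.load_file(str(prmtop_path), xyz=str(inpcrd_path))
    system.save(str(top_path), overwrite=True)  # GROMACS topology
    system.save(str(gro_path), overwrite=True)  # GROMACS coordinates
    return top_path, gro_path
```

A call such as `amber_to_gromacs("SazCA_pH7.prmtop", "SazCA_pH7.inpcrd")` (hypothetical file names) would then produce the pair of files that GROMACS preprocessing expects.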

Design

We wanted to build a pipeline that automates protonating the AlphaFold-predicted structures at a specified pH and performing MD simulation.

Build

We chose H++ to protonate the protein residues, and used tleap and ParmEd to convert the topology and coordinate files into GROMACS-compatible formats for MD simulation.

Test

The AlphaFold-predicted structures were subjected to MD simulations under different pH conditions to evaluate their structural stability.

Learn

We analyzed the pipeline outputs to assess protein stability across different pH conditions, providing insights into the range of pH that supports enzymatic activity.

Specifications

We created reusable batch AlphaFold, structural alignment and molecular dynamics simulation pipelines. This section contains their technical specifications. All the necessary files are provided in our GitLab repository.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

AlphaFold2 Pipeline

Structural Alignment Pipeline

spectrum b, blue_red, minimum=0, maximum=100

This colors the structure by the pLDDT score assigned to each residue, which AlphaFold stores in the B-factor column of the .pdb file. Red indicates high confidence and blue indicates low confidence.
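The same per-residue scores can be read without PyMOL, since they occupy the B-factor field (columns 61 to 66) of each ATOM record. The sketch below is illustrative rather than part of the pipeline: it reads one score per residue from the alpha-carbon atoms and reports the mean and the fraction of residues at or above 70, the commonly used cut-off for a confident prediction. The `atom_line` helper only builds correctly spaced toy records for the demonstration.

```python
def residue_plddt(pdb_text):
    """Return {(chain, residue number): pLDDT} read from the B-factor of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores

def summarize(scores, threshold=70.0):
    """Mean pLDDT and the fraction of residues scoring at or above the threshold."""
    values = list(scores.values())
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values), sum(v >= threshold for v in values) / len(values)

def atom_line(serial, name, resname, chain, resseq, bfactor):
    """Build a fixed-width PDB ATOM record (toy coordinates) for the demo."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")

demo = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 10.0),   # not a CA atom, so it is ignored
    atom_line(2, "CA", "MET", "A", 1, 95.0),
    atom_line(3, "CA", "GLY", "A", 2, 80.0),
    atom_line(4, "CA", "SER", "A", 3, 50.0),
])
mean, confident = summarize(residue_plddt(demo))
print(f"mean pLDDT {mean:.1f}, {confident:.0%} of residues at or above 70")
# mean pLDDT 75.0, 67% of residues at or above 70
```

A summary like this gives a quick numerical check to accompany the colored structure when many models are screened at once.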

Expected directory layout:

├─ pymol.nf
├─ pymol.config
└─ structures/
   ├─ HpCA_INPN/
   │   ├─ ranked_0.pdb
   │   ├─ INPN.pdb
   │   └─ HpCA.pdb
   └─ SazCA_VCBS/
       ├─ ranked_0.pdb
       ├─ c_domain.pdb
       ├─ n_domain.pdb
       └─ SazCA.pdb

Run command:

nextflow pymol.nf -c pymol.config

Example output for SazCA_VCBS:

RMSD (fusion vs SazCA.pdb): 0.2152739316225052
RMSD (fusion vs n_domain.pdb): 2.418569564819336
RMSD (fusion vs c_domain.pdb): 17.096477508544922

Molecular Dynamics Pipeline

maestro workflows

The structural prediction, alignment, and molecular dynamics pipelines were converted into maestro workflows for redistribution, improved reliability, and correctness. To learn more, visit the maestro page, or the repositories on GitLab [structural analysis] [molecular dynamics].

Results and Discussion

PyMOL alignment results of the fusion proteins can be observed in Table 1. The name of each fusion protein indicates the strain of carbonic anhydrase (CA) and the associated surface protein. The strains of CA used were CA from Burkholderia henselae (BhCA), CA I from Brucella suis (BtCAI), CA II from Brucella suis (BtCAII), CA from Helicobacter pylori (HpCA), and CA from Sulfurihydrogenibium azorense (SazCA). The CA fusion proteins were displayed using three chassis systems: UTEX cyanobacteria with VCBS anchors, E. coli BL21 with INPN-based display, and Caulobacter crescentus CB2A with its native RsaA S-layer.

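| Fusion Protein | Alignment against CA | Alignment against surface protein | Alignment against N-terminal of surface protein | Alignment against C-terminal of surface protein |
| --- | --- | --- | --- | --- |
| BhCA_VCBS | 17.751 | / | 2.433 | 13.949 |
| BtCAI_VCBS | 0.165 | / | 0.461 | 6.714 |
| BtCAII_VCBS | 0.535 | / | 0.339 | 27.038 |
| HpCA_VCBS | 0.718 | / | 2.240 | 8.435 |
| SazCA_VCBS | 0.215 | / | 2.419 | 17.010 |
| BhCA_INPN | 8.466 | 20.761 | 20.761 | / |
| BtCAII_INPN | 11.995 | 27.222 | 27.222 | / |
| HpCA_INPN | 4.210 | 9.586 | 9.586 | / |
| SazCA_INPN | 3.316 | 20.310 | 20.310 | / |
| BhCA_RsaA | 11.310 | / | 1.771 | / |
| BtCAII_RsaA | 10.357 | / | 1.491 | / |
| HpCA_RsaA | 45.313 | / | 1.444 | / |
| SazCA_RsaA | 51.307 | / | 1.541 | / |
| BtCA_BL21_RsaA | 27.805 | / | 11.380 | / |
| BhCA_RsaA_secreted | / | / | / | 0.089 |
| BtCAII_RsaA_secreted | / | / | / | 1.041 |
| SazCA_RsaA_secreted | / | / | / | 0.100 |
| HpCA_RsaA_secreted | / | / | / | 9.861 |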

Table 1. Root mean square deviation (RMSD, in Å) from structural alignment of each fusion protein against its CA and surface protein in PyMOL. Fusion proteins are named [CA name]_[surface protein].
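The CA column of Table 1 can be screened programmatically against the 2 Å guideline quoted in the Background. The sketch below copies the "Alignment against CA" values from the table. The function names are illustrative, and it ranks on the CA column alone, which is a narrower criterion than an overall comparison that also weighs the surface protein alignments.

```python
# "Alignment against CA" values (Å) copied from Table 1; secreted constructs have no CA value.
CA_RMSD = {
    "BhCA_VCBS": 17.751, "BtCAI_VCBS": 0.165, "BtCAII_VCBS": 0.535,
    "HpCA_VCBS": 0.718, "SazCA_VCBS": 0.215,
    "BhCA_INPN": 8.466, "BtCAII_INPN": 11.995, "HpCA_INPN": 4.210, "SazCA_INPN": 3.316,
    "BhCA_RsaA": 11.310, "BtCAII_RsaA": 10.357, "HpCA_RsaA": 45.313,
    "SazCA_RsaA": 51.307, "BtCA_BL21_RsaA": 27.805,
}

def below_threshold(rmsd_by_name, threshold=2.0):
    """Fusion proteins whose CA RMSD is under the threshold, best first."""
    hits = [(name, value) for name, value in rmsd_by_name.items() if value < threshold]
    return sorted(hits, key=lambda item: item[1])

def best_per_anchor(rmsd_by_name):
    """Lowest-RMSD fusion protein for each surface protein (the last name component)."""
    best = {}
    for name, value in rmsd_by_name.items():
        anchor = name.rsplit("_", 1)[1]
        if anchor not in best or value < best[anchor][1]:
            best[anchor] = (name, value)
    return best

print(below_threshold(CA_RMSD))
print(best_per_anchor(CA_RMSD))
```

On these values only the BtCAI, SazCA, BtCAII and HpCA fusions with VCBS fall under 2 Å for the CA alignment, while the lowest CA RMSD is 3.316 Å among the INPN fusions and 10.357 Å among the RsaA fusions.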

Figure 3: Fusion Protein VCBS Alignment Results.
Figure 4: Fusion Protein INPN Alignment Results.
Figure 5: Fusion Protein RsaA Alignment Results.

The AlphaFold-predicted fusion proteins were imported into PyMOL, and structural alignment was performed to assess whether the CA in each fusion protein has undergone significant conformational change. The results of the structural alignment pipeline are presented in Table 1.

Wet Lab explained that the surface protein is expected to undergo conformational change when fused with a CA because it serves as a scaffold in surface display. This explains why the RMSD values are higher than 2 Å for most surface protein alignments. Nevertheless, these results remain relevant to the project as a record of the conformational change of the surface protein.

The results in Figure 3 show that BtCAI fused with VCBS produced the lowest RMSD of the alignments investigated. This suggests that this fusion protein exhibits the least conformational change and is more likely to be catalytically efficient when surface displayed. Among the VCBS fusions, BtCAII produced the highest RMSD for alignment against the C-terminal of the surface protein, and BhCA produced the highest RMSD for alignment against the CA. Therefore, Wet Lab is advised against using BtCAII and BhCA in VCBS fusions for surface display because the conformational change could affect enzymatic activity.

The results in Figure 4 show that HpCA fused with INPN has the lowest overall RMSD, and according to Figure 5, BtCAII or BhCA produced the best results for fusion proteins with RsaA.

Since these results are predictions, the conformational changes may not affect protein function. However, the results are worth noting for Wet Lab because the abnormally high RMSD values of fusion proteins with RsaA could signify incompatibility between the CA and the surface protein, potentially disrupting the function of the CA. The selected fusion protein candidates will be tested by Wet Lab for successful surface display and enzymatic activity.

Figure 6: RMSD result for SazCA
Figure 7: RMSD result for HpCA
Figure 8: RMSD result for BtCAII
Figure 9: Radius of gyration result for BtCAII at pH 4
Figure 10: Radius of gyration result for BtCAII at pH 6
Figure 11: Radius of gyration result for BtCAII at pH 7

In Figures 6 and 7, the RMSD profiles indicate that structural fluctuations are highest at pH 4 for both SazCA and HpCA. At higher pH values (6, 7, and 9), the RMSD values for SazCA show reduced fluctuations, while HpCA reaches a stable plateau between 0.10 and 0.12 nm. These results suggest that both SazCA and HpCA maintain greater structural stability and remain closer to their initial conformations under less acidic conditions.

Interestingly, BtCAII displays its greatest RMSD fluctuation at pH 6, followed by a noticeable drop around 0.9 ns. This apparent instability may reflect incomplete equilibration or insufficient simulation time to reach a steady state, rather than an inherent loss of structural stability. Furthermore, there was less fluctuation measured for the radius of gyration in Figure 11 at pH 7 compared to Figures 9 and 10 at pH 4 and 6 respectively, indicating a more compact and conformationally stable ensemble at higher pH values.
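Comparisons like these can be put on a numerical footing by summarizing each trace after an equilibration period. The sketch below assumes the traces are available as GROMACS .xvg text files, in which lines starting with '#' or '@' are comments and plotting directives and the remaining lines are whitespace-separated columns. The function names, the cut-off time and the numbers in the sample are made up for illustration.

```python
import statistics

def read_xvg(text):
    """Parse .xvg text into (times, values) from the first two data columns."""
    times, values = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue  # skip comments and plotting directives
        fields = line.split()
        times.append(float(fields[0]))
        values.append(float(fields[1]))
    return times, values

def tail_stats(times, values, start_time):
    """Mean and population standard deviation of the values from start_time onward."""
    tail = [v for t, v in zip(times, values) if t >= start_time]
    return statistics.mean(tail), statistics.pstdev(tail)

# Made-up trace in the same layout as an RMSD-versus-time file.
sample = """# illustrative numbers only
@    title "RMSD"
0.0 0.000
0.5 0.080
1.0 0.110
1.5 0.100
2.0 0.120
"""
times, values = read_xvg(sample)
mean, spread = tail_stats(times, values, start_time=1.0)
print(f"plateau mean {mean:.3f}, spread {spread:.4f}")
```

A small spread after the cut-off suggests a plateau, whereas a trace that is still drifting, as described for BtCAII at pH 6, would show a mean that changes with the choice of cut-off.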

Conclusion and Future Directions

Since enzymatic activity requires the right conformation, the CA in a fusion protein is likely to remain active if its structure stays relatively unchanged. These results provide a basis for estimating the probability of successful surface display, guiding Wet Lab in selecting which CA and which surface protein are most likely to generate functional constructs.

AlphaFold predicts the proteins as static structures, but our CAs will be deposited in Martian regolith, which could have varying local pH. The results from MD simulations allowed for the exploration of the stability of the CAs under different pH conditions, providing insight to Wet Lab regarding the optimal pH range for CA activity in Martian regolith.

In the future, docking studies could be performed in addition to MD simulation to examine ligand-enzyme interactions under different pH conditions. By accounting for changes in protonation states and protein conformation, these studies can provide insights into how pH affects binding affinity, stability, and specificity of the CA. Such information would help guide the design of experiments and the selection of optimal conditions for functional assays.

1. Fan LH, Liu N, Yu MR, Yang ST, Chen HL. Cell surface display of carbonic anhydrase on Escherichia coli using ice nucleation protein for CO₂ sequestration. Biotechnol Bioeng [Internet]. 2011 Dec;108(12):2853-64. Available from: https://www.ncbi.nlm.nih.gov/pubmed/21732326
2. Charrier M, Li D, Mann VR, Yun L, Jani S, Rad B, et al. Engineering the S-Layer of Caulobacter crescentus as a Foundation for Stable, High-Density, 2D Living Materials. ACS Synth Biol [Internet]. 2019 Jan 18;8(1):181-90. Available from: https://www.ncbi.nlm.nih.gov/pubmed/30577690
3. Park M. Surface Display Technology for Biosensor Applications: A Review. Sensors (Basel) [Internet]. 2020 May 13;20(10):2775. Available from: https://www.ncbi.nlm.nih.gov/pubmed/32414189
4. VCBS superfamily forms a third supercluster of β-propellers that includes tachylectin and integrins | Bioinformatics | Oxford Academic [Internet]. [cited 2025 Sept 25]. Available from: https://academic.oup.com/bioinformatics/article/36/24/5618/6069543
5. EMBL-EBI. What is AlphaFold? | AlphaFold [Internet]. [cited 2025 Sept 25]. Available from: https://www.ebi.ac.uk/training/online/courses/alphafold/an-introductory-guide-to-its-strengths-and-limitations/what-is-alphafold/
6. Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. Progress in Biophysics and Molecular Biology [Internet]. 2022 Sept 1 [cited 2025 Sept 30];173:72-82. Available from: https://www.sciencedirect.com/science/article/pii/S0079610722000475
7. Guo HB, Perminov A, Bekele S, Kedziora G, Farajollahi S, Varaljay V, et al. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci Rep [Internet]. 2022 June 23 [cited 2025 Sept 30];12:10696. Available from: https://pmc.ncbi.nlm.nih.gov/articles/PMC9226352/
8. PyMOL | pymol.org [Internet]. [cited 2025 Oct 1]. Available from: https://pymol.org/
9. Ellena G, Fahrion J, Gupta S, Dussap CG, Mazzoli A, Leys N, et al. Development and implementation of a simulated microgravity setup for edible cyanobacteria. npj Microgravity [Internet]. 2024 Oct 25 [cited 2025 Oct 1];10(1):99. Available from: https://www.nature.com/articles/s41526-024-00436-x
10. About GROMACS - GROMACS webpage https://www.gromacs.org documentation [Internet]. 2025 [cited 2025 Sept 26]. Available from: https://www.gromacs.org/about.html
11. Bashford D, Karplus M. pKa’s of ionizable groups in proteins: Atomic detail from a continuum electrostatic model. Biochemistry [Internet]. 1990 Nov 6;29(44):10219-25. Available from: https://www.ncbi.nlm.nih.gov/pubmed/2271649
12. Case DA, Aktulga HM, Belfon K, Cerutti DS, Cisneros GA, Cruzeiro VWD, et al. AmberTools. J Chem Inf Model [Internet]. 2023 Oct 23 [cited 2025 Oct 1];63(20):6183-91. Available from: https://doi.org/10.1021/acs.jcim.3c01153