Model

Description

Model Description

Background Information

To determine the efficacy of our final product, we built a mathematical model based on differential equations that simulates how the surrounding CO2 concentration changes over time as a result of the altered levels of carbon fixation in our genetically engineered cyanobacteria. With this model, we can compare the efficiency of our final product with that of other commercially available technologies, such as CO2 scrubbers for industrial plant exhausts.

The Fundamentals

The most fundamental differential equations used in our model are partial differential equations. Specifically, their unknowns are functions of two variables, r and t, where r is the distance from the center of a hypothetical spherical cyanobacterial cell and t is time. These two variables allow our model to describe CO2- and HCO3--related quantities, such as concentration and flux, at every position and time within the hypothetical cell.

We are greatly informed and inspired through reading Dr. Niall Mangan’s paper “Systems analysis of the CO2 concentrating mechanism in cyanobacteria”. We appreciate the research conducted by her and her colleagues, which has tremendously enriched our understanding of this field.

We also had an online meeting with Dr. Mangan, seeking her advice on how biological efficiency is defined within computational models and how various environmental and enzymatic factors influence CO2 fixation. During this discussion, she emphasized the crucial roles of carbonic anhydrase and carboxysomes in mediating CO2 uptake and delivery to RuBisCO, noting that their capacity can become limiting factors in photosynthetic efficiency, which we accounted for in our modeling equations.

A Guide to Michaelis-Menten Kinetics

To understand enzymatic activity, visualize substrate affinity, and model the reaction rate of BCT1’s HCO3- transport pathway in wild-type and engineered strains of S. elongatus PCC 7942, the team employs Michaelis-Menten kinetics. For the engineered strains, the functionalities of the CmpAB and CmpCD fusion proteins are examined both separately and together. Conceptually, the Michaelis-Menten model explains how reaction rates depend on enzyme and substrate concentrations.

Below is a step-by-step guide for deriving the Michaelis-Menten equation (Libretexts, 2024).

Equation 1

General reaction scheme of a single-substrate enzyme-catalyzed reaction, where E = enzyme, S = substrate, ES = enzyme-substrate complex, P = product, and k1, k-1, and k2 = rate constants.

The steady-state approximation is a method of deriving rate laws that assumes the concentration of an intermediate (in this case, [ES]) is constant. The rate of formation of the intermediate (RF = k1[E][S], assuming [P] = 0) therefore equals its rate of breakdown by the reverse reaction and by product formation (RR = k2[ES] + k-1[ES] = (k2 + k-1)[ES]), which gives:

Equation 2
Equation 3
Equation 4

where KM is the Michaelis constant, the substrate concentration at which half of the enzyme active sites are filled, calculated as KM = (k2 + k-1) / k1.

Given that the total amount of enzyme, ET, can be represented by ET = [ES] + [EF], where EF = free (unbound) enzyme, equation (4) can be rewritten as:

Equation 5
Equation 6

Lastly, substitute into the rate law of the rate-limiting step, v0 = k2[ES], where v0 = initial rate of reaction.

Equation 7

When [S] >> KM, v0 ≈ k2[ET], and this limiting value is defined as vmax, the maximum reaction velocity reached when essentially all enzymes are saturated with substrate. Substituting vmax = k2[ET] gives the Michaelis-Menten equation:

Equation 8

where v=reaction rate.
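For readers without access to the typeset equations, the derivation above can be sketched in LaTeX as follows. This is the standard textbook form; the mapping of each line to Equations 1 through 8 is an assumption based on the surrounding prose, not a copy of the original figures.

```latex
% (1) Reaction scheme
E + S \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; ES \;\overset{k_2}{\longrightarrow}\; E + P

\begin{align}
% (2)-(4) Steady state: formation of ES equals breakdown of ES
k_1 [E][S] &= (k_2 + k_{-1})[ES] \\
\frac{[E][S]}{[ES]} &= \frac{k_2 + k_{-1}}{k_1} = K_M \\
% (5)-(6) Enzyme conservation, with [E] = [E_F] = [E_T] - [ES]
\left([E_T] - [ES]\right)[S] &= K_M [ES] \\
[ES] &= \frac{[E_T][S]}{K_M + [S]} \\
% (7) Rate-limiting step
v_0 = k_2 [ES] &= \frac{k_2 [E_T][S]}{K_M + [S]} \\
% (8) With v_{max} = k_2 [E_T]
v &= \frac{v_{max}[S]}{K_M + [S]}
\end{align}
```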

In conclusion, with known values of Vmax and KM and substrate concentrations obtained through functional assays, the Michaelis-Menten equation could be used to specifically compute the flux of HCO3- across the cyanobacterial inner membrane.

Overall Flow Diagram:

Figure 1

Figure 1. Schematic representation of the inorganic carbon transport pathway in the cyanobacterial carbon-concentrating mechanism (CCM) being modeled. Each distinct stage is referenced in the subsequent modeling sections. First, extracellular CO2 diffuses across the outer membrane into the periplasm. Inside the periplasm, CO2 and HCO3- equilibrate, and CO2 diffuses directly into the cytoplasm. Periplasmic HCO3- then binds to inner membrane transporters, including the CmpA component of the BCT1 transporter, initiating the HCO3- uptake mechanism. Next, HCO3- is actively transported across the inner membrane into the cytosol by multiple transporters, including the BCT1 complex, which comprises the A, B (a homodimer), C, and D subunits and is driven by ATP hydrolysis at components C and D. Once in the cytosol, HCO3- enters the carboxysome by diffusion. Inside the carboxysome, HCO3- is converted to CO2 by carbonic anhydrase (CA), elevating the local CO2 concentration. Finally, CO2 is fixed by RuBisCO, producing 3-phosphoglycerate (3-PG). The figure is not drawn to scale.

A Guide to Fick’s Laws

In intracellular locations without membranes or shells, Fick’s Laws are commonly used to describe the diffusion of CO2 and HCO3-. Here, we briefly introduce Fick’s two key laws related to diffusion flux and rate of change of concentration (Dickson, n.d.).

Fick’s First Law of Diffusion states that the flux due to diffusion is proportional to the negative concentration gradient of that substance. Specifically, the flux equals the negative of the substance’s diffusion coefficient in the medium through which it diffuses, multiplied by its concentration gradient, that is, the change in concentration across space. We use this law within the cytosol as an anchor point for each change in concentration over distances as small as the thickness of a phospholipid bilayer.

Fick’s Second Law of Diffusion, on the other hand, relates the rate of change of a solute’s concentration to the second partial derivative of the concentration with respect to position; in other words, it connects “how fast” the concentration changes with “how curved” the concentration profile is. Because it lets us move between partial derivatives with respect to time and position, this law is very useful for describing the diffusion of our two molecules through the cytoplasm and any other non-membrane or non-shell spaces.
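In one spatial dimension, the two laws can be written as follows. This is the standard textbook form, with J the flux, D the diffusion coefficient, C the concentration, x the position, and t the time. The second law follows from the first when D is constant and mass is conserved.

```latex
\begin{align}
J &= -D\,\frac{\partial C}{\partial x} && \text{(Fick's First Law)} \\
\frac{\partial C}{\partial t} &= -\frac{\partial J}{\partial x}
  = D\,\frac{\partial^2 C}{\partial x^2} && \text{(Fick's Second Law)}
\end{align}
```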

General Assumptions

To simplify the model, the following assumptions were made:

  1. The outer membrane, inner membrane, and carboxysome are assumed to be perfect concentric spheres, with the cell radius set to 1.446 µm. All membranes are treated as infinitesimally thin, and thus membrane thickness is neglected.
  2. The extracellular CO2 concentration is assumed to remain constant and saturated. Under air-equilibrated conditions (0.04% CO2), the external CO2 concentration is 13 µM, whereas under flue gas conditions (15% CO2) it is 5 mM.
  3. The periplasmic sodium concentration is fixed at [Na+] = 18 mM, and the cytoplasmic ATP concentration is assumed to be saturating.
  4. The model assumes that inorganic carbon is limited in the extracellular environment.
  5. The model treats carbon fixation as efficient when the following two conditions are met:
    1. The CO₂ concentration within the carboxysome must be sufficiently high to saturate RuBisCO while suppressing its competitive oxygenation reaction.
    2. The carbonic anhydrase within the carboxysome remains unsaturated, ensuring that excess HCO₃⁻ transport does not result in unnecessary energy expenditure.
  6. The enzymatic parameters of MalK were used to calculate the kinetic properties of CmpC and CmpD.
  7. In modeling the RuBisCO reaction rate, the concentration of RuBP is assumed to be saturating, and no oxygenation reaction occurs under the modeled conditions.

Pathway & Checkpoints

To understand the entire pathway involving CO2 and HCO3- changes in concentration, we have broken it down into seven key checkpoints. For our convenience and mathematical purposes, the checkpoints are assigned to account for concentration changes at different positions (or r, the distance from the theoretical “central carboxysome”). The only exceptions are checkpoints #6 and #7, where, although carbonic anhydrase and RuBisCO are both in the carboxysome, they function differently and therefore must be separately calculated.

Checkpoint #1: CO2 and HCO3- intake through the cyanobacterial outer membrane

The first checkpoint marks the entry of CO2 into the cyanobacterium by simple diffusion, since small gas molecules pass easily through the membrane. To describe the entry of CO2, we employ Fick's First Law, which relates the flux to the concentration gradient and gives the following equation.

Equation 9

Legend
JC: the flux of CO2 through the outer membrane into the periplasm (in units)
PC: the outer membrane’s permeability for CO2 (in units)
ΔC: the difference in CO2 concentration between the extracellular environment and the periplasm

We also have a formula further describing the outer membrane’s permeability for CO2.

Equation 10

Legend
K: the partition coefficient, a ratio that describes a compound’s (CO2) distribution between two immiscible solvents (the lipid bilayer outer membrane and aqueous periplasm) at equilibrium, = 0.95
D: the diffusion coefficient of CO2, = 5 × 10⁻⁸ cm²/s
Δx: the thickness of the outer membrane, = 8 × 10⁻⁹ m

Combining the previous two equations and elaborating on the ΔC, we get this equation for checkpoint #1.

Equation 11

Legend
Cec: extracellular CO2 concentration
Cperi: periplasmic CO2 concentration
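The calculation for this checkpoint can be sketched in Python as below. The sketch assumes that Equation 10 has the usual solubility-diffusion form P = K·D/Δx and that Equation 11 is J = P·(Cec − Cperi); both are inferred from the legends rather than copied from the original equations. The function names and the periplasmic concentration are hypothetical and used for illustration only.

```python
def membrane_permeability(partition_coeff, diffusion_coeff_m2_s, thickness_m):
    """Solubility-diffusion estimate of permeability, P = K * D / dx, in m/s."""
    return partition_coeff * diffusion_coeff_m2_s / thickness_m


def co2_flux(permeability_m_s, c_outside_mol_m3, c_inside_mol_m3):
    """Fick's First Law across a thin membrane, J = P * (C_out - C_in)."""
    return permeability_m_s * (c_outside_mol_m3 - c_inside_mol_m3)


K = 0.95                 # partition coefficient, from the legend above
D = 5e-8 * 1e-4          # 5e-8 cm^2/s converted to m^2/s
DX = 8e-9                # outer membrane thickness in m
P_C = membrane_permeability(K, D, DX)

C_EC = 0.013             # 13 uM extracellular CO2, expressed in mol/m^3
C_PERI = 0.010           # hypothetical periplasmic CO2, illustration only
J_C = co2_flux(P_C, C_EC, C_PERI)

print(f"P_C = {P_C:.3e} m/s")
print(f"J_C = {J_C:.3e} mol m^-2 s^-1")
```

A positive J_C means net movement of CO2 from the extracellular environment into the periplasm.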

On the other hand, since HCO3- is a charged molecule, its simple diffusion through the phospholipid bilayer is negligible and will not be accounted for. Therefore, periplasmic HCO3- concentrations are assumed to be completely due to the reversible CO2 hydration and subsequent carbonic acid dissociation reactions, as shown below.

Equation 12

We assume that the intermediate reaction is at quasi-steady state, which means that the rate of change of the intermediate (H₂CO₃) periplasmic concentration is equal to zero. The rate of change of H₂CO₃ concentration is equal to the sum of two rates that cause its formation minus the other two rates that cause its depletion. Equating the two statements above, we can list the equation shown below.

Equation 13

Rearranging the terms and solving for the concentration of H₂CO₃, we get the following equation.

Equation 14

Then, the rate of change of the concentration of HCO3- is equal to the rate of the reaction that produces HCO3- minus the rate of the reaction that depletes HCO3-. Furthermore, substituting the expression for the concentration of H₂CO₃, we get an expanded expression shown below.

Equation 15

Since k₂ has a much larger numerical value than k₋₁, k₂ + k₋₁ is approximately equal to k₂. The mathematical expression of this approximation is presented below as well.

Equation 16

As a result, we are able to simplify the denominator of the previous expression for the rate of change of the concentration of HCO3-, and cancel out common coefficients in the numerator and denominator. After crossing out some more canceled terms, we get the resulting expression presented below.

Equation 17

From this proof, we know that the rate of change of HCO3- concentration is equal to k₁ multiplied by the concentration of CO2. The final equation for HCO3- production, which can be viewed similarly as flux here in the periplasm, is written as the following equation concluding this checkpoint.

Equation 18
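The derivation in this checkpoint can be written out as the following LaTeX sketch. It follows the prose step by step; the rate-constant labels (k₁, k₋₁ for CO2 hydration and k₂, k₋₂ for carbonic acid dissociation) and the mapping to Equations 12 through 18 are assumptions based on the surrounding text. The water concentration is taken to be absorbed into k₁.

```latex
% (12) Reaction scheme
\mathrm{CO_2 + H_2O} \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\;
\mathrm{H_2CO_3} \;\underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}}\;
\mathrm{HCO_3^- + H^+}

\begin{align}
% (13) Quasi-steady state for the intermediate
\frac{d[\mathrm{H_2CO_3}]}{dt}
  &= k_1[\mathrm{CO_2}] + k_{-2}[\mathrm{HCO_3^-}][\mathrm{H^+}]
   - k_{-1}[\mathrm{H_2CO_3}] - k_2[\mathrm{H_2CO_3}] = 0 \\
% (14) Solve for the intermediate
[\mathrm{H_2CO_3}]
  &= \frac{k_1[\mathrm{CO_2}] + k_{-2}[\mathrm{HCO_3^-}][\mathrm{H^+}]}{k_2 + k_{-1}} \\
% (15) Rate of change of bicarbonate
\frac{d[\mathrm{HCO_3^-}]}{dt}
  &= k_2[\mathrm{H_2CO_3}] - k_{-2}[\mathrm{HCO_3^-}][\mathrm{H^+}] \nonumber \\
  &= \frac{k_2\left(k_1[\mathrm{CO_2}] + k_{-2}[\mathrm{HCO_3^-}][\mathrm{H^+}]\right)}{k_2 + k_{-1}}
   - k_{-2}[\mathrm{HCO_3^-}][\mathrm{H^+}] \\
% (16) Approximation
k_2 \gg k_{-1} &\;\Rightarrow\; k_2 + k_{-1} \approx k_2 \\
% (17)-(18) Result
\frac{d[\mathrm{HCO_3^-}]}{dt} &\approx k_1[\mathrm{CO_2}]
\end{align}
```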

Checkpoint #2: CO2 and HCO3- Transport Across the Inner Membrane

The second checkpoint is the key point at which our model differentiates our overexpression strain from wild-type S. elongatus PCC 7942. On the inner membrane, multiple transporters and reactions influence the influx of HCO3-. The main ones included in our model are the active HCO3- transporter BCT1; the Na⁺-dependent HCO3- transporters BicA and SbtA; and local alkaline pockets (LAP), which result from proton outflux in the electron transport chain on the thylakoid membrane and, by Le Chatelier’s Principle, raise the HCO3- concentration in a way that depends on the internal (cytoplasmic) CO2 concentration. Because BCT1 is encoded by our overexpressed CmpABCD operon, the amount of BCT1 is expected to increase. We therefore use XBCT1, a coefficient representing the concept of “fold change”, to describe the accumulated effect of CmpABCD overexpression on HCO3- influx. This coefficient is mainly influenced by light intensity, owing to the light-induced nature of our psbA1 promoter. Summing all of these effects gives the general equation for the HCO3- flux through the inner membrane shown below.

Equation 19

Legend
JH: the net inward flux of HCO3- through the cyanobacterial inner membrane
JBCT1: the flux of HCO3- due to BCT1
XBCT1: the fold change of BCT1 concentration due to CmpABCD overexpression compared to wild-type
JBicA: the flux of HCO3- due to HCO3- transporter BicA
JSbtA: the flux of HCO3- due to HCO3- transporter SbtA
JLAP: the flux of HCO3- due to local alkaline pockets (LAP)

Then, we expand the equations by substituting the Michaelis-Menten equations that describe each of them, as shown below.

Equation 20

To determine the concentration of CmpA bound to HCO3- ([A-H]), that is, the effective CmpA components that actually contribute to HCO3- transport, we multiply the total CmpA concentration by the fraction of CmpA components that are bound to HCO3-. When the periplasmic HCO3- concentration equals the dissociation constant, half of the CmpA components are bound; as it rises well above the dissociation constant, more spare CmpA components “grab” onto HCO3-, and the concentration of bound CmpA approaches the total concentration of CmpA components. We can therefore use the equation shown below to substitute for all bound CmpA components in the previous long equation.

Equation 21

Legend
[A-H]: the concentration of CmpA components bound to HCO3-
[A]: the concentration of total CmpA components
Hperi: the periplasmic HCO3- concentration
Kd: the dissociation constant of CmpA for HCO3-
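As a minimal sketch, the bound fraction can be computed with a single-site binding expression, [A-H] = [A]·Hperi/(Hperi + Kd). This form is an assumption consistent with the legend above; the function name and the scanned HCO3- concentrations are hypothetical.

```python
def bound_cmpa(total_a, h_peri, k_d):
    """Single-site binding: concentration of CmpA bound to HCO3-.

    All three arguments must share the same concentration unit.
    """
    return total_a * h_peri / (h_peri + k_d)


A_TOTAL_MM = 2.7e-3      # wild-type [A] in mM, from Table 1
K_D_MM = 5e-3            # Kd = 5 uM, expressed in mM

# Hypothetical periplasmic HCO3- concentrations (mM) to show the saturation trend
for h_peri in (0.5e-3, 5e-3, 50e-3, 500e-3):
    fraction = bound_cmpa(A_TOTAL_MM, h_peri, K_D_MM) / A_TOTAL_MM
    print(f"Hperi = {h_peri:.1e} mM -> bound fraction = {fraction:.3f}")
```

At Hperi = Kd exactly half of the CmpA is bound, and the bound fraction approaches 1 only when Hperi is much larger than Kd.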

Substituting this expression, we get the final equation for HCO3- net influx through the inner membrane, as shown below.

Equation 22

Legend
JH: the net inward flux of HCO3- through the inner membrane
VBCT1: the maximum reaction rate of BCT1
[A]: the concentration of total CmpA components
Hperi: the periplasmic HCO3- concentration
Kd: the dissociation constant of CmpA for HCO3-
KBCT1: the dissociation constant of BCT1 for HCO3--bound CmpA
XBCT1: the fold change of BCT1 concentration due to CmpABCD overexpression compared to wild-type
VBicA: the maximum reaction rate of the HCO3- transporter BicA
VSbtA: the maximum reaction rate of the HCO3- transporter SbtA
[Na+]: the periplasmic Na+ concentration
α: the maximal reaction rate of conversion from CO2 to HCO3-
Kα: the CO2 concentration at half maximal activity of the conversion reaction

We also have an equation for CO2 influx through the inner membrane. Since this small nonpolar molecule diffuses through the inner membrane, the first term describes its diffusion. The second term has a negative sign because it describes the depletion of CO2 by the conversion reactions in local alkaline pockets, as described previously. Combining these two effects, we get the equation shown below.

Equation 23

Legend
JC: the net inward flux of CO2 through the inner membrane
PC: the permeability of the inner membrane to CO2
ΔC: the difference in CO2 concentration between the periplasm and the cytoplasm
α: the maximal reaction rate of conversion from CO2 to HCO3-
Cin: the internal, or cytoplasmic, CO2 concentration
Kα: the CO2 concentration at half maximal activity of the conversion reaction
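The two-term structure described above can be sketched as follows, assuming Equation 23 has the form JC = PC·ΔC − α·Cin/(Kα + Cin). This form is inferred from the legend, and the function name and concentrations are hypothetical. The numerical values of α and Kα are taken from Table 1, and their units here (mol m⁻² s⁻¹ and mol m⁻³, respectively) are an assumption that should be checked against the cited source before reuse.

```python
def co2_inner_membrane_flux(p_c, c_peri, c_in, alpha, k_alpha):
    """Net inward CO2 flux: passive diffusion minus LAP-driven conversion."""
    diffusion = p_c * (c_peri - c_in)                 # Fick-type term, P_C * dC
    lap_conversion = alpha * c_in / (k_alpha + c_in)  # saturable CO2 -> HCO3- sink
    return diffusion - lap_conversion


P_C = 0.3 * 1e-2     # 0.3 cm/s from Table 1, converted to m/s
ALPHA = 0.012        # numerical value from Table 1 (units assumed, see lead-in)
K_ALPHA = 0.075      # numerical value from Table 1 (units assumed, see lead-in)

C_PERI = 0.013       # hypothetical periplasmic CO2 in mol/m^3
C_IN = 0.010         # hypothetical cytoplasmic CO2 in mol/m^3

flux = co2_inner_membrane_flux(P_C, C_PERI, C_IN, ALPHA, K_ALPHA)
print(f"J_C = {flux:.3e}")
```

With no concentration difference across the membrane, only the negative conversion term remains, which matches the sign convention described in the text.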

Checkpoint #3: CO2 and HCO3- Concentration Rate of Change in the Cytoplasm

The third checkpoint describes the diffusion of CO2 and HCO3- through the cytoplasm up to the outside of the carboxysome. Because our model is three-dimensional, we need an extra step to confine the gradient of both molecules to a single direction: the position axis x pointing from the center of the carboxysome to the outside of the cell. The two components perpendicular to the x-axis, namely the y and z components, do not contribute to fluxes along that axis and are therefore ignored.

This brings us to Fick’s Second Law, which states that the time partial derivative of the concentration of a diffusing molecule is proportional to the Laplacian of that concentration as a function of position and time. The Laplacian, denoted ∇² (where ∇, nabla, is the upside-down triangle), is a second derivative for multivariate functions. The law applies to both CO2 and HCO3- and is described by the two equations below.

Equation 24
Equation 25

Legend
∂C/∂t: rate of change of CO2 concentration
C: CO2 concentration as a function of position and time
∂H/∂t: rate of change of HCO3- concentration
H: HCO3- concentration as a function of position and time

Essentially, in this checkpoint we relate time partial derivatives to position partial derivatives, discarding the components we do not need and keeping one direction only: the direction that records flux along the straight line between the carboxysome and the outside of the cell.
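To illustrate how diffusion along a single direction can be stepped forward in time, here is a minimal Python sketch of one explicit update for a spherically symmetric profile, using the radial coordinate r from The Fundamentals. It is not the team's MATLAB implementation; the shell discretization, the closed (zero-flux) ends, the function name, and the initial profile are all assumptions chosen for illustration.

```python
import math


def radial_diffusion_step(conc, r_edges, diff_coeff, dt):
    """Advance concentrations in concentric shells by one explicit time step.

    conc holds one value per shell and r_edges holds the shell boundaries
    (len(conc) + 1 values). Both ends are closed, so total amount is conserved.
    """
    n = len(conc)
    flow = [0.0] * (n + 1)  # outward molar flow across each boundary
    for i in range(1, n):
        r_inner_mid = 0.5 * (r_edges[i - 1] + r_edges[i])
        r_outer_mid = 0.5 * (r_edges[i] + r_edges[i + 1])
        gradient = (conc[i] - conc[i - 1]) / (r_outer_mid - r_inner_mid)
        # Fick's First Law times the area of the spherical boundary
        flow[i] = -diff_coeff * gradient * 4.0 * math.pi * r_edges[i] ** 2
    updated = []
    for i in range(n):
        volume = 4.0 / 3.0 * math.pi * (r_edges[i + 1] ** 3 - r_edges[i] ** 3)
        updated.append(conc[i] - dt * (flow[i + 1] - flow[i]) / volume)
    return updated


def total_amount(conc, r_edges):
    """Sum of concentration times shell volume over all shells."""
    return sum(
        c * 4.0 / 3.0 * math.pi * (r_edges[i + 1] ** 3 - r_edges[i] ** 3)
        for i, c in enumerate(conc)
    )


N_SHELLS = 50
CELL_RADIUS = 1.446e-6                       # m, from General Assumptions
edges = [CELL_RADIUS * i / N_SHELLS for i in range(N_SHELLS + 1)]
profile = [0.0] * (N_SHELLS - 1) + [1.0]     # hypothetical start: carbon at the edge
amount_before = total_amount(profile, edges)
for _ in range(1000):
    profile = radial_diffusion_step(profile, edges, 1e-9, 1e-8)  # D = 1e-5 cm^2/s
amount_after = total_amount(profile, edges)
```

The time step of 1e-8 s is small enough to keep this explicit scheme stable for the chosen shell width; a coarser step would require an implicit method.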

Checkpoint #4: CO2 and HCO3- Diffusion into the Carboxysome

Transport of CO2 and HCO3- across the proteinaceous carboxysome shell is modeled as permeability-limited diffusion. The flux is proportional to the concentration difference between the cytosol and the carboxysome interior, which is scaled by the permeability coefficient Kcbxs. This process follows Fick’s law and defines the inward fluxes of inorganic carbon species into the carboxysomal microenvironment.

Equation 26

Legend
D ∂C/∂r: the diffusive flux of CO2
D ∂H/∂r: the diffusive flux of HCO3-
Kcbxs: the optimal carboxysome permeability
Ccyt: the concentration of CO2 in the cytosol
Ccbxs: the concentration of CO2 in the carboxysome
Hcyt: the concentration of HCO3- in the cytosol
Hcbxs: the concentration of HCO3- in the carboxysome

Although previous structural studies have suggested that the positively charged pores on the carboxysome surface could preferentially facilitate diffusion of negatively charged HCO3-, direct experimental measurements of permeability remain unavailable. Therefore, in this model, we adopt the simplest assumption that HCO3- and CO2 share the same permeability across the carboxysome shell (Yeates et al., 2008; Dou et al., 2008; Cheng et al., 2008).
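Based on the legend and the shared-permeability assumption, the shell condition can be sketched in LaTeX as below. The symbol R_c for the carboxysome radius is hypothetical and introduced only to mark where the condition is applied.

```latex
\begin{align}
D\,\frac{\partial C}{\partial r}\bigg|_{r=R_c} &= K_{cbxs}\left(C_{cyt} - C_{cbxs}\right) \\
D\,\frac{\partial H}{\partial r}\bigg|_{r=R_c} &= K_{cbxs}\left(H_{cyt} - H_{cbxs}\right)
\end{align}
```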

Checkpoint #5: Inside the Carboxysome

The fifth checkpoint, governed by these two partial differential equations, describes how the concentrations of the two molecules change in the carboxysome over time.

Equation 27

Legend
∂C/∂t: the rate of change of carboxysomal CO2 concentration
D∇²C: the effects of diffusion that drive CO2 movement
∂H/∂t: the rate of change of carboxysomal HCO3- concentration
D∇²H: the effects of diffusion that drive HCO3- movement
RCA: the reaction rate of carbonic anhydrase
RRub: the reaction rate of RuBisCO

Breaking these two equations down into individual terms helps in understanding them. First, the diffusion terms are identical to those in checkpoint #3, which uses Fick’s Second Law to confine concentration changes to a single direction. Second, carbonic anhydrase turns HCO3- into CO2, so its reaction rate is a positive contribution to the concentration of CO2 and a depletion of the concentration of HCO3-. It therefore takes a plus sign in the equation for CO2 and a minus sign in the equation for HCO3-. Finally, since RuBisCO binds CO2 and decreases the CO2 concentration, its rate enters the CO2 equation with a minus sign. RuBisCO does not affect the concentration of HCO3-, so its rate does not appear in the second equation, which describes the rate of change of HCO3- concentration.

In the two following checkpoints, the reaction rates of both enzymes are explained in detail.
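Written out in LaTeX, the sign conventions described in this paragraph give the following pair of equations. This is a reconstruction from the legend and the prose, not a copy of the original Equation 27.

```latex
\begin{align}
\frac{\partial C}{\partial t} &= D\nabla^2 C + R_{CA} - R_{Rub} \\
\frac{\partial H}{\partial t} &= D\nabla^2 H - R_{CA}
\end{align}
```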

Checkpoint #6: Carbonic anhydrase reaction rate

Inside the carboxysome, HCO3- reacts with protons to form carbonic acid, which rapidly decomposes into CO2 and water:

Equation 28

This reaction is catalyzed by carbonic anhydrase (CA), an enzyme that accelerates the interconversion of HCO3- and CO2. In addition, within the local alkaline pocket, CO2 also reacts directly to become HCO3- without the intermediate carbonic acid. Chemically, this step is crucial because it elevates the local CO2 concentration within the carboxysome, driving efficient carbon fixation by RuBisCO (in the next checkpoint).

The rate of the carbonic anhydrase-catalyzed reaction is described by a bidirectional Michaelis-Menten formulation that accounts for both the hydration and dehydration reactions.

Equation 29

Legend
RCA: the reaction rate of carbonic anhydrase
[CA]: concentration of carbonic anhydrase
CCbxs: concentration of CO2 in carboxysome
HCbxs: concentration of HCO3- in carboxysome
[H+]: concentration of H+
Keq: equilibrium constant
Kh,c: the concentration of CO2 at which hydration is half-maximal
Kh,b: the concentration of HCO3- at which hydration is half-maximal
kh,cat: the catalytic constant of the hydration reaction
nCbxs: the total amount of carboxysomes

Because the reaction is reversible, we modeled CA activity with a bi-directional Michaelis-Menten expression in a bivariate equation. Inside the carboxysome, the predominant direction of reaction is toward CO2 formation. Consequently, RCA functions as a sink for HCO3- and a local source of CO2, effectively elevating the carboxysomal CO2 concentration above equilibrium and driving efficient RuBisCO carboxylation.
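As a rough illustration of a bidirectional Michaelis-Menten rate, the sketch below uses one common reversible form in which the net rate vanishes at chemical equilibrium. The exact expression in Equation 29 may differ; the functional form, the function name, the placement of nCbxs as a simple multiplier, and all numerical inputs other than Keq, Kh,c, and Kh,b (taken from Table 1) are assumptions.

```python
def ca_rate(c_cbxs, h_cbxs, proton, ca_conc, k_cat, k_eq, k_c, k_b, n_cbxs=1.0):
    """Net CO2 production by carbonic anhydrase (positive = HCO3- -> CO2).

    The numerator (C - H*[H+]/Keq) is zero at equilibrium, and the denominator
    lets both CO2 and HCO3- saturate the enzyme.
    """
    driving_force = c_cbxs - h_cbxs * proton / k_eq
    saturation = k_c * (1.0 + h_cbxs / k_b) + c_cbxs
    hydration = k_cat * ca_conc * driving_force / saturation
    return -n_cbxs * hydration


K_EQ = 5.6e-5      # mol m^-3, Table 1
K_C = 1.5          # mol m^-3, Table 1
K_B = 34.0         # mol m^-3, Table 1
K_CAT = 1.0        # hypothetical placeholder
CA_CONC = 1.0      # hypothetical placeholder
PROTON = 1e-4      # hypothetical [H+] in mol m^-3 (about pH 7)
H_CBXS = 10.0      # hypothetical carboxysomal HCO3- in mol m^-3

c_equilibrium = H_CBXS * PROTON / K_EQ
rate_low_co2 = ca_rate(0.5 * c_equilibrium, H_CBXS, PROTON, CA_CONC, K_CAT, K_EQ, K_C, K_B)
rate_high_co2 = ca_rate(2.0 * c_equilibrium, H_CBXS, PROTON, CA_CONC, K_CAT, K_EQ, K_C, K_B)
```

When CO2 is below its equilibrium value the rate is positive, so CA acts as a sink for HCO3- and a source of CO2, as described above.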

Checkpoint #7: RuBisCO Reaction Rate

The terminal step in our model describes the fixation of CO2 by RuBisCO, which catalyzes the carboxylation of RuBP to yield two molecules of 3PG.

Equation 30

RuBisCO kinetics are described using a Michaelis-Menten form that accounts for the competing interaction of CO2 at the enzyme’s active site. The overall reaction rate is given by:

Equation 31

Legend
RRuBisCO: the reaction rate of carbon fixation by RuBisCO
CCbxs: local CO2 concentration inside the carboxysome
[RuBisCO]: concentration of RuBisCO enzyme
kcat: rate constant of RuBisCO carboxylation
Km: Michaelis constant for RuBisCO
nCbxs: the total amount of carboxysomes
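A minimal Python sketch of this rate law is shown below, assuming Equation 31 has the form R = nCbxs·kcat·[RuBisCO]·CCbxs/(Km + CCbxs). How nCbxs enters the original equation is not visible here, so treating it as a multiplier is an assumption, as are the function name and the placeholder kcat. The RuBisCO concentrations and carboxysome counts for low (LL), medium (ML), and high (HL) light are taken from Table 1.

```python
def rubisco_rate(c_cbxs, rubisco_conc, k_cat, k_m, n_cbxs=1.0):
    """Michaelis-Menten carboxylation rate scaled by the number of carboxysomes."""
    return n_cbxs * k_cat * rubisco_conc * c_cbxs / (k_m + c_cbxs)


K_M = 1.5        # mol m^-3, Table 1
K_CAT = 1.0      # hypothetical placeholder, so results are relative only

# (RuBisCO concentration in mM, carboxysomes per cell) from Table 1
LIGHT_CONDITIONS = {"LL": (0.027, 2.6), "ML": (0.062, 5.0), "HL": (0.109, 9.5)}

# Evaluate at C = Km, where each carboxysome runs at half its maximum rate
rates = {
    name: rubisco_rate(K_M, conc, K_CAT, K_M, n_cbxs)
    for name, (conc, n_cbxs) in LIGHT_CONDITIONS.items()
}
for name, rate in rates.items():
    print(f"{name}: relative rate = {rate:.4f}")
```

Because both the enzyme concentration and the carboxysome count rise with light intensity, the relative rate increases from LL to HL under this assumed form.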

Effect of light intensity:
Light intensity strongly influences the regulation of carbon uptake and fixation in our model of the carbon-concentrating mechanism, because the psbA1 promoter we cloned in exhibits strong light-responsive behavior. Previous studies using psbA1-lacZ fusion constructs have demonstrated that transcriptional activity from this promoter is significantly enhanced under high irradiance, reflecting its natural role in coordinating photosynthetic protein expression with photon flux (Nair et al., 2001).

Likewise, in our engineered strain, the CmpAB and CmpCD operon, which encodes the BCT1 bicarbonate transporter, is placed under the control of the psbA1 promoter. Consequently, increased light intensity directly elevates CmpABCD expression, thereby enhancing HCO3- transport efficiency across the inner membrane. This regulatory effect strengthens the cell’s capacity to accumulate inorganic carbon within the cytosol, ultimately supplying higher substrate concentrations to the carboxysome.

Enhanced carbon uptake under high light is further coupled with upregulation of carbonic anhydrase and RuBisCO expression within the carboxysome (checkpoints 6-7). Elevated CA activity facilitates rapid interconversion between HCO3- and CO2, sustaining high localized CO2 concentrations, while increased RuBisCO abundance allows more active sites for carboxylation reactions. In parallel, high irradiance has been associated with an increase in carboxysome biogenesis, providing additional microcompartments to accommodate the augmented enzymatic machinery.

Appendix

Table 1. Parameters, values, and sources

| Checkpoint | Notation | Meaning | Value | Reference |
|---|---|---|---|---|
| 1 | Pc | the outer membrane’s permeability for CO2 | 0.0035 m s⁻¹ | McGrath & Long, 2014 |
| 1 | K | the partition coefficient, a ratio that describes a compound’s (CO2) distribution between two immiscible solvents (the lipid bilayer outer membrane and aqueous periplasm) at equilibrium | 0.95 (mL CO2/mL lipid) | Gutknecht et al., 1977 |
| 1 | D | the diffusion coefficient of CO2 | 10⁻⁵ cm²/s | Mangan & Brenner, 2014 |
| 1 | ΔX | the thickness of the outer membrane | 8 nm | Assumed |
| 2 | XBCT1 | the fold change of BCT1 concentration due to CmpABCD overexpression compared to wild-type | 1 | Assumed |
| 2 | Kd | the dissociation constant of CmpA for HCO3- | 5 μM | Koropatkin et al., 2007 |
| 2 | VBCT1 | the maximum reaction rate of BCT1 | 5.5 × 10⁻⁵ mol m⁻² s⁻¹ | McGrath & Long, 2014 |
| 2 | KBCT1 | the dissociation constant of BCT1 for HCO3--bound CmpA | 0.015 mol m⁻³ | McGrath & Long, 2014 |
| 2 | VBicA | the maximum reaction rate of the HCO3- transporter BicA | 1.85 × 10⁻⁴ mol m⁻³ | McGrath & Long, 2014 |
| 2 | VSbtA | the maximum reaction rate of the HCO3- transporter SbtA | 2.246 × 10⁻⁵ | McGrath & Long, 2014 |
| 2 | α | the maximal reaction rate of conversion from CO2 to HCO3- | 0.012 mol m⁻³ | McGrath & Long, 2014 |
| 2 | Kα | the CO2 concentration at half maximal activity of the conversion reaction | 0.075 mol m⁻² s⁻¹ | McGrath & Long, 2014 |
| 2 | PC | the permeability of the inner membrane to CO2 | 0.3 cm s⁻¹ | Mangan & Brenner, 2014 |
| 2 | [A] | the concentration of CmpA in the periplasm | WT: 2.7 × 10⁻³ mM; Engineered: LL: 2.806 × 10⁻³ mM, ML: 2.792 × 10⁻³ mM, HL: 2.777 × 10⁻³ mM | Calculated using the Michaelis-Menten equation; Miller and Jeffrey, 1972; Schaefer and Golden, 1989 |
| 4 | Kcbxs | the optimal carboxysome permeability | 0.3 cm s⁻¹ | Mangan & Brenner, 2014 |
| 6 | Kh,cat | the hydration catalytic constant | 0.3 × 10⁻⁶ s⁻¹ | McGrath & Long, 2014 |
| 6 | [CA] | the concentration of carbonic anhydrase | LL: 8 × 10⁻⁴ mM; ML: 1.01 × 10⁻³ mM; HL: 1.45 × 10⁻³ mM | Calculated from Sun et al., 2019 |
| 6 | Keq | the equilibrium constant | 5.6 × 10⁻⁵ mol m⁻³ | McGrath & Long, 2014 |
| 6 | Kh,c | the concentration of CO2 at which hydration is half-maximal | 1.5 mol m⁻³ | McGrath & Long, 2014 |
| 6 | Kh,b | the concentration of HCO3- at which hydration is half-maximal | 34 mol m⁻³ | McGrath & Long, 2014 |
| 6, 7 | nCbxs | the total amount of carboxysomes | LL: 2.6 per cell; ML: 5 per cell; HL: 9.5 per cell | Calculated from Sun et al., 2019 |
| 7 | [RuBisCO] | the concentration of RuBisCO enzyme | LL: 0.027 mM; ML: 0.062 mM; HL: 0.109 mM | Calculated from Sun et al., 2019 |
| 7 | kcat | the rate constant of RuBisCO carboxylation | 1.5 mol m⁻³ | McGrath & Long, 2014 |
| 7 | Km | the Michaelis constant for RuBisCO | 1.5 mol m⁻³ | McGrath & Long, 2014 |

References

Cheng S, Liu Y, Crowley CS, Yeates TO, Bobik TA (2008). Bacterial microcompartments: their properties and paradoxes. Bioessays. doi: 10.1002/bies.20830.

Dou Z, Heinhorst S, Williams EB, Murin CD, Shively JM (2008). CO2 fixation kinetics of Halothiobacillus neapolitanus mutant carboxysomes lacking carbonic anhydrase suggest the shell acts as a diffusional barrier for CO2. The Journal of Biological Chemistry. doi: 10.1074/jbc.M709285200.

Gutknecht, J., Bisson, M. A., & Tosteson, F. C. (1977). Diffusion of carbon dioxide through lipid bilayer membranes: Effects of carbonic anhydrase, bicarbonate, and unstirred layers. The Journal of General Physiology, 69(6), 779–794. https://doi.org/10.1085/jgp.69.6.779

Koropatkin, N. M., Koppenaal, D. W., Pakrasi, H. B., & Smith, T. J. (2007). The structure of a cyanobacterial bicarbonate transport protein, CmpA. Journal of Biological Chemistry, 282(4), 2606–2614. https://doi.org/10.1074/jbc.m610222200

Libretexts. (2024, March 2). Michaelis-Menten Kinetics. Chemistry LibreTexts. https://chem.libretexts.org/Bookshelves/Biological_Chemistry/Supplemental_Modules_(Biological_Chemistry)/Enzymes/Enzymatic_Kinetics/Michaelis-Menten_Kinetics

Nair, U., Thomas, C., & Golden, S. S. (2001). Functional elements of the strong psbAI promoter of Synechococcus elongatus PCC 7942. Journal of Bacteriology, 183(5), 1740–1747. https://doi.org/10.1128/jb.183.5.1740-1747.2001

Mangan, N. M., & Brenner, M. P. (2014). Systems analysis of the CO2 concentrating mechanism in cyanobacteria. eLife, 3. https://doi.org/10.7554/elife.02043

McGrath, J. M., & Long, S. P. (2014). Can the cyanobacterial carbon-concentrating mechanism increase photosynthesis in crop species? A theoretical analysis. Plant Physiology, 164(4), 2247–2261. https://doi.org/10.1104/pp.113.232611

MedChemExpress, “ONPG Product Data Sheet,” HY-15926 datasheet, Jun. 2021

Schaefer, M. R., & Golden, S. S. (1989). Differential expression of members of a cyanobacterial psbA gene family in response to light. Journal of Bacteriology, 171(7), 3973–3981. https://doi.org/10.1128/jb.171.7.3973-3981.1989

Sun, Y., Wollman, A. J., Huang, F., Leake, M. C., & Liu, L.-N. (2019). Single-organelle quantification reveals stoichiometric and structural variability of carboxysomes dependent on the environment. The Plant Cell, 31(7), 1648–1664. https://doi.org/10.1105/tpc.18.00787

Yeates TO, Kerfeld CA, Heinhorst S, Cannon GC, Shively JM (2008). Protein-based organelles in bacteria: carboxysomes and related microcompartments. Nature Reviews Microbiology. doi: 10.1038/nrmicro1913.

Modeling Result

Modeled bicarbonate/CO2 flux across the inner membrane by BCT1

Equation 1 Equation 2 Equation 3 Equation 4

As shown in Equations 1 through 4, we used the Michaelis–Menten formulation for BCT1 to evaluate the steady-state inward flux J for four genotypes: wild-type (Equation 1), CmpAB fusion (Equation 2), CmpCD fusion (Equation 3), and the combined CmpAB and CmpCD fusions (Equation 4).

Quantitatively, the CmpCD fusion increases J by roughly an order of magnitude (about 10-fold) over wild-type in our calculation, while the CmpAB fusion provides a modest (~10–20%) boost. The affinity improvement is supported by molecular docking, where the binding free energy obtained from AutoDock was converted to a dissociation constant Kd using ΔG = RT ln Kd. The CmpAB fusion has a Kd of 4.492 µM (ΔG ≈ −30.5 kJ mol⁻¹), reflecting stronger substrate binding than the wild type, for which Kd = 5 µM (Rottet et al., 2024). Combining both fusions (CmpAB and CmpCD) yielded the highest flux (an additional ~10–20% increase), reflecting the combined effects of enhanced substrate capture by CmpA and increased transporter turnover or abundance from the CmpCD module.
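The conversion between binding free energy and dissociation constant can be checked with a few lines of Python. The temperature of 298.15 K and the 1 M standard state are assumptions, since the text does not state them, and the function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy in kJ/mol from dG = R * T * ln(Kd), Kd in mol/L."""
    return GAS_CONSTANT * temperature_k * math.log(kd_molar) / 1000.0


def kd_from_delta_g(delta_g_kj_mol, temperature_k=298.15):
    """Dissociation constant in mol/L from a binding free energy in kJ/mol."""
    return math.exp(delta_g_kj_mol * 1000.0 / (GAS_CONSTANT * temperature_k))


fusion_dg = delta_g_from_kd(4.492e-6)      # CmpAB fusion, Kd = 4.492 uM
wild_type_dg = delta_g_from_kd(5e-6)       # wild type, Kd = 5 uM
print(f"CmpAB fusion: {fusion_dg:.2f} kJ/mol")
print(f"wild type:    {wild_type_dg:.2f} kJ/mol")
```

Under these assumptions the fusion value comes out close to the −30.5 kJ mol⁻¹ quoted above, and it is slightly more negative than the wild-type value, which is what a lower Kd implies.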

Equation 5 Equation 6

As shown above, the HCO3- concentration was calculated using the equilibrium constants of CO2 hydration and HCO3- formation. Using Henry’s law, we calculate the solubility of CO2 in water under air-equilibrated conditions, which is around 13 µM. We assume that the dissolution of gaseous CO2 in water and the diffusion of CO2 across the outer membrane are instantaneous. The resulting HCO3- value is around 5.49 × 10⁻² mol/m³/s.
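The Henry's law step can be sketched as below. The Henry's constant of 0.034 mol L⁻¹ atm⁻¹ is an assumed textbook value for CO2 in water near 25 °C, not a number taken from this page, and the function name is hypothetical.

```python
HENRY_CO2_MOL_PER_L_ATM = 0.034  # assumed textbook value near 25 degrees C


def dissolved_co2_molar(co2_fraction, total_pressure_atm=1.0,
                        henry_constant=HENRY_CO2_MOL_PER_L_ATM):
    """Henry's law: dissolved concentration = k_H * partial pressure of CO2."""
    return henry_constant * co2_fraction * total_pressure_atm


air_um = dissolved_co2_molar(0.0004) * 1e6    # 0.04% CO2, in micromolar
flue_mm = dissolved_co2_molar(0.15) * 1e3     # 15% CO2, in millimolar
print(f"air-equilibrated: {air_um:.1f} uM")
print(f"flue gas:         {flue_mm:.1f} mM")
```

With this assumed constant, the results are roughly 13.6 µM and 5.1 mM, in line with the 13 µM and 5 mM used in the General Assumptions.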

Implication for carboxysomal CO2 supply

Because CA rapidly converts HCO3- to CO2 inside the carboxysome, the increased flux in the CD and AB+CD strains elevates carboxysomal CO2, pushing RuBisCO toward saturation under medium–high light in silico.

Decision gate for experiments

After confirming in the model that, of all the cloned strains, AB+CD produces the highest inner-membrane inorganic-carbon flux and hence higher CO2 delivery to the carboxysome than wild type, we proceeded to wet-lab construction and testing of the fusion designs, prioritizing AB+CD due to its larger predicted impact.

User Manual and Interpretation of Our Model (please go to Mathematical Model under Wet Lab in our wiki)

In our final model’s MATLAB simulation, there are two main classes of parameters — the framework-based parameters (FBPs) and the system-specific parameters (SSPs) — that contain a variety of adjustable values. While FBPs control the overall structural qualities of our model, SSPs allow more detailed adjustments to variables that are specific to improving the cyanobacterial carbon concentrating mechanism.

There are three FBPs in our model, namely dt (s), t_end (s), and Newton tol (unitless). A small difference in time, dt, is used for linear approximation for our graphs; therefore, users can adjust the precision of our model based on their needs. The ending time, t_end, sets a domain in which our modeling covers. Finally, the error tolerance or Newton tolerance, Newton tol, also deals with the difference between individual approximations and is responsible for the model’s precision compared to the actual data.
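To show how these three parameters typically interact in a time-stepping solver, here is a minimal Python sketch. It is not the team's MATLAB code: it assumes an implicit (backward) Euler scheme solved with Newton iterations, applied to a hypothetical single pool with constant influx and a Michaelis-Menten sink. The function name and all numerical values are illustrative.

```python
def simulate_pool(c0, influx, v_max, k_m, dt, t_end, newton_tol, max_iter=50):
    """Backward Euler for dC/dt = influx - v_max * C / (k_m + C)."""

    def rate(c):
        return influx - v_max * c / (k_m + c)

    def rate_slope(c):
        return -v_max * k_m / (k_m + c) ** 2

    history = [c0]
    for _step in range(int(round(t_end / dt))):
        previous = history[-1]
        guess = previous
        for _iteration in range(max_iter):
            # Residual of the implicit update: guess = previous + dt * rate(guess)
            residual = guess - previous - dt * rate(guess)
            if abs(residual) < newton_tol:
                break  # accepted: successive approximations agree within tolerance
            guess -= residual / (1.0 - dt * rate_slope(guess))
        history.append(guess)
    return history


trace = simulate_pool(c0=0.0, influx=0.5, v_max=1.0, k_m=2.0,
                      dt=0.1, t_end=200.0, newton_tol=1e-10)
steady_state = 2.0 * 0.5 / (1.0 - 0.5)   # k_m * influx / (v_max - influx)
print(f"final value {trace[-1]:.6f}, analytical steady state {steady_state:.6f}")
```

In this sketch, a smaller dt gives a finer curve, t_end decides how far the curve extends, and newton_tol decides how precisely each implicit step is solved.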

The SSPs, on the other hand, allow users to adjust values ranging from the cyanobacterial cell radius to the extracellular carbon dioxide concentration to the rates of the reactions catalyzed by RuBisCO and carbonic anhydrase. This open-ended design encourages users to think more deeply about the project, potentially sparking new ideas for future directions that further optimize the carbon concentrating mechanism and advance our goal of mitigating the climate crisis.

In terms of results, the numerical values shown quantify the effect of engineered BCT1 transporters. The RuBisCO sink value reflects the final steady-state rate of carbon fixation inside the carboxysome. The corresponding CO2 fixed per cell converts this activity into an annualized metric. By scaling with η (the cellular volume fraction), the model estimates total fixation capacity per liter of culture. As η increases, the number of cells per liter increases, and thus the total annual CO2 fixation rises.
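The scaling with η can be sketched as below, assuming (as in the model's hypothetical spherical cell) that the number of cells per liter is the occupied volume divided by the volume of one sphere. The radius and per-cell fixation value used in the example are placeholders, not parameters taken from our simulation.

```python
import math

LITER_IN_M3 = 1e-3  # one liter expressed in cubic meters


def cells_per_liter(eta, cell_radius_m):
    """Number of spherical cells that occupy a volume fraction eta of 1 L."""
    cell_volume_m3 = (4.0 / 3.0) * math.pi * cell_radius_m ** 3
    return eta * LITER_IN_M3 / cell_volume_m3


def annual_fixation_per_liter(fixed_per_cell_per_year, eta, cell_radius_m):
    """Scale a per-cell annual fixation value up to one liter of culture."""
    return fixed_per_cell_per_year * cells_per_liter(eta, cell_radius_m)


# Placeholder inputs for illustration only (NOT model outputs).
radius_m = 1e-6            # hypothetical 1 micrometer cell radius
per_cell = 1.0             # arbitrary units of CO2 fixed per cell per year

low = annual_fixation_per_liter(per_cell, eta=0.01, cell_radius_m=radius_m)
high = annual_fixation_per_liter(per_cell, eta=0.02, cell_radius_m=radius_m)
print(f"cells per liter at eta=0.01: {cells_per_liter(0.01, radius_m):.3e}")
print(f"doubling eta scales fixation by {high / low:.1f}x")
```

Because the relationship is linear in η, doubling the volume fraction doubles the estimated fixation per liter in this sketch.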

Beyond numerical values, the final simulation outputs combine both the transport and fixation stages of our carbon concentration model. The 3D plots of CO2 and HCO3- concentration gradients, C(r,t) and H(r,t), respectively, visualize how inorganic carbon species move and equilibrate within the system. Both plots show a gradual flattening of the concentration surface over time, indicating that the transport processes driven by BCT1 rapidly approach steady state. The red plateau region of each plot represents a high inorganic carbon concentration near equilibrium, while the blue plateau region represents a lower inorganic carbon concentration.

Final Results

The wild-type strain has a fixation capacity of 7.104 tons of CO2 per liter per year, while the engineered CmpAB + CmpCD strain reached 8.031 tons per liter per year. These findings validate that our transporter engineering strategy produces a measurable improvement in net CO2 sequestration at the system level.
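As a quick arithmetic check on the two reported capacities, the relative improvement can be computed directly. Only the two values quoted above are used; the function name is illustrative.

```python
WILD_TYPE_CAPACITY = 7.104   # reported wild-type value
ENGINEERED_CAPACITY = 8.031  # reported CmpAB + CmpCD value


def relative_gain_percent(baseline, improved):
    """Percentage increase of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0


gain = relative_gain_percent(WILD_TYPE_CAPACITY, ENGINEERED_CAPACITY)
print(f"relative improvement: {gain:.1f}%")  # prints "relative improvement: 13.0%"
```

The roughly 13% gain falls within the ~10–20% range quoted earlier for the fusion effects.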

CO₂/HCO₃⁻ Model

Low Light

| Parameter | Wild Type | CmpAB Fusion | CmpCD Fusion | CmpAB Fusion + CmpCD Fusion |
| BCT1 Vmax | 5.50E-05 | 5.50E-05 | 5.50E-04 | 5.50E-04 |
| [A] | 0.0027 | 0.0028 | 0.0027 | 0.0028 |
| Kd | 5000 | 4492 | 5000 | 4492 |
| [CA] | 0.0008 | 0.0008 | 0.0008 | 0.0008 |
| [RuBisCO] | 0.027 | 0.027 | 0.027 | 0.027 |
| carboxysome (cx) amount | 2.6 | 2.6 | 2.6 | 2.6 |
| [CO2] in flue gas | 5 | 5 | 5 | 5 |

Medium Light

| Parameter | Wild Type | CmpAB Fusion | CmpCD Fusion | CmpAB Fusion + CmpCD Fusion |
| BCT1 Vmax | 5.50E-05 | 5.50E-05 | 5.50E-04 | 5.50E-04 |
| [A] | 0.0027 | 0.0028 | 0.0027 | 0.0028 |
| Kd | 5000 | 4492 | 5000 | 4492 |
| [CA] | 0.001 | 0.001 | 0.001 | 0.001 |
| [RuBisCO] | 0.062 | 0.062 | 0.062 | 0.062 |
| carboxysome (cx) amount | 5 | 5 | 5 | 5 |
| [CO2] in flue gas | 5 | 5 | 5 | 5 |

High Light

| Parameter | Wild Type | CmpAB Fusion | CmpCD Fusion | CmpAB Fusion + CmpCD Fusion |
| BCT1 Vmax | 5.50E-05 | 5.50E-05 | 5.50E-04 | 5.50E-04 |
| [A] | 0.0027 | 0.0028 | 0.0027 | 0.0028 |
| Kd | 5000 | 4492 | 5000 | 4492 |
| [CA] | 0.00145 | 0.00145 | 0.00145 | 0.00145 |
| [RuBisCO] | 0.109 | 0.109 | 0.109 | 0.109 |
| carboxysome (cx) amount | 9.5 | 9.5 | 9.5 | 9.5 |
| [CO2] in flue gas | 5 | 5 | 5 | 5 |

Future Directions:

Our current model demonstrated that under high CO2 concentrations, both the wild-type and genetically engineered S. elongatus PCC 7942 strains reached RuBisCO saturation, thereby achieving Vmax for carboxylation within the carboxysome. This indicates that while our cmpAB and cmpCD fusion constructs enhanced HCO3- uptake and intracellular inorganic carbon flux, overall fixation became limited by RuBisCO's intrinsic capacity. As a result, the cloned and wild-type strains exhibited similar carbon fixation efficiencies at certain light intensities and concentrations.

As a result, future work should focus on increasing RuBisCO abundance to push carbon fixation capacity beyond current limits. While our model indicated that HCO3- transport and conversion have been optimized, the total potential of the cyanobacteria remained constrained by the number of RuBisCO active sites. By elevating RuBisCO expression or improving its folding efficiency, we could expand the system’s overall carbon-capturing potential, particularly under medium to high light scenarios.

In addition, emerging research, such as studies on C2 photosynthesis and synthetic Calvin-Benson cycle enhancements, suggests that optimization of downstream carbon assimilation reactions could significantly increase overall fixation capacity (Batista-Silva et al., 2020; Lu et al., 2025). Incorporating such advancements into our system alongside our current high inorganic carbon uptake structure may yield synergistic improvements in carbon fixation efficiency.

References

Batista-Silva, W., da Fonseca-Pereira, P., Martins, A. O., Zsögön, A., Nunes-Nesi, A., & Araújo, W. L. (2020). Engineering improved photosynthesis in the era of synthetic biology. Plant Communications, 1(2), 100032. https://doi.org/10.1016/j.xplc.2020.100032

Lu, K.-J., Hsu, C.-W., Jane, W.-N., Peng, M.-H., Chou, Y.-W., Huang, P.-H., Yeh, K.-C., Wu, S.-H., & Liao, J. C. (2025). Dual-cycle CO2 fixation enhances growth and lipid synthesis in Arabidopsis thaliana. Science, 389(6765). https://doi.org/10.1126/science.adp3528

Rottet, S., Rourke, L. M., Pabuayon, I. C., Phua, S. Y., Yee, S., Weerasooriya, H. N., Wang, X., Mehra, H. S., Nguyen, N. D., Long, B. M., Moroney, J. V., & Price, G. D. (2024). Engineering the cyanobacterial ATP-driven BCT1 bicarbonate transporter for functional targeting to C3 plant chloroplasts. Journal of Experimental Botany, 75(16), 4926–4943. https://doi.org/10.1093/jxb/erae234

Protein Modeling

1. Introduction

The team designed two fusion genes, CmpAB and CmpCD, to be incorporated into two individual vectors for the expression of the CmpAB and CmpCD fusion proteins. The team hypothesized that the fusion proteins could speed up CmpABCD complex formation and increase the rate of HCO3- docking onto CmpA, together improving the efficiency of HCO3- transport and thereby optimizing the cyanobacterial CCM. Additionally, in the team's engineered cyanobacteria, the CmpB dimers (CmpB + CmpB) in the CmpAB fusion protein are capable of binding two CmpA substrate-binding proteins per dimer, potentially increasing the rate of HCO3- transport. To verify this hypothesis and determine the feasibility of the fusion protein design, it is essential to obtain the configuration (shape and structure) and orientation (location) of the CmpAB and CmpCD fusion proteins and to compare them with the separate CmpA, B, C, and D components. The team therefore ran the AlphaFold and DeepTMHMM software to predict the configuration and the orientation (relative to the inner membrane) of the fusion proteins, respectively, and compared the results with the native (wild-type) CmpA, B, C, and D proteins.

2. CmpAB Fusion Protein

2.1 Configuration Confirmation: AlphaFold results

2.1.1 Overview

According to the predicted AlphaFold results shown in Figures 1–3, apart from the protruding alpha helix in the right-hand corner (as seen in Figure 2), the CmpAB fusion protein and the assembly of the individual CmpA and CmpB proteins have similar overall configurations, an inference further quantified by the pruned and global root mean square deviation (RMSD) values that are explained in the sections below.

Figure 1. The AlphaFold prediction of the individual CmpA and CmpB dimer proteins assembled, with the purple portion representing CmpA and the red and green portions representing individual CmpBs making up the homodimer.

Figure 2. The AlphaFold prediction of the CmpAB fusion protein (red portion), with an additional CmpB (purple portion) to produce a homodimer.

Figure 3. Figures 1 and 2 overlapped for a comprehensive structural comparison of the fusion and individual CmpA and CmpB compartments.

2.1.2 Defining Terms: Reading Root Mean Square Deviation (RMSD)

Below are the term definitions essential for reading root mean square deviations (RMSDs), values used alongside the AlphaFold predictions to evaluate the structural validity of the team's fusion proteins' binding sites and functionality.

Root mean square deviation (RMSD) is a measure of the average distance (in ångströms, Å) between equivalent atoms (usually backbone or alpha carbon atoms) in two superimposed protein structures. A lower RMSD value suggests that the superimposed structures are more similar in configuration. Generally, an RMSD below 2 Å is taken to indicate structural similarity between two proteins, which in turn supports, though does not by itself prove, conserved protein functionality and binding site activity.

In pruned atom pairs, the “atom pairs” refer to the corresponding atoms from the two protein structures in comparison after overlapping, whereas “pruning” refers to the process of eliminating atom pairs that are outliers (too far apart), which helps prevent inflated, inaccurate RMSD values. A low pruned RMSD value of two overlapping structures suggests a strong local structural similarity in core regions.

On the other hand, “unpruned (or full) atom pairs” indicate the RMSD of the protein structures before pruning, including all aligned (including the outliers) atom pairs. A high unpruned RMSD reveals significant global differences in overlapping protein structures, which could arise from terminus flexibility and loop positions (Kufareva & Abagyan, 2012). Comparing the RMSD values of pruned and unpruned atom pairs thereby tells one the extent of protein alignment.
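The difference between pruned and unpruned RMSD can be illustrated with a small sketch. It assumes the two structures are already superimposed and uses a single-pass 2 Å distance cutoff as the pruning rule, which is a simplification; structure-comparison tools may prune iteratively and refit between passes. The coordinates are toy values, not data from our proteins.

```python
import math


def rmsd(pairs):
    """Root mean square deviation over (point_a, point_b) coordinate pairs."""
    if not pairs:
        raise ValueError("at least one atom pair is required")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in pairs)
    return math.sqrt(squared_sum / len(pairs))


def prune(pairs, cutoff=2.0):
    """Keep only atom pairs that lie within `cutoff` angstroms of each other."""
    return [(a, b) for a, b in pairs if math.dist(a, b) <= cutoff]


# Toy data: three well-aligned "core" pairs and one outlier (e.g. a flexible loop).
toy_pairs = [
    ((0.0, 0.0, 0.0), (0.5, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.5, 0.0)),
    ((2.0, 0.0, 0.0), (2.0, 0.0, 0.5)),
    ((3.0, 0.0, 0.0), (13.0, 0.0, 0.0)),
]

core_pairs = prune(toy_pairs)
full_rmsd = rmsd(toy_pairs)
pruned_rmsd = rmsd(core_pairs)
print(f"full: {len(toy_pairs)} pairs, RMSD {full_rmsd:.3f} A")
print(f"pruned: {len(core_pairs)} pairs, RMSD {pruned_rmsd:.3f} A")
```

A single distant pair is enough to inflate the full RMSD well above the pruned value, which is the pattern to look for when comparing the two columns in the RMSD tables.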

2.1.3 RMSD and Visual Model Results

Table 1 summarizes the RMSD results of the CmpAB fusion protein overlapping with individual CmpA and B protein compartments.

Table 1. RMSD results of the CmpAB fusion protein overlapping with individual CmpA and B protein compartments

| Overlapping structure | Pruned atom pairs | Pruned RMSD (Å) | Full (unpruned) atom pairs | Full (unpruned) RMSD (Å) |
| CmpAB fusion + CmpA | 378 | 0.400 | 422 | 5.503 |
| CmpAB fusion + CmpB | 226 | 0.677 | 278 | 14.723 |

According to Table 1, the CmpAB fusion protein, when overlapped with the individual CmpA and CmpB compartments, shows a low (<2 Å) RMSD after pruning (0.400 Å and 0.677 Å, respectively), indicating that the core regions in both pairs of overlapping proteins are highly similar and aligned in local structure. However, the unpruned RMSD of both pairs is considerably higher than the pruned value: 5.503 Å for the CmpA comparison and 14.723 Å for the CmpB comparison. This disparity, particularly for the CmpB comparison, could arise from the dimeric nature of CmpB, where monomeric folds (which determine the core structure of each protein subunit) are maintained while the interfacial (inter-subunit) orientation diverges (Garton et al., 2018). Nevertheless, based on AlphaFold's visual protein predictions, the defining local frameworks of CmpB, namely the transmembrane pores integral to HCO3- transport, are maintained in the CmpAB fusion construct.

Additionally, it is essential to address a notable deviation of CmpB in the CmpAB fusion protein, visible as an outer spiral in Figure 2. In particular, the difference between the full and pruned atom pair counts is approximately 50 atom pairs for the CmpAB fusion + CmpB overlap (278 − 226 = 52), which corresponds to the protruding spirals of the CmpB units in the CmpAB fusion (given CmpB's dimeric nature, about 25 atom pair deviations are assigned to each monomeric unit of CmpB) and explains their absence from the pruned alignment. This is consistent with the known limitation of global RMSDs, which weight flexible outer helices equally with the stable core structures, resulting in inflated values.

In conclusion, despite a diverging subunit orientation and outer helix deviations of the CmpB dimer in the team’s CmpAB fusion, the fundamental folds defining the individual CmpA and B compartments' functionalities are conserved (such as the pores for HCO3- transport); this suggests that the local protein geometries remain undisrupted and justifies the structural feasibility of the team’s CmpAB fusion protein design.

2.2 DeepTMHMM Orientation Confirmation

2.2.1 Overview

The DeepTMHMM results, which predict transmembrane topology and signal peptides in protein sequences, are presented as the Most Likely Topology, Posterior Probability Plot, and Predicted Topologies panels in Figures 4, 5, and 6, as shown below. In the CmpAB fusion protein, CmpA remains located in the periplasm, while CmpB remains embedded in the inner membrane, as with the individual CmpA and B components.

Detailed interpretations of the figures are written below in the sub-subsection 2.2.3.

2.2.2 Defining Terms: Reading DeepTMHMM Graphs

Below are the term definitions essential for reading DeepTMHMM graphs, quantitative representations that allow us to determine the orientations of the team’s fusion constructs and compare them with native CmpA, B, C, and D components.

Starting with the titles of the graphs, one significant aspect is the annotated protein type. If annotated as “globular,” the protein is globe-shaped (spherical) and soluble, generally performing functional roles as enzymes, cellular messengers, or transporters. An additional “+SP” label to the “globular” annotation suggests a signal peptide at the N-terminus of the globular protein, which is cleaved after directing the protein for secretion or targeting. The annotation “alpha TM” is short for α-helical transmembrane protein, which represents the α-helices, one of the major secondary protein structures, that span the membrane’s lipid bilayer (in the team’s case, the cyanobacterial inner membrane).

The Most Likely Topology Plot in the top panels shows the predicted arrangement, or location, of the protein of interest relative to a membrane (in the team’s case, the cyanobacterial inner membrane). The x-axis represents the input amino acid sequence (amino acid positions), and the y-axis represents the probability that a given amino acid is in a specific location, with various positions distinguished through colors (e.g., signal peptide [orange line], outside the membrane [blue line], transmembrane segments [red blocks], and inside the membrane [pink line]).

The Posterior Probability Plots in the central panels reveal DeepTMHMM’s confidence for each of its predicted amino acid residues being in a certain orientation. The x-axis represents the input amino acid sequence, and the y-axis represents the model’s confidence level of a specific input sequence’s location, expressed in probability form (0.0-1.0). Peaks (1.0) represent strong confidence levels.

The Predicted Topologies sections in the bottom panels first display the amino acid sequence of the protein of interest, with the letters representing abbreviations of each amino acid type. They then showcase the orientations of the amino acid residues in text form, coinciding with the information in the top panels.

Lastly, the “##gff-version” section provides feature annotations corresponding to the amino acid sequence coordinates, which include sequence lengths, number of predicted TMRs (transmembrane regions), and the range of sequences aligned to specific locations relative to the baseline membrane.

2.2.3 DeepTMHMM Results and Interpretations

As stated in the figure legend for Figure 6, the amino acid residues’ orientation in the CmpAB fusion generally aligns with the individual CmpA and B compartments.

In addition, as the yellow highlights in Figures 5 and 6 indicate, the DeepTMHMM graph for CmpB shows 6 predicted alpha TMs, while the CmpAB fusion shows 7. Also, the topologies of the two structures alternate in opposite phase, with the individual CmpB compartment beginning in the interior (pink) and the CmpB portion of the CmpAB fusion beginning in the exterior (blue). Such a difference could be explained by the team's fusion design, where changes at the protein terminus affect topology. When the N-terminus of CmpB is fused to the C-terminus of CmpA, the position of CmpB's first TM region relative to its N-terminus changes, potentially flipping the side of the cyanobacterial membrane on which CmpB's N-terminus resides. With the N-terminus of CmpB flipped in location, the membrane-crossing parity, which determines the number of helices required to position the fusion CmpB's C-terminus on the same side as the native CmpB's C-terminus, could be affected. As a result, the DeepTMHMM prediction, in the team's case, inserts an additional alpha helix to complete the CmpB topology, which accounts for the discrepancy in the number of predicted TMs.
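The parity argument can be written out as a tiny sketch: each transmembrane helix crosses the membrane once, so an even number of helices leaves the C-terminus on the same side as the N-terminus and an odd number puts it on the opposite side. The function name is illustrative, and the starting sides used in the example follow the reading of Figures 5 and 6 given in this section.

```python
def c_terminus_side(n_terminus_side, n_tm_helices):
    """Side of the membrane reached after n_tm_helices membrane crossings."""
    opposite = {"inside": "outside", "outside": "inside"}
    if n_terminus_side not in opposite:
        raise ValueError("side must be 'inside' or 'outside'")
    if n_tm_helices < 0:
        raise ValueError("helix count cannot be negative")
    # Even number of crossings: same side. Odd number: opposite side.
    if n_tm_helices % 2 == 0:
        return n_terminus_side
    return opposite[n_terminus_side]


# Individual CmpB: starts inside (cytoplasm), 6 predicted TM helices.
native_end = c_terminus_side("inside", 6)
# CmpB portion of the fusion: starts outside (periplasm, after CmpA), 7 helices.
fusion_end = c_terminus_side("outside", 7)
print(native_end, fusion_end)
```

Under these assumptions both cases end on the same side, which is why a start on the opposite side requires an odd rather than even helix count to keep the C-terminus where it is in native CmpB.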

Figure 4. (1) The most likely topology plot of the individual CmpA protein. (2) The posterior probability plot for the individual CmpA protein. In this graph, the spike in the orange “signal” line indicates that the N-terminal region (the first stretch of residues) is a signal peptide. Following the signal peptide is a steady probability of 1.0 of the blue “outside” line, meaning that the remaining sequence for CmpA is predicted to be located outside of the cyanobacterial inner membrane, or in the periplasmic space. (3) The predicted topologies of the individual CmpA protein.

Figure 5. (1) The most likely topology plot of the individual CmpB protein. (2) The posterior probability plot for the individual CmpB protein. The red “membrane” regions represent the transmembrane alpha helices spanning the inner membrane. The pink “inside” line represents the inner (cytoplasmic) side of the inner membrane, and the blue “outside” line represents the outer (periplasmic) side. The protein sequence alternates between the membrane-interior (pink), transmembrane (red), and membrane-exterior (blue) orientations, further confirming that the CmpB protein structure has multiple transmembrane helices, which is characteristic of membrane channels. (3) The predicted topologies of the individual CmpB protein.

Figure 6. (1) The most likely topology plot of the CmpAB fusion protein. (2) The posterior probability plot for the CmpAB fusion protein. Although the CmpB portion changes from beginning in the interior as an individual compartment to beginning in the exterior within the AB fusion protein (a sequence of approximately 700 amino acids), the location of each region of the CmpAB fusion protein generally coincides with that of the individual CmpA and B proteins, as shown in Figures 4 and 5. (3) The predicted topologies of the CmpAB fusion protein.

2.3 SignalP 6.0: CmpA Signal Peptide Analysis

Using the bioinformatics tool SignalP 6.0, the team confirmed the presence of a signal peptide in the CmpA protein sequence of the team’s CmpAB fusion protein, including the exact type of the signal peptide. The identified signal peptide type, along with the software’s probability predictions, is shown in Figure 7.

Figure 7. CmpA Signal Peptide Analysis

The “predicted proteins table” summarizes the signal peptide type that the team’s CmpA protein most likely contains based on the amino acid sequence submitted. The SignalP 6.0 model predicted that the CmpA protein contains a TAT Lipoprotein signal peptide (TAT/SPII) with high confidence (Likelihood = 1). Therefore, the team could conclude that the CmpA protein follows the Twin-arginine translocation (TAT) pathway, is anchored in the cyanobacterial inner membrane as a lipoprotein, and gets processed by signal peptidase II (SPII). The cleavage site of the CmpA amino acid sequence, or where the TAT Lipoprotein signal peptide is cleaved off after the CmpA protein enters the periplasmic space, occurs between positions 34 and 35. The model demonstrates a fairly confident prediction in the cleavage site (>0.8).

The graph in the bottom section shows the predicted probability of each signal peptide type (as listed in the table above) along CmpA's amino acid sequence. The x-axis refers to the amino acid position (1 → ~70), and the y-axis refers to the model's prediction probability from 0 to 1. In this specific graph, the red curve representing the TAT/SPII signal peptide peaks at nearly 1.0 between amino acid positions ~10 and 35, coinciding with the likelihood predicted in the table above. The vertical dashed line seen at ~34-35 represents the cleavage site. The other colored peaks (e.g., green and orange) present in the graph do not indicate the presence of their corresponding signal peptide types; they merely reflect local features of the TAT/SPII signal peptide that resemble other pathways early in the sequence.

Figure 7 offers a well-rounded characterization of CmpA's signal peptide, which underlies CmpA's function of capturing HCO3- and delivering it to the CmpB transporter complex. For CmpA to operate properly, it must be exported from the cyanobacterial cytoplasm into the periplasmic space and be positioned correctly when docking onto CmpB. The TAT pathway has a distinctive feature, the export of folded proteins, which is consistent with CmpA being a substrate-binding protein (SBP), a protein that binds a substrate (HCO3- in the team's case) and delivers it to a transporter, particularly an ATP-binding cassette (ABC) transporter (Palmer & Berks, 2012). SBPs are often folded in the cytoplasm before export via the TAT pathway. With the TAT pathway, the CmpA compartment could therefore function as an HCO3- binder immediately after reaching the periplasmic space.

CmpA carries a lipoprotein signal peptide, and lipoproteins are often anchored to the cell membrane via a lipid modification. Though protein docking simulations are required for further verification, the team predicted that prior to CmpA-HCO3- binding, the CmpA protein should possess considerable conformational freedom within the periplasmic space. After signal peptide cleavage by SPII, the mature CmpA protein begins with an N-terminal cysteine that is lipid-modified by three fatty acyl chains, two of them derived from the diacylglycerol group of the inner membrane phospholipids, enabling a loose lipid anchorage (hydrophobic tail) of the protein on the periplasmic side of the inner membrane (Wilson & Bernstein, 2016). This results in stabilized, efficient docking of CmpA onto the CmpB complex, without the protein being overly rigid or freely soluble.

Furthermore, the concept of effective molarity (EM) is relevant here: it is key to understanding how tethering molecules together (in the team's case, CmpA and CmpB) increases the likelihood of a molecular interaction compared to two separate molecules. EM is a parameter that quantifies the efficiency of a range of supramolecular phenomena arising from spatial proximity, which, in the team's case, corresponds to multivalent ligand binding (Sun et al., 2013). The lipid anchorage of the CmpA protein increases the EM of CmpA around the inner membrane, thereby increasing its probability of interaction with the CmpB transporter complex.
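The cleavage step described above (signal peptide removed between positions 34 and 35, leaving a mature protein that starts with cysteine) can be sketched as a simple sequence slice. The precursor string below is a made-up placeholder built to have a cysteine at position 35; it is not the real CmpA sequence, and the function name is illustrative.

```python
def mature_sequence(precursor, last_signal_residue=34):
    """Remove residues 1..last_signal_residue (1-based) from a precursor."""
    if not 0 < last_signal_residue < len(precursor):
        raise ValueError("cleavage site must fall inside the sequence")
    # Python slices are 0-based, so index 34 is residue 35.
    return precursor[last_signal_residue:]


# PLACEHOLDER precursor (NOT CmpA): 34 signal residues, then Cys, then filler.
toy_precursor = "M" + "A" * 33 + "C" + "G" * 10

mature = mature_sequence(toy_precursor)
starts_with_cysteine = mature.startswith("C")
print(f"mature length: {len(mature)}, starts with Cys: {starts_with_cysteine}")
```

With a real precursor sequence, the same check would confirm whether the residue immediately after the predicted SPII cleavage site is the cysteine expected for lipid modification.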

In conclusion, an analysis of CmpA’s signal peptide type reveals how the SBP compartment complements the TAT lipoprotein signal peptide’s (TAT/SPII) characteristics, contributing to efficient HCO3- transport across the CmpB transmembrane transporter.

2.4 AutoDock: Docking Score for the CmpA Protein

Using AutoDock, a molecular modeling software that simulates protein-ligand docking, the team estimated the binding affinity of HCO3- for the CmpAB fusion protein. The binding affinity reflects the favorability of complex formation and is related to the Gibbs free energy change (ΔG, reported in kcal/mol by AutoDock) through the relation below:

ΔG = RT ln Kd

where R is the gas constant, 8.314 J/(mol·K); T is the absolute temperature (K); and Kd is the dissociation constant (M). With R in these units, ΔG is in J/mol, so AutoDock values in kcal/mol must first be converted (1 kcal = 4.184 kJ). A more negative ΔG (smaller Kd) suggests stronger HCO3--CmpAB fusion binding.

After simulating the docking between the CmpAB fusion protein and HCO3-, the team calculated a Kd of 4.492 μM for the HCO3--CmpAB fusion complex, compared to a Kd of 5 μM for the docking of HCO3- to individual CmpA and CmpB compartments. Since a smaller Kd indicates a more negative ΔG and thereby higher binding affinity, the team’s CmpAB fusion protein is predicted to demonstrate a stronger interaction with HCO3-, potentially improving the binding and transporting efficiency of HCO3- across the cyanobacterial inner membrane.
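The conversion between ΔG and Kd can be checked with a short sketch. The temperature of 298.15 K is an assumption, since the text does not state the value used; the function names are illustrative.

```python
import math

R = 8.314            # gas constant, J/(mol*K)
T_ASSUMED = 298.15   # K; ASSUMPTION, the temperature is not stated in the text
KCAL_TO_J = 4184.0   # joules per kilocalorie


def delta_g_from_kd(kd_molar, temperature=T_ASSUMED):
    """Binding free energy in J/mol from a dissociation constant in mol/L."""
    return R * temperature * math.log(kd_molar)


def kd_from_delta_g(delta_g_j_per_mol, temperature=T_ASSUMED):
    """Dissociation constant in mol/L from a free energy in J/mol."""
    return math.exp(delta_g_j_per_mol / (R * temperature))


dg_fusion = delta_g_from_kd(4.492e-6)  # CmpAB fusion, Kd = 4.492 uM
dg_wild = delta_g_from_kd(5.0e-6)      # wild type, Kd = 5 uM

for label, dg in (("CmpAB fusion", dg_fusion), ("wild type", dg_wild)):
    print(f"{label}: {dg / 1000:.2f} kJ/mol = {dg / KCAL_TO_J:.2f} kcal/mol")
```

At the assumed temperature, the fusion value reproduces the ΔG ≈ −30.5 kJ mol⁻¹ quoted for Kd = 4.492 µM, and the gap to the wild type is a fraction of a kJ/mol, in line with the modest affinity gain.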

3. CmpCD Fusion Protein

3.1 Configuration Confirmation

3.1.1 AlphaFold results

According to the predicted AlphaFold results, as shown in Figures 8, 9, and 10, the CmpCD fusion protein has a configuration similar to that of the individual CmpC and D proteins assembled, with slight deviations. Moreover, past literature has confirmed an increase in HCO3- uptake rate for the CmpCD fusion protein specifically. In the functional analysis conducted by Rottet et al. on multiple BCT1 mutants in E. coli, where the HCO3- uptake rates of a subset of seven genetic constructs were measured, the CmpCD fusion gene exhibited the highest uptake rate, supporting the functionality of the team's CmpCD fusion protein design (Rottet et al., 2024).

Figure 8. The AlphaFold prediction of the CmpCD fusion protein (purple).

Figure 9. The AlphaFold prediction of the CmpCD fusion protein (yellow) overlapped with the individual CmpC compartment (neon blue).

Figure 10. The AlphaFold prediction of the CmpCD fusion protein (yellow) overlapped with the individual CmpD compartment (pink).

3.1.2 RMSD Results

Table 2 summarizes the pruned and unpruned RMSD results of the CmpCD fusion protein overlapping with individual CmpC and D protein compartments.

Table 2. Pruned and unpruned RMSD results of the CmpCD fusion protein overlapping with individual CmpC and D protein compartments

| Overlapping structure | Pruned atom pairs | Pruned RMSD (Å) | Full (unpruned) atom pairs | Full (unpruned) RMSD (Å) |
| CmpCD fusion + CmpC | 341 | 0.502 | 656 | 11.509 |
| CmpCD fusion + CmpD | 255 | 0.621 | 278 | 4.208 |

According to Table 2, similar to the CmpAB fusion protein RMSD analysis, the CmpCD fusion protein, when overlapped with the individual CmpC and D protein compartments, shows a low (<2 Å) RMSD after pruning (0.502 Å and 0.621 Å, respectively), indicating that the core regions in both pairs of overlapping proteins are highly similar and aligned in local structure. However, the unpruned RMSD of the CmpCD fusion + CmpC pair is considerably higher than the pruned RMSD, at 11.509 Å when aligned with the individual CmpC structure. The homologous substrate-binding protein (SBP) domain on the CmpC compartment potentially explains this divergence. Past research on SBP domains revealed their structural flexibility, particularly arising from peripheral loops and extended C-terminal regions, which leads to diverging conformations that orient SBPs differently in space, thereby inflating the unpruned RMSD despite a conserved catalytic fold and neat alignment of CmpC with the CmpA and B compartments (Bae et al., 2021). Nevertheless, since the fundamental folds are conserved, the local protein geometries remain undisrupted, supporting the structural feasibility of the team's CmpCD fusion protein design.
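One simple way to compare the four alignments is the share of aligned atom pairs that survive pruning. The sketch below uses only the pair counts reported in Tables 1 and 2; the retained fraction itself is an illustrative summary statistic, not a value reported by the alignment software.

```python
# (pruned atom pairs, full atom pairs) as reported in Tables 1 and 2
ALIGNMENTS = {
    "CmpAB fusion + CmpA": (378, 422),
    "CmpAB fusion + CmpB": (226, 278),
    "CmpCD fusion + CmpC": (341, 656),
    "CmpCD fusion + CmpD": (255, 278),
}


def retained_fraction(pruned_pairs, full_pairs):
    """Fraction of aligned atom pairs that remain after pruning."""
    return pruned_pairs / full_pairs


for name, (pruned, full) in ALIGNMENTS.items():
    dropped = full - pruned
    share = retained_fraction(pruned, full)
    print(f"{name}: {dropped} pairs pruned, {share:.1%} retained")

least_retained = min(ALIGNMENTS, key=lambda k: retained_fraction(*ALIGNMENTS[k]))
print("lowest retained fraction:", least_retained)
```

The CmpC comparison retains the smallest share of its aligned pairs, consistent with the flexible SBP-homologous region accounting for much of the full-alignment deviation.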

3.2 DeepTMHMM Orientation Confirmation

According to the predicted DeepTMHMM results shown in Figures 11, 12, and 13, like the individual CmpC and CmpD proteins assembled in wild-type cyanobacteria, the CmpCD fusion protein remains in the cytoplasmic space.

Figure 11. (1) The most likely topology plot of the individual CmpC protein. (2) The posterior probability plot for the individual CmpC protein, indicating that the specific compartment resides on the inside of the cyanobacterial inner membrane. (3) The predicted topologies of the individual CmpC protein.

Figure 12. (1) The most likely topology plot of the individual CmpD protein. (2) The posterior probability plot for the individual CmpD protein, indicating that the specific compartment resides on the inside of the cyanobacterial inner membrane. (3) The predicted topologies of the individual CmpD protein.

Figure 13. (1) The most likely topology plot of the CmpCD fusion protein. (2) The posterior probability plot for the CmpCD fusion protein. With the pink “inside” line remaining at a steady probability of 1.0, the entire sequence for the CmpCD fusion protein is predicted to be located inside the cyanobacterial inner membrane (the cytosol). (3) The predicted topologies of the CmpCD fusion protein.

4. Conclusion

DeepTMHMM’s predictions of the CmpAB and CmpCD fusion protein orientations generally coincide with those of the individual CmpA, B, C, and D proteins assembled. AlphaFold’s predictions, along with pruned and global RMSD values obtained by overlapping the individual CmpA, B, C, and D compartments with the CmpAB and CmpCD fusion proteins, reveal structural similarity between the CmpAB fusion protein and the individual CmpA and B proteins assembled. Additional information from CmpA’s signal peptide analysis and a comparison of the binding affinities of the fusion and individual CmpA and CmpB compartments further point to an enhancement in HCO3- transport efficacy with the team’s CmpAB fusion protein design. Although the CmpCD fusion protein configuration visibly deviates from the individual C and D structures, past research has shown that the CmpCD fusion gene mutant improves HCO3- uptake rates. In conclusion, the protein modeling results support the feasibility of the team’s fusion gene cloning strategy and fusion protein designs.

References

Bae, J. E., Kim, I. J., Xu, Y., & Nam, K. H. (2021). Structural flexibility of peripheral loops and extended C-terminal domain of short length substrate binding protein from Rhodothermus marinus. The Protein Journal, 40(2), 184–191.

Garton, M., MacKinnon, S. S., Malevanets, A., & Wodak, S. J. (2018). Interplay of self-association and conformational flexibility in regulating protein function. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1749), 20170190.

Kufareva, I., & Abagyan, R. (2012). Methods of protein structure comparison. In Homology Modeling: Methods and Protocols (pp. 231–257). Totowa, NJ: Humana Press.

Palmer, T., & Berks, B. C. (2012). The twin-arginine translocation (Tat) protein export pathway. Nature Reviews Microbiology, 10(7), 483–496.

Rottet, S., Rourke, L. M., Pabuayon, I. C. M., Phua, S. Y., Yee, S., Weerasooriya, H. N., ... & Price, G. D. (2024). Engineering the cyanobacterial ATP-driven BCT1 bicarbonate transporter for functional targeting to C3 plant chloroplasts. Journal of Experimental Botany, 75(16), 4926–4943.

Sun, H., Hunter, C. A., Navarro, C., & Turega, S. (2013). Relationship between chemical structure and supramolecular effective molarity for formation of intramolecular H-bonds. Journal of the American Chemical Society, 135(35), 13129–13141.

Wilson, M. M., & Bernstein, H. D. (2016). Surface-exposed lipoproteins: an emerging secretion phenomenon in Gram-negative bacteria. Trends in Microbiology, 24(3), 198–208.
