-->

Molecular Docking

To Top

Molecular Docking

Model Description

To understand how our system behaves over time, we planned to implement mechanistic and stochastic modeling in the form of differential equations. The differential equations can be split into three parts: the toxin-antitoxin system, CcdR-cys binding to form a transcription factor, and the stochasticity associated with the binding of the transcription factor to the promoter on CcdB. The equations were built to represent CcdR octamerization and the binding of the octamer to L-cysteine with Hill-like approximations to capture saturation/cooperativity without detailed kinetics. The Hill-like approximations require an understanding of the binding affinity, specifically the Kd (equilibrium dissociation constant) value for each step of the process. However, these values are not found in literature due to the lack of prior characterization on CcdR. Furthermore, isolating a clear binding curve experimentally for CcdR to find these values poses challenges.

Molecular docking simulations allowed our team to efficiently identify binding affinity parameters required for the mechanistic model without adding additional assays to experimental design or making rough approximations based on literature. It also gave further insights into the binding mechanisms involved in forming the transcription factor to further characterize our system.

In this model, we utilized several different structural biology software tools with the ultimate goal of identifying Kd values beginning from CcdR dimerization to transcription factor formation. The main software used, and their functions are listed in the table below.

Software Description Purpose Reference
SWISS-MODEL Homology Modeling Predict the 3D structure of the CcdR monomer [1]
AlphaFold Deep Learning–Based Protein Structure Prediction Predict the 3D protein structure of different steps in the oligomerization process [2]
HADDOCK 2.4 Protein–Protein / Protein–Ligand Docking Software Predict how the CcdR octamer and L-cysteine ligand interact in 3D space [3]
PRODIGY Binding Affinity Prediction Tool Estimates dissociation constant from structural model/docking results [4]

Table 1. Software used to develop protein structures/models

Methods

CcdR is not well characterized in literature, but it is known to be an ortholog of Ybao/DecR, a feast/famine regulatory protein (FFRP) transcription factor from E.coli. They both share high sequence identity [5]. The FFRP structure is highly conserved with an N-terminal DNA binding domain, and a C-domain, which is involved in dimerization. Ligands typically bind at the interface between dimers [6]. The basis that CcdR is a FFRP-like structure and similar to the AsnC transcriptional regulator family is an assumption used throughout protein modeling.

The specific CcdR protein used in experimental design is mutated to increase affinity for dimerization, and thus increase affinity for forming a tetramer, octamer, and transcription factor. Given the novelty of the mutant protein, there are no currently established protein structures. Thus, the first step involved in carrying out docking simulations was creating a 3D structure for the protein using SWISS-MODEL and homology modeling. The mutant sequence was used and DecR/Ybao was used as the structural template. DecR was used as the template on the basis that it has the highest sequence identity of 86.93% with CcdR, and thus more accurately predicted local structures. The GMQE (Global Model Quality Estimate), which is a quality estimate combining properties from the target-template alignment and the template structure, was 0.96, indicating a high prediction accuracy.

Although a 3D structure of CcdR was developed with high accuracy in SWISS-MODEL, the validity of the structure was further tested using Ramachandram plots and running structural assessments in MolProbity [7].

Embedded Image

Figure 1. Ramachandran Plot illustrating the dihedral angles associated with amino acid residues in CcdR. 99.3% (150/151) of all residues were in favored (98%) regions. This figure was generated in MolProbity [7].

The Ramachandran plot in Figure 1 represents the dihedral angles (ϕ and ψ) of amino acid residues in protein structures. These angles describe the rotational orientation of the polypeptide backbone in the protein. Thus, it provides a 2D assessment of the stereochemical quality of the protein structure by determining which angle combinations are conformationally allowed. Each data point in the figure represents a combination of phi and psi angles occurring in a single amino acid. Alpha helical conformations are typically found in quadrant 3 and beta strand conformations in quadrant 2. From the plot above, it is seen that 99.3% of the residues were in favored regions. This further confirms the validity of the 3D structure of the protein. Less than 2% of residues in unfavorable conformations is commonly used as a quality standard.

An analysis on the 3D structure was run in MolProbity to provide statistics on protein geometry, peptide omegas, and additional validations. The first analysis provided results indicating there were no side chains in unrealistic conformations, no unusual peptide kinks and overall sound model geometry. However, there was one residue with a Cβ deviation greater than 0.25 Å, indicating the residue had sufficient deviation from the ideal position in terms of the backbone’s geometry. Based on these results, structural minimization was performed in ChimeraX to fix the geometry error [8]. Figure 2 shows the final results following a second MolProbity analysis.

Embedded Image

Figure 2. MolProbity structural assessment results after structural minimization of the CcdR monomer in Chimera X [7]

After structural minimization, all Cβ deviations were within the allowed range. There were still 7 bad angles that were not affected by structural minimization. The remaining indicators of protein geometry were within allowed ranges. Acknowledging that homology modeling is expected to contain some error, we moved forward with this 3D structure for future docking simulations.

After structural minimization, all Cβ deviations were within the allowed range. There were still 7 bad angles that were not affected by structural minimization. The remaining indicators of protein geometry were within allowed ranges. Acknowledging that homology modeling is expected to contain some error, we moved forward with this 3D structure for future docking simulations.

PRODIGY was used to determine the Kd values for predicted strucures [4]. PRODIGY incorporates a knowledge-based statistical model trained on experimental data. Based on the 3D structure, it identifies residues at the protein-protein interface to relate these features to experimental ΔG values from curated datasets of protein–protein complexes. The ΔG value is then converted to a Kd value. PRODIGY is limited in that it only provides estimates of Kd values based on the order of magnitude. However, for our mechanistic model, values that are in the correct order of magnitude are of greater importance than the exact values.

The training data used to create AlphaFold lacks information on a wide range of ligands. Thus, HADDOCK 2.4 was used to predict the binding between the CcdR octamer and the L-cysteine ligand. The binding mechanism of L-cysteine to CcdR is not well characterized. We assume that CcdR binding mechanisms follow the binding mechanisms of proteins in the AsnC family. In the AsnC protein family, eight asparagine ligands are known to bind to the protein. Two proteins are expected to bind at each dimerization interface [6]. Thus, we assume perfect binding of eight L-cysteine ligands is necessary to form the transcription factor.

HADDOCK 2.4 incorporates guided docking, requiring active and passive residues in the protein structure, to create more accurate predictions. Multiple Sequence Alignment (MSA) was performed in UniProt between the mutant CcdR and DecR protein sequences to identify conserved regions and residues [9]. The residues involved in forming a binding pocket between two dimers in DecR, including Y80, R98, D104, M114, N126, V134, T135, S136, S137, F138, and A139 [10]. These residues were all noted to be conserved in the MSA results, as shown in Figure 3, and were used as active residues in docking. Furthermore, the residues involved in forming an oligomer were also based on literature [10].

Embedded Image

Figure 3. Multiple sequence alignment between DecR (top row) and CcdR (bottom row) protein sequences

The resulting protein structure with L-cysteine docked to the protein was inputted in PRODIGY to determine the Kd value.

Results

Embedded Image

Figure 4 A-D. A. Monomer structure of CcdR protein developed in SWISS-Model after structural minimization in ChimeraX. B-D. Dimer, tetramer, and octamer structure determined by AlphaFold. The structures are colored by confidence scores. PlDDT scores higher than 90, 70-90, and less than 70 are indicated by dark blue, cyan, and yellow, respectively.

Figure 4 illustrates the 3D structure of each phase of oligomerization of CcdR. The binding patterns follow those of the FFRP transcription factor family of proteins. Specifically, in the AsnC family of proteins, the homodimer is held together by interactions between the anti-parallel beta sheets of the C-terminal domain [11]. The same binding pattern is observed in the dimer structure in Figure 4B. The dimer-dimer interface mainly consists of hydrophobic contacts, and each dimer pair forms a side of the binding pocket for the ligand. AsnC proteins also form an octamer with P4 symmetry, which is observed in the predicted octamer structure of CcdR as well.

Overall, all of the predicted structures in AlphaFold have confidence scores consistently higher than 90, indicating high accuracy. Furthermore, the interface predicted template modeling (ipTM) score for the dimer and octamer structures were higher than 0.8 as shown in Table 2. Values higher than 0.8 represent confident high-quality predictions.

Oligomerization Phase Predicted Kd Value (M) ipTM Score from AlphaFold
Dimerization 8.4e-18 0.89
Tetramerization 8.9e-09 0.6
Octamerization 7.9e-16 0.88
CcdR Octamer + L-cysteine 7.2e-15 N/A

Table 2. Kd values predicted for each phase of oligomerization by PRODIGY

The Kd values were directly inputted into the mechanistic model to represent the binding affinity and the likelihood of the formation of the final octamer.

Embedded Image

Figure 5. Four binding channels into the L-cysteine binding pocket adapted from Zhou et. Al, 2024 [10]. Two L-cysteine ligands bind to each dimerization interface

Embedded Image

Figure 6. In silico docking of L-cysteine bound to the dimerization interface. Residues involved in binding pocket formation and binding to L-cysteine are labeled with residue name and number. (2nd CcdR monomer was removed for visibility).

Given the hydrophobic nature of L-cysteine, it is reasonable that the ligand preferentially binds within the hydrophobic dimer–dimer interface.

Summary

The molecular docking simulations successfully predicted the Kd values for the CcdR-L-cysteine complex without challenging experimental assays. The results were directly used in the mechanistic model differential equations. In addition to parameters, the model revealed the binding mechanisms for the mutant CcdR oligomerization and the CcdR-Cys transcription factor complex, valuable for characterizing the novel toxin-antitoxin system composite part. The 3D structure of the CcdR monomer was built using homology modeling and validated with a Ramachandran plot showing 99.3% of residues in favored regions. AlphaFold confirmed the binding patterns for the mutant CcdR followed the FFRP family with with high confidence scores (ipTM>0.88 for the dimer and octamer). HADDOCK 2.4 was used to dock 8 L-cysteine ligands to the CcdR octamer, illustrating binding to the hydrophobic dimer-dimer interface.

References