Skip to main content

Model

Objective

Our primary modeling goal was to provide computational validation for our new improved part, a dCas9-Dam fusion protein. We aimed to demonstrate that our rationally designed part exhibits superior DNA-binding characteristics compared to the existing dCas9-Dam part in the iGEM registry (BBa_K4703002). The key engineering difference in our design is the implementation of a novel, extended flexible linker.

Linker Comparison

Original Linker Sequence: GGGS

Our Linker Sequence: GGSSRSSSSGGGGSGGGG

Our central hypothesis is that the increased length and flexibility of our linker allows the Dam methyltransferase domain greater freedom of movement, enabling the entire complex to form a more stable and extensive interface with its target DNA, the DNAP2 promoter sequence. To test this hypothesis, we employed a two-step computational pipeline: first, predicting the three-dimensional structures of both our complex and the original complex using AlphaFold 3, and second, performing a rigorous quantitative analysis of the protein-DNA interactions for both structures using the DNAproDB server.

πŸ”
Figure 1: The overall pipeline of the model part of our project.

Part 1: In Silico Structure Prediction with AlphaFold 3​

Introduction to AlphaFold 3​

To obtain high-quality structural models for our analysis, we utilized AlphaFold 3 [1], a state-of-the-art AI system developed by Google DeepMind and Isomorphic Labs. AlphaFold 3 represents a significant leap in computational biology, extending beyond predicting the structure of single proteins to accurately modeling the three-dimensional architecture of complex biomolecular assemblies, including proteins, DNA, RNA, and small molecule ligands.

The system functions by taking molecular sequences as input and processing them through a novel deep learning architecture. This architecture combines an MSA (Multiple Sequence Alignment) module for evolutionary information with a core "Pairformer" module that builds relationships between molecular components.

The process culminates in a diffusion model, which starts with a random cloud of atoms and iteratively refines their positions to generate a final, physically plausible 3D structure with atomic-level accuracy. This powerful predictive capability allows for the in silico construction of molecular structures that have not yet been determined experimentally, providing an invaluable tool for rational protein design and hypothesis testing.

πŸ”
Figure 2: A simplified schematic of the AlphaFold 3 architecture. The model processes input sequences and structural templates through its Pairformer and diffusion modules to generate an accurate 3D structure of a biomolecular complex.

AlphaFold 3 Prediction Results​

We submitted the sequences for our dCas9-Dam complex and the original BBa_K4703002 part, both in complex with the DNAP2 promoter DNA sequence, to the AlphaFold 3 server. The model generated 3D structures for both assemblies and provided confidence scores to assess the quality of the predictions. The two primary confidence metrics are the predicted Template-Modeling (pTM) score and the interface-predicted template modelling (ipTM) score.

  • pTM: Assesses the overall accuracy of the predicted structure of the entire complex.
  • ipTM: Specifically measures the confidence in the predicted accuracy of the interface between different molecules (in this case, the protein and the DNA).

The scores for the top-ranked models were as follows:

ComplexipTM ScorepTM Score
Our Complex0.230.66
Original Complex0.300.68

The pTM scores for both complexes are moderate, suggesting that the overall predicted fold is plausible. The ipTM scores are in a lower confidence range, which is not uncommon for complex protein-DNA interface predictions where large, flexible regions are involved.

However, since both structures were predicted using the identical state-of-the-art method, these models provide a valid basis for a direct comparative analysis of their interaction features. The goal is not to claim experimental accuracy but to assess the relative differences in binding potential that arise from our engineered linker.

πŸ”
Figure 3: AlphaFold 3 prediction of our dCas9-Dam complex interacting with the target DNA sequence. This structural model served as the input for the subsequent interaction analysis.
πŸ”
Figure 4: AlphaFold 3 prediction of the original dCas9-Dam complex interacting with the target DNA sequence. This structural model served as the input for the subsequent interaction analysis.

Part 2: Quantitative Interaction Analysis with DNAproDB​

Introduction to DNAproDB​

To quantify the physical interactions between the protein and DNA in our AlphaFold-predicted structures, we used DNAproDB [2], an automated database and web server designed for the comprehensive structural analysis of protein-DNA complexes. The DNAproDB pipeline takes a 3D structure file as input and calculates a wide array of biophysical and structural features that define the interface.

This allows for a detailed, quantitative comparison of binding interfaces.

Key metrics calculated by the server include:

Hydrogen Bond detection: These are highly specific interactions crucial for binding affinity and sequence recognition. DNAproDB uses the HBPLUS program to identify all hydrogen bonds between the protein and DNA with specific geometric cutoffs: a hydrogen-acceptor distance of < 3.0 Γ… and a donor-acceptor distance of < 3.5 Γ….

distance(D,A)<3.5 A˚distance(H,A)<3.0 A˚\begin{align} \text{distance}(D, A) &< 3.5\, \text{Γ…} \\ \text{distance}(H, A) &< 3.0\, \text{Γ…} \end{align}

Van der Waals (vdW) Interactions: These are non-specific attractive forces that contribute significantly to the overall stability of the complex. They are calculated based on a distance cutoff between atoms. All non-covalently bonded atom pairs with a distance less than the empirically chosen threshold of 3.9 Γ… are counted as a vdW contact.

Buried Solvent Accessible Surface Area (BASA): This metric quantifies the total surface area of the protein and DNA that becomes shielded from the surrounding water upon binding. A larger BASA indicates a more extensive and intimate interface, which is strongly correlated with higher binding affinity and stability.

SASA is calculated using the rolling-sphere definition (Lee & Richards algorithm) with a standard 1.4 Γ… water probe radius. It is calculated as the difference in the solvent-accessible surface area (SASA) of the components in their free versus bound states:

The total SASA for a single atom i is calculated by the given equation:

Ai=Riβ€‰Ξ΄βˆ‘s∈slices(2Ο€βˆ’Ξ³s)A_i = R_i \, \delta \sum_{s \in \text{slices}} \left( 2\pi - \gamma_s \right)

where:

  • RiR_i = The extended radius of the atom (the atom's van der Waals radius plus the solvent probe radius, typically 1.4 Γ…).
  • Ξ΄\delta = The thickness of each slice.
  • Ξ³s\gamma_s = The total angle of the atom's circular cross-section in a given slice s that is buried by neighboring atoms
BASA=(SASAprotein,free+SASADNA,free)βˆ’SASAcomplex,bound\text{BASA} = \left( \text{SASA}_{\text{protein,free}} + \text{SASA}_{\text{DNA,free}} \right) - \text{SASA}_{\text{complex,bound}}

Lee, B., & Richards, F. M. (1971). "The Interpretation of Protein Structures: Estimation of Static Accessibility".

The structural models were analyzed by DNAproDB, yielding a quantitative comparison of the key physical interactions stabilizing each protein-DNA complex.

Comparative Analysis and Results​

The structural models generated by AlphaFold 3 were submitted to the DNAproDB server for analysis. The resulting data provide a clear quantitative comparison of the key physical interactions that stabilize each protein-DNA complex.

Interaction MetricOur SystemOriginal SystemPercent Improvement
Nucleotide-Residue Interactions181111+63.1%
Weak Nuc-Res Interactions188+125.0%
Total Buried Surface Area (BASA) [Γ…Β²]4428.3783182.847+39.1%
Total Hydrogen Bonds2917+70.6%
Total Van der Waals Contacts507325+56.0%
Hydrophobicity Score (SAP)-0.667-0.39More Favorable
Secondary Structure Compositionhelix/strandirregularMore Ordered

Residue Contact Map Analysis​

The residue contact map provides a powerful 2D visualization of the protein-DNA interface, where the DNA is a central graph and surrounding nodes represent protein residues. Edges connecting them represent physical interactions. A direct visual comparison of the maps for our system and the original system provides a clear, intuitive confirmation of our quantitative findings.

Our System
πŸ”
Figure 5: Structure visualization of our dCas9-Dam system
πŸ”
Figure 6: Residue contact map for our dCas9-Dam system

Our System displays a dense and extensive web of interactions (edges) connecting a wide array of protein residues to the DNA backbone and bases. This visually corroborates the 63% increase in total nucleotide-residue interactions (181 vs. 111) and the significant boosts in hydrogen bonds (+70.6%) and van der Waals contacts (+56.0%) detailed in our quantitative analysis.

Original System
πŸ”
Figure 7: Structure visualization of the original dCas9-Dam system
πŸ”
Figure 8: Residue contact map for the original dCas9-Dam system

The Original System, in stark contrast, shows a comparatively sparse network of contacts. The fewer number of edges is a direct visual representation of the lower interaction counts across all metrics in the quantitative specifications sheet.

The most striking difference is the density of the interaction network.

Analysis and Inferences​

The results from the DNAproDB geometric analysis provide strong computational evidence that our redesigned construct forms a more extensive and stabilizing interface with the target DNA.

Our engineered dCas9-Dam complex outperforms the original part across every key metric of protein-DNA interaction:

Key Findings​

  • Vastly Increased Contact Network: Our complex forms 63% more total nucleotide-residue interactions and 125% more weak interactions. This indicates a much denser and more robust network of contacts holding the protein onto the DNA.[3]

  • Superior Interface Stability: The Total Buried Solvent Accessible Surface Area (BASA) is 39.1% larger in our complex. A larger BASA is a strong indicator of increased binding energy, as it reflects a greater reduction in unfavorable solvent exposure and an increase in favorable desolvation and nonpolar interactions.[3] Empirically, binding free energy (Ξ”G-binding) often correlates directly with the buried surface area. This is a critical finding, as a more extensive interface that excludes more water is a hallmark of a more stable and high-affinity interaction.[4]

  • Enhanced Binding Strength and Specificity: Our complex forms 70.6% more hydrogen bonds (29 vs. 17) and 56.0% more van der Waals contacts (507 vs. 325). The substantial rise in vdW contacts indicates improved steric complementarity and tighter packing at the interface, further contributing to overall stability.[5]

  • Favorable Secondary Structure: The more negative hydrophobicity score and the presence of ordered secondary structures (helix/strand) at the interface in our complex, compared to the "irregular" composition of the original, suggest that our flexible linker allows the protein domains to adopt a more stable and energetically favorable conformation upon binding.[6]

  • Reconciling Global vs. Local Confidence Metrics: Although the registry construct had a slightly higher global ipTM score from AlphaFold 3, the detailed geometric analysis of our design reveals a greater number of favorable local contacts. A likely reason for this is the increased flexibility of our longer linker. While this flexibility can reduce global interface confidence metrics like ipTM, it simultaneously allows the protein domains to sample more conformations and achieve one with superior local packing and a more stabilizing network of interactions. Therefore, the local, geometry-based quantification from DNAproDB is essential to complement the global AlphaFold scores and reveal the true potential of the interface.

Conclusion​

Our two-stage modeling pipeline provides compelling evidence that our newly designed dCas9-Dam basic part is superior to the existing part BBa_K4703002. The introduction of a longer, more flexible linker (GGSSRSSSSGGGGSGGGG) successfully enables the protein complex to form a significantly more extensive and stable interface with its target DNA.

The quantitative analysis demonstrates marked improvements in every critical aspect of binding, including a 39% increase in the buried surface area and a 71% increase in the number of hydrogen bonds. These computational results strongly support the conclusion that our engineered part will bind more tightly and effectively to its DNA target, making it a more reliable and efficient tool for applications requiring targeted DNA methylation.

References​

References
  1. Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). https://doi.org/10.1038/s41586-024-07487-w

  2. Raktim Mitra, Ari S Cohen, Jared M Sagendorf, Helen M Berman, Remo Rohs, DNAproDB: an updated database for the automated and interactive analysis of protein–DNA complexes, Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D396–D402. https://doi.org/10.1093/nar/gkae970

  3. Sathyapriya R, Vijayabaskar MS, Vishveshwara S. Insights into protein-DNA interactions through structure network analysis. PLoS Comput Biol. 2008;4(9):e1000170. doi:10.1371/journal.pcbi.1000170

  4. Kastritis PL, Bonvin AM. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J R Soc Interface. 2012;10(79):20120835. doi:10.1098/rsif.2012.0835

  5. Cho S, Swaminathan CP, Yang J, Kerzic MC, Guan R, Kieke MC, Kranz DM, Mariuzza RA, Sundberg EJ. Structural basis of affinity maturation and intramolecular cooperativity in a protein-protein interaction. Structure. 2005 Dec;13(12):1775-87. doi:10.1016/j.str.2005.08.015

  6. Almeida FCL, Sanches K, Pinheiro-Aguiar R, Almeida VS, Caruso IP. Protein Surface Interactions-Theoretical and Experimental Studies. Front Mol Biosci. 2021 Jul 9;8:706002. doi:10.3389/fmolb.2021.706002