Model
Objective
Our primary modeling goal was to provide computational validation for our improved part, a dCas9-Dam fusion protein. We aimed to show, in silico, that our rationally designed part has superior DNA-binding characteristics compared to the existing dCas9-Dam part in the iGEM Registry (BBa_K4703002). The key engineering difference in our design is a novel, longer flexible linker.
Linker Comparison
Original Linker Sequence: GGGS
Our Linker Sequence: GGSSRSSSSGGGGSGGGG
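To make the difference between the two linkers concrete, the sketch below computes a few simple sequence properties. It is a minimal illustration, not part of our modeling pipeline: the Gly+Ser fraction is only a rough proxy for flexibility, and the 3.8 Å Cα-Cα spacing used for the fully extended length is an assumed approximate value. The function name `linker_stats` is hypothetical.

```python
# Minimal sketch comparing simple sequence properties of the two linkers.
# Assumptions (not outputs of our modeling): the Gly+Ser fraction is used
# only as a rough proxy for flexibility, and 3.8 Å is the approximate
# Cα-Cα spacing used to estimate the fully extended contour length.

CA_CA_SPACING_A = 3.8  # approximate Cα-Cα distance in an extended chain (Å)

LINKERS = {
    "original (BBa_K4703002)": "GGGS",
    "ours": "GGSSRSSSSGGGGSGGGG",
}


def linker_stats(seq: str) -> dict:
    """Return length, Gly+Ser fraction and maximal extended length of a linker."""
    length = len(seq)
    gly_ser = sum(1 for residue in seq if residue in "GS")
    return {
        "length": length,
        "gly_ser_fraction": gly_ser / length,
        "max_extended_length_A": length * CA_CA_SPACING_A,
    }


if __name__ == "__main__":
    for name, seq in LINKERS.items():
        stats = linker_stats(seq)
        print(
            f"{name}: {stats['length']} aa, "
            f"Gly+Ser fraction {stats['gly_ser_fraction']:.2f}, "
            f"~{stats['max_extended_length_A']:.1f} A when fully extended"
        )
```

Run as a script, this reports 4 aa, a Gly+Ser fraction of 1.00 and about 15.2 A for the original linker, versus 18 aa, 0.94 and about 68.4 A for ours. The extended length is only an upper bound; flexible linkers are usually far more compact in solution.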
Our central hypothesis is that the greater length and flexibility of our linker give the Dam methyltransferase domain more freedom of movement, allowing the complex to form a more stable and extensive interface with its target DNA, the DNAP2 promoter sequence. To test this hypothesis, we used a two-step computational pipeline: first, we predicted the three-dimensional structures of both our complex and the original complex with AlphaFold 3; second, we quantitatively analyzed the protein-DNA interactions in both structures with the DNAproDB server.

Part 1: In Silico Structure Prediction with AlphaFold 3
Introduction to AlphaFold 3
To obtain high-quality structural models for our analysis, we utilized AlphaFold 3 [1], a state-of-the-art AI system developed by Google DeepMind and Isomorphic Labs. AlphaFold 3 represents a significant leap in computational biology, extending beyond predicting the structure of single proteins to accurately modeling the three-dimensional architecture of complex biomolecular assemblies, including proteins, DNA, RNA, and small molecule ligands.
AlphaFold 3 takes molecular sequences as input and processes them through a deep learning architecture that combines an MSA (multiple sequence alignment) module, which extracts evolutionary information, with a core "Pairformer" module that iteratively refines a representation of the pairwise relationships between all tokens (amino acids, nucleotides, and ligand atoms) in the complex.
The process culminates in a diffusion module, which starts from a random cloud of atoms and iteratively refines their positions to generate a final all-atom 3D structure. This predictive capability allows molecular structures that have not yet been determined experimentally to be constructed in silico, making AlphaFold 3 a valuable tool for rational protein design and hypothesis testing.

AlphaFold 3 Prediction Results
We submitted the sequences for our dCas9-Dam complex and the original BBa_K4703002 part, each in complex with the DNAP2 promoter DNA sequence, to the AlphaFold Server. The model generated 3D structures for both assemblies together with confidence scores for assessing the quality of the predictions. The two primary confidence metrics are the predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score.
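For reproducibility, jobs can also be uploaded to the AlphaFold Server as JSON. The sketch below builds one such job. It is a minimal sketch that assumes the server's JSON upload layout (a list of jobs with `name`, `modelSeeds` and `sequences` entries such as `proteinChain` and `dnaSequence`); those field names should be checked against the server documentation. The sequences and the helper names `build_job` and `reverse_complement` are hypothetical placeholders, not the sequences we used. Because each `dnaSequence` entry is a single strand, double-stranded promoter DNA is represented here as the forward strand plus its reverse complement.

```python
import json

# Hypothetical placeholders: substitute the real fusion-protein and DNAP2
# promoter sequences. These are NOT the sequences used in our predictions.
PROTEIN_SEQ = "MDKKYSIGL"          # placeholder fragment, not the full fusion
DNAP2_FORWARD = "GATTACAGATTACA"   # placeholder promoter strand

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_job(name: str, protein: str, dna_forward: str) -> dict:
    """Build one job in the (assumed) AlphaFold Server JSON layout.

    Double-stranded DNA is given as two single-stranded entries:
    the forward strand and its reverse complement.
    """
    return {
        "name": name,
        "modelSeeds": [],
        "sequences": [
            {"proteinChain": {"sequence": protein, "count": 1}},
            {"dnaSequence": {"sequence": dna_forward, "count": 1}},
            {"dnaSequence": {"sequence": reverse_complement(dna_forward), "count": 1}},
        ],
    }


if __name__ == "__main__":
    jobs = [build_job("dCas9-Dam_new_linker_DNAP2", PROTEIN_SEQ, DNAP2_FORWARD)]
    print(json.dumps(jobs, indent=2))
```

Generating both jobs (new and original linker) from the same function keeps every input except the protein sequence identical, which is what the comparison relies on.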
- pTM: Estimates the overall accuracy of the predicted structure of the entire complex.
- ipTM: Estimates the accuracy of the predicted interfaces between different molecules (in this case, between the protein and the DNA).
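pTM is AlphaFold's own prediction of the TM-score, a standard measure of structural similarity between a model and a reference structure. To show what quantity is being predicted, the sketch below computes the TM-score for one fixed superposition using the Zhang & Skolnick distance scale d0. This is an illustrative sketch, not AlphaFold's internal pTM calculation: the real TM-score is the maximum over all superpositions, and clamping the length at 19 residues reflects our understanding of AlphaFold's pTM definition. ipTM applies the same idea restricted to residue pairs across different chains. Function names are hypothetical.

```python
from typing import Sequence


def d0(num_residues: int) -> float:
    """TM-score distance scale d0 (Å) for a structure of `num_residues` residues.

    Standard Zhang & Skolnick form; clamping the length at 19 (our reading of
    AlphaFold's pTM definition) keeps d0 positive for very short chains.
    """
    n = max(num_residues, 19)
    return 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed_superposition(distances: Sequence[float], num_residues: int) -> float:
    """TM-score for ONE given superposition.

    `distances` are per-residue deviations (Å) between aligned residues of the
    model and the reference; unaligned residues simply contribute nothing.
    The true TM-score is the maximum of this value over all superpositions.
    """
    scale = d0(num_residues)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / num_residues
```

A perfect model of a 100-residue chain gives `tm_score_fixed_superposition([0.0] * 100, 100) == 1.0`, and every residue deviating by exactly d0 gives 0.5. Because each term decays smoothly with distance, a few badly placed residues lower the score only modestly, which is why TM-score is preferred over RMSD for large complexes.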
The scores for the top-ranked models were as follows:
| Complex | ipTM Score | pTM Score |
|---|---|---|
| Our Complex | 0.23 | 0.66 |
| Original Complex | 0.30 | 0.68 |
Both pTM scores are above 0.5, the level above which AlphaFold's guidance indicates that the overall predicted fold may be similar to the true structure. Both ipTM scores, however, are well below 0.6, the range in which interface predictions are considered likely to be incorrect. Low interface confidence is not uncommon for large protein-DNA assemblies that contain long flexible regions, but it means the predicted binding poses should be treated with caution.
However, because both structures were predicted with the same method, these models provide a consistent basis for a direct comparative analysis of their interaction features. The goal is not to claim experimental accuracy but to assess the relative differences in predicted binding potential that arise from our engineered linker.
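The score interpretation can be made explicit with a small helper. This minimal sketch applies the rules of thumb from the AlphaFold Server guidance as we understand them (ipTM above 0.8 is confident, 0.6 to 0.8 is a grey zone, below 0.6 suggests a likely failed interface prediction; pTM above 0.5 suggests a plausible overall fold) to the top-ranked scores in the table. These are guidelines rather than hard cut-offs, and the function names are hypothetical.

```python
# Rules of thumb as we understand them from the AlphaFold Server guidance;
# they are guidelines, not hard cut-offs.
def interpret_iptm(iptm: float) -> str:
    if iptm > 0.8:
        return "confident, high-quality interface"
    if iptm >= 0.6:
        return "grey zone: interface may or may not be correct"
    return "low confidence: interface prediction likely unreliable"


def interpret_ptm(ptm: float) -> str:
    if ptm > 0.5:
        return "overall fold may resemble the true structure"
    return "overall fold is low confidence"


# Top-ranked model scores from the results table.
SCORES = {
    "Our Complex": {"iptm": 0.23, "ptm": 0.66},
    "Original Complex": {"iptm": 0.30, "ptm": 0.68},
}

if __name__ == "__main__":
    for name, s in SCORES.items():
        print(
            f"{name}: ipTM {s['iptm']:.2f} ({interpret_iptm(s['iptm'])}); "
            f"pTM {s['ptm']:.2f} ({interpret_ptm(s['ptm'])})"
        )
```

Both complexes fall into the same categories (low-confidence interface, plausible overall fold), which is why the comparison that follows is framed as relative rather than absolute.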


Part 2: Quantitative Interaction Analysis with DNAproDB
Introduction to DNAproDB
To quantify the physical interactions between the protein and DNA in our AlphaFold-predicted structures, we used DNAproDB [2], an automated database and web server designed for the comprehensive structural analysis of protein-DNA complexes. The DNAproDB pipeline takes a 3D structure file as input and calculates a wide array of biophysical and structural features that define the interface, enabling a detailed, quantitative comparison of binding interfaces.
Key metrics calculated by the server include:
Hydrogen Bonds: These highly specific interactions are crucial for binding affinity and sequence recognition. DNAproDB uses the HBPLUS program to identify all hydrogen bonds between the protein and DNA with specific geometric cutoffs: a hydrogen-acceptor distance of < 3.0 Å and a donor-acceptor distance of < 3.5 Å.
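The two distance cutoffs can be expressed as a simple filter over candidate donor-hydrogen-acceptor triples. This is a minimal sketch of the distance criteria only: HBPLUS also applies angular criteria, which are deliberately omitted here, and the coordinates and function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]

# Distance cutoffs quoted in the text (Å). HBPLUS additionally applies
# angular criteria, which this sketch deliberately omits.
MAX_H_ACCEPTOR = 3.0
MAX_DONOR_ACCEPTOR = 3.5


def meets_distance_criteria(donor: Point, hydrogen: Point, acceptor: Point) -> bool:
    """True if a donor-H...acceptor triple satisfies both distance cutoffs."""
    return (
        math.dist(hydrogen, acceptor) < MAX_H_ACCEPTOR
        and math.dist(donor, acceptor) < MAX_DONOR_ACCEPTOR
    )


def filter_hbond_candidates(
    triples: Iterable[Tuple[Point, Point, Point]],
) -> List[Tuple[Point, Point, Point]]:
    """Keep only candidate (donor, hydrogen, acceptor) triples passing both cutoffs."""
    return [t for t in triples if meets_distance_criteria(*t)]


if __name__ == "__main__":
    # Hypothetical coordinates: a protein N-H donor and two DNA acceptor positions.
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    close_acceptor, far_acceptor = (2.9, 0.0, 0.0), (3.6, 0.0, 0.0)
    print(meets_distance_criteria(donor, hydrogen, close_acceptor))  # True
    print(meets_distance_criteria(donor, hydrogen, far_acceptor))    # False
```

The second case shows why both cutoffs matter: the hydrogen-acceptor distance (2.6 Å) passes, but the donor-acceptor distance (3.6 Å) does not.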
Van der Waals (vdW) Interactions: These non-specific attractive forces contribute significantly to the overall stability of the complex. They are identified with a distance cutoff: every non-covalently bonded atom pair closer than the empirically chosen threshold of 3.9 Å is counted as a vdW contact.
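Counting vdW contacts with this cutoff reduces to counting close atom pairs across the interface. The sketch below is a brute-force illustration under the assumption that only the 3.9 Å distance criterion applies; whether DNAproDB additionally excludes pairs already counted as hydrogen bonds is not stated above, so that is not modeled. Coordinates and function names are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

VDW_CUTOFF_A = 3.9  # distance threshold quoted in the text (Å)


def count_vdw_contacts(
    protein_atoms: Sequence[Point],
    dna_atoms: Sequence[Point],
    cutoff: float = VDW_CUTOFF_A,
) -> int:
    """Count protein-DNA atom pairs closer than `cutoff`.

    Protein and DNA atoms are never covalently bonded to each other, so every
    inter-molecular pair is a candidate. Brute force, O(N*M).
    """
    return sum(
        1 for p, d in product(protein_atoms, dna_atoms) if math.dist(p, d) < cutoff
    )


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # hypothetical coordinates
    dna = [(3.0, 0.0, 0.0), (13.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(count_vdw_contacts(protein, dna))  # 2
```

For full-size structures, a spatial index such as SciPy's `cKDTree` avoids checking every atom pair.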
Buried Solvent-Accessible Surface Area (BASA): This metric quantifies the total surface area of the protein and DNA that becomes shielded from the surrounding water upon binding. A larger BASA indicates a more extensive and intimate interface, which generally correlates with higher binding affinity and stability.
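The BASA definition can be illustrated numerically: compute the SASA of each partner alone and of the complex, then take the difference. The sketch below uses Shrake-Rupley point sampling, a different numerical approximation of the same probe-expanded surface than the Lee & Richards algorithm described for DNAproDB, because it is short to write. Atom coordinates, radii, and function names are hypothetical; for real structures, Biopython's `Bio.PDB.SASA.ShrakeRupley` class provides an established implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[Point, float]  # (centre in Å, van der Waals radius in Å)

PROBE_RADIUS_A = 1.4  # water probe radius quoted in the text (Å)


def unit_sphere_points(n: int) -> List[Point]:
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        angle = golden_angle * k
        points.append((r * math.cos(angle), r * math.sin(angle), z))
    return points


def total_sasa(atoms: Sequence[Atom], n_points: int = 500,
               probe: float = PROBE_RADIUS_A) -> float:
    """Shrake-Rupley estimate of the total SASA (Å^2) of a set of atoms."""
    sphere = unit_sphere_points(n_points)
    total = 0.0
    for i, (centre, radius) in enumerate(atoms):
        expanded = radius + probe
        exposed = 0
        for ux, uy, uz in sphere:
            point = (centre[0] + expanded * ux,
                     centre[1] + expanded * uy,
                     centre[2] + expanded * uz)
            # Accessible if no other probe-expanded atom sphere contains the point.
            if all(math.dist(point, other) >= other_radius + probe
                   for j, (other, other_radius) in enumerate(atoms) if j != i):
                exposed += 1
        total += 4.0 * math.pi * expanded ** 2 * exposed / n_points
    return total


def basa(protein: Sequence[Atom], dna: Sequence[Atom], n_points: int = 500) -> float:
    """Buried SASA: free-state SASA of both partners minus SASA of the complex."""
    free = total_sasa(protein, n_points) + total_sasa(dna, n_points)
    bound = total_sasa(list(protein) + list(dna), n_points)
    return free - bound
```

Two hypothetical atoms 100 Å apart bury no surface (BASA of about 0), while the same atoms 3 Å apart bury a substantial area, mirroring how BASA grows as an interface becomes more intimate.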
The solvent-accessible surface area (SASA) is calculated using the rolling-sphere definition (Lee & Richards algorithm) with a standard 1.4 Å water probe radius. BASA is the difference between the SASA of the components in their free states and the SASA of the bound complex:

$$\mathrm{BASA} = \mathrm{SASA}_{\mathrm{protein}} + \mathrm{SASA}_{\mathrm{DNA}} - \mathrm{SASA}_{\mathrm{complex}}$$

The SASA of a single atom i is calculated by the following equation: