We created 50 designs of alpha-amanitin-binding proteins, ranked them by quality metrics and evaluated them computationally for multiple characteristics relevant to recombinant expression. We chose one top-performing design to submit as a part to the iGEM Registry and hope it will be useful to future teams as a starting point.
Our wish for the computational side of our project was to find a nanobody against the peptide toxin
alpha-amanitin of the death cap mushroom. We wanted to employ AI-based structure generation, sequence redesign
and structure prediction to obtain an in silico verified, high-affinity nanobody.
We quickly had to realize that neural networks, and their combination into AI-based pipelines, have not yet
reached the point of designing nanobodies against targets as small and complex as alpha-amanitin.
The bicyclic nature and post-translational modifications of the peptide could not be handled by the models as we
intended, so we had to shift our strategy.
We learned that some AI models are able to generate small-molecule-binding proteins. Although the resulting protein
would not have the typical nanobody fold, we set out to generate a binding protein nonetheless,
oriented towards nanobody characteristics such as length, globular shape and solubility, in order to preserve the
properties of these small immunoglobulins.
The neural network we settled on was Boltzdesign1 [1], an inverted version of an open-source AlphaFold3-style model: instead of predicting a structure from a given sequence, Boltzdesign1 generates a protein structure and sequence for an input small molecule. Before generating structures we gathered information on the necessary inputs, the parameters that can be adjusted and the output quality metrics, and finally formulated goals:
With these in mind we produced 50 protein structures with the model and conceived a plan to iteratively find the best design. The evaluation steps we took were:
The output of Boltzdesign1 is a structure of the generated protein-ligand complex [see Figure 1 for two examples]. In addition, each output structure comes with a list of quality metrics that allow the designs to be evaluated and compared.
There are multiple quality values associated with prediction confidence and predicted errors. [Figure 2]
shows the predicted template modelling score (pTM) and the predicted local distance difference test score (pLDDT),
two metrics many structure prediction models use. pTM is a global score for the protein as a whole, whereas pLDDT
is a local, per-residue score. Both also have interface-specific counterparts shown in [Figure 2].
What became apparent when depicting the quality metrics graphically was that the protein itself was almost
always generated with high confidence [Figure 2, purple line], reaching above the commonly accepted
targets of 0.8 for pTM and 0.7 for pLDDT [2][3].
The interface predictions varied a lot more and made it possible to identify outputs of higher
confidence.
Together with further confidence values such as the predicted distance error (PDE), a measure of potential
distance deviations, this allowed us to select 14 designs for further evaluation.
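To keep this selection step reproducible, the threshold filtering can be scripted. The sketch below assumes the per-design metrics have already been collected into a table; the file name, column names and the PDE cut-off are assumptions for illustration and will differ from the raw Boltzdesign1 output:

```python
import pandas as pd

# Hypothetical file collecting the confidence metrics per design; the column
# names are assumptions, not the tool's actual output keys.
metrics = pd.read_csv("design_metrics.csv")  # columns: design, ptm, plddt, iptm, iplddt, pde

# Commonly used confidence targets (pTM > 0.8, pLDDT > 0.7), applied to the
# interface-specific scores as well, plus a cap on the predicted distance error.
selected = metrics[
    (metrics["ptm"] > 0.8)
    & (metrics["plddt"] > 0.7)
    & (metrics["iptm"] > 0.8)
    & (metrics["iplddt"] > 0.7)
    & (metrics["pde"] < 2.0)   # assumed threshold in Å
]

print(f"{len(selected)} designs pass the confidence filter:")
print(selected.sort_values("iptm", ascending=False)["design"].tolist())
```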
Designs that received good values were examined in ChimeraX to assess the interactions between the protein and its ligand. We looked for hydrogen bonding, pi-stacking, clashes and cavities, and were pleased that none of the chosen designs showed major problems in the binding interface [see Figure 3 for four examples]. Alpha-amanitin contains many hydroxyl groups introduced by post-translational modifications that are necessary for binding its natural target, RNA polymerase II. In our proteins, these hydroxyl groups were often involved in hydrogen bonds. The tryptophan of amanitin also contributed to the overall binding through pi-stacking in a number of designs. Lastly, no clashes between protein and ligand were observed.
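The interaction checks in ChimeraX can also be driven by a short script rather than clicking through each design. Below is a minimal sketch of a ChimeraX Python script (opened inside ChimeraX, where the `session` object is available); the file name is hypothetical, and pi-stacking and cavities were still judged visually:

```python
# Run inside ChimeraX, e.g. via "open check_contacts.py"; `session` is provided there.
from chimerax.core.commands import run

run(session, "open design_40.cif")                           # hypothetical file name
run(session, "hbonds protein restrict ligand reveal true")   # hydrogen bonds across the interface
run(session, "clashes protein restrict ligand")              # steric clashes between protein and ligand
run(session, "contacts protein restrict ligand")             # close contacts, e.g. candidate stacking pairs
```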
The next step was to check for consistency in prediction with other AI structure prediction models (cross-model validation) as well as within the models used for re-prediction (intra-model validation). For this we used AlphaFold3 [4] and Boltz2 [5]. [Figure 4] shows design #40 with a high degree of homogeneity in the predicted ligand conformation and binding site: all alpha-amanitin structures were predicted to bind in the same way. To quantify this, we calculated the root mean square deviation (RMSD) between the backbone atoms of the protein as well as between the ligand atoms [see Figure 5]. The backbone RMSD was in most cases below an acceptable limit of 2 Å; only a few designs went above it. The ligand RMSD, both in cross- and intra-model validation, was less uniform: only three designs showed high consistency with a low RMSD between ligand atoms (#15, #25, #40), and within the Boltz2 predictions three more came close. Interestingly, in intra-model validation more designs lie close to the target value, but there are also designs with a highly variable ligand prediction. For example, design #24 shows two distinct binding modes, one close to the Boltzdesign1 output and one slightly different. Overall, designs #15 and #40 did well in this validation step, with good backbone as well as ligand RMSDs.
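The RMSD comparison itself is straightforward to script once matched atom coordinates have been extracted from two predictions of the same design. A minimal NumPy sketch (Kabsch superposition on the protein backbone, then RMSD on backbone and ligand atoms; coordinate extraction from the structure files is not shown):

```python
import numpy as np

def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray):
    """Rotation and centroids that superpose `mobile` onto `reference` (N x 3 arrays)."""
    mob_center, ref_center = mobile.mean(axis=0), reference.mean(axis=0)
    H = (mobile - mob_center).T @ (reference - ref_center)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # correct for improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mob_center, ref_center

def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def compare(backbone_a, backbone_b, ligand_a, ligand_b):
    """Backbone and ligand RMSD after superposing prediction A onto prediction B via the backbone."""
    R, mob_c, ref_c = kabsch_superpose(backbone_a, backbone_b)
    bb_aligned = (backbone_a - mob_c) @ R.T + ref_c
    lig_aligned = (ligand_a - mob_c) @ R.T + ref_c   # ligand follows the backbone superposition
    return rmsd(bb_aligned, backbone_b), rmsd(lig_aligned, ligand_b)
```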
Following the structural assessment, we used different methods to assess solubility, usability, aggregation
propensity and affinity.
In solubility predictions with the ESM1b and ESM12 models there was little difference between the designs [6]. With a third model, SoDoPE [7], the values were mostly high but dropped
for designs #18, #24 and #28-30. Usability, the predicted probability of successful recombinant expression and
purification obtained from both ESM models, was more telling: only designs #15 and #40 received a satisfactory
score.
Both solubility and usability were calculated by these models from sequence input. We felt that a complementary
method starting from structural input would fit our approach well. With Aggrescan4D [8] we therefore calculated the aggregation propensity of our
output structures. Here most designs performed well; only #24 stood out, accumulating a score about half as
high as the rest.
Finally, we obtained a predicted affinity for our protein-ligand complexes. Boltz2 outputs binding
free energy values (-ΔG) for comparison. The designs were all in the range of a good to strong binder, with values above 8
kcal/mol.
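To put these numbers into perspective, a binding free energy can be converted into an approximate dissociation constant, assuming the reported value is a standard free energy at 298 K with a 1 M reference state:

```python
import math

R = 1.987e-3   # gas constant in kcal/(mol*K)
T = 298.15     # temperature in K

def kd_from_dg(dg_kcal_per_mol: float) -> float:
    """Dissociation constant (in M) from the binding free energy, Kd = exp(ΔG / RT)."""
    return math.exp(dg_kcal_per_mol / (R * T))

# -ΔG of 8 kcal/mol, i.e. ΔG = -8 kcal/mol:
print(f"{kd_from_dg(-8.0):.1e} M")   # ≈ 1.4e-06 M
```

Under these assumptions, a ΔG of -8 kcal/mol corresponds to roughly low-micromolar affinity.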
We did not take the calculated affinity to be highly accurate or even trustworthy, because in a comparative
run predicting a GFP-amanitin complex the reported affinity was higher than that of all our designs.
We therefore did not use these predictions to discard any of our 14 high-confidence designs, but rather to compare
them among one another.
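One way to use the feature predictions purely for such relative comparison is to normalise them per feature and rank the designs. A small sketch (hypothetical file and column names; the sign convention of each score is an assumption for illustration):

```python
import pandas as pd

# Hypothetical table of per-design feature predictions.
features = pd.read_csv("design_features.csv")  # columns: design, solubility, usability, aggregation, neg_dG

scored = features.set_index("design")
z = (scored - scored.mean()) / scored.std()    # z-score each feature so they are comparable
z["aggregation"] *= -1                         # higher aggregation propensity counts negatively

ranking = z.sum(axis=1).sort_values(ascending=False)
print(ranking.head())   # designs compared relative to one another, not as absolute predictions
```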
With our selection based on quality metrics, our structural assessment, our cross- as well as
intra-model validation and finally our feature predictions, we identified design #15 as our top candidate.
What did we achieve when comparing our set goals with our evaluation of design #15?
We hope to have designed a feasible alpha-amanitin-binding protein, evaluated by different means in an iterative process that is far from complete. As we submit the design as a part to the iGEM Registry [9], we want to point out that more evaluation needs to be done. We intend the design to be available to future iGEM teams as a starting point for further design steps or as a guide for their own design process.
For a deeper analysis of the evaluation steps, more graphs and further information on our in silico design process we refer to our dedicated DryLab page under DryLab.
The part we submitted can be found in the registry under registry.igem.org/parts/bba-25owwae9