We created 50 designs of alpha-amanitin binding proteins, ranked them by quality metrics and evaluated them computationally for multiple characteristics relevant for recombinant expression. We chose one top performing design to submit as a part in the iGEM registry and hope it will be useful for future teams as a starting point.
Team Hamburg set out to create a nanobody (NB) that binds the mushroom toxin alpha-amanitin (amanitin) for intracellular delivery, in order to rescue the cells hit first by the intoxication: the hepatocytes (liver cells). While researching the topic, our team found several ways to obtain a high-affinity NB against a biomolecular target, one of which was in silico design with neural networks. With llama immunization and library screening as the alternatives, we chose computational design using artificial intelligence for multiple reasons:
It may be an oversimplification, but artificial intelligence (AI) ushered in a paradigm shift in life sciences
and one scientific area where it caused the most uproar was structural biology.
The combined effort of thousands of scientists over decades of research filled databases with structures of countless biological macromolecules, obtained with labor- and time-intensive methods like X-ray crystallography, NMR and electron microscopy.
These databases formed the foundation of the bioinformatics revolution that is AI, as they provided the structural data to train neural networks on. These networks had tremendous success in predicting unknown protein structures solely from sequence, as shown in CASP14, a structure prediction competition held in 2020. [1]
AlphaFold from Google’s DeepMind was the pioneering neural network, and its lead developers were honored with one half of the Nobel Prize in Chemistry 2024. [2] Their algorithms are hailed as solving the protein folding problem and inspired many other groups to improve upon the concept, for example by including other kinds of molecules or by reducing the necessary computing power.
The other half of the Nobel Prize in Chemistry 2024 was awarded to David Baker, a protein engineer whose team used their own neural network (RoseTTAFold) in reverse. By inverting the prediction process they achieved de novo protein design for multiple purposes, such as protein, DNA and small-molecule binders. [3], [4]
Methods like this iteratively refine randomized structures into biologically meaningful proteins, offering unprecedented control over molecular engineering.
The recognition of this research in 2024 highlights the profound impact of AI on structural biology and the future potential for customizable protein design in medicine, biotechnology, and beyond.
Starting with limited knowledge of structural biology, we conducted thorough literature research and collected resources on protein design, both classical and AI-based.
In doing so we came across a preprint from 2024 describing an AI model specialized in antibody and NB structure generation for an epitope of choice. [5] It was released jointly by the Baker group and other Rosetta-related scientists and introduced a complete pipeline beyond the structure-generating model.
To use the model we reached out to Nanohelix [6], an AI service provider that has integrated multiple models into PyMOL, a suite to display and analyze biomolecular structures. They graciously provided us not only with access but also with computing units, free of charge. This was a major advantage, as we did not have to find available server space and deploy the models ourselves.
The models chained together to build the pipeline, called RFantibody, were RFdiffusion, ProteinMPNN and RoseTTAFold2 (RF2).
RFdiffusion generates the structure complementary to the epitope.
ProteinMPNN redesigns the initial output sequence to better match the structure and the complex with the epitope.
RF2 validates the complex via prediction.
As input the RFantibody pipeline uses a single chain of a PDB entry of choice. Alpha-amanitin is present in only a limited number of crystallography and cryo-EM structures, stemming from just five experiments in total (PDB entries sorted by year: 1K83, 2VUM, 3CQZ, 3EXV, 8WAK-8WAZ). We compared their conformations with ChimeraX [7] [see Figure 1]. The structures showed little difference in conformation, so we chose one representative (8WAZ) as RFantibody input to create our first NB structures. The structure selection in Nanohelix is quite intuitive, and the output is separated into RFdiffusion-, ProteinMPNN- and RF2-specific files.
The final RF2 predictions were used to evaluate the generated structures. As intended the overall nanobody specific fold was kept intact and just the hyper-variable loops were modified to accommodate ligand binding as seen in [Figure 2].
Multiple things are directly apparent upon inspection of exemplary output complexes seen in [Figure 3]:
In the output from every part of the pipeline the ligand was represented incompletely. The modified amino acids (hydroxy-proline HYP, di-hydroxy-isoleucine ILX, hydroxy-tryptophan TLX) were missing completely. Additionally, the bicyclic peptide was opened up into a linear one.
To evaluate whether the problem was simply an issue of depiction, we performed docking with Attracting Cavities 2.0 on the SwissDock server [8], with alpha-Amanitin input as a SMILES string and the docking site set either to the top of the NB with the hypervariable loops [Figure 5, left] or to the whole NB structure [Figure 5, right].
The docking produced poses on top of the NB in both cases, but the estimated binding free energies (expressed as minus delta G) of the poses did not exceed 7, which indicates minimal interactions. H-bonds between NB and ligand were present in only 37 of 60 poses.
Having assessed the RFantibody results only superficially, and with outputs that represented alpha-Amanitin incompletely, we did not feel confident continuing with in silico validation of the produced designs or generating additional ones.
For feedback on our approach thus far, and to identify the problem as well as solutions, we reached out to experts in de novo and in silico protein design and went back to literature research.
Clara Schöder (University of Leipzig, Medical Faculty, Institute for Drug Discovery, Germany) and Klara Kropivšek (University of Nova Gorica, Laboratory for Environmental and Life Sciences, Slovenia) agreed to talk to us about our approach and problem. What we had found ourselves, namely that RFantibody could not use our epitope input adequately and reduced it to a non-modified, linear peptide, was also Prof. Schöder’s main concern. The loss of epitope structure meant that the designed NB would probably not bind alpha-Amanitin properly, which is in line with our preliminary docking results. Both experts suggested, and we also found in the literature, that switching to an AI model capable of considering all atoms, not just protein residues, was our way forward.
For more information on implementing the experts' advice, see Human Practices.
RFantibody is based on three protein-only models that omit modifications and small molecules in their design process. This is due to their training data, module setup and designated specificity. Initially our line of thought was that alpha-Amanitin, being a peptide, could function as a protein input in the epitope selection of RFantibody. We did not anticipate the pipeline's problems with post-translational modifications (PTMs), which are highly represented in our target. For one, several amino acids are hydroxylated, as mentioned before; in addition, two cyclizations take place to form the final structure.
All of these PTMs were problematic for the RFantibody pipeline.
Our next step was to attempt workarounds in our pipeline while establishing a different model with all-atom capabilities to reach our goal in the long term.
The salvage strategy did not lead to significantly improved output. For more detail we refer to our Engineering Page.
In search of a different AI-based protein design model we identified several candidates, among them RoseTTAFold Diffusion All-Atom (RFdAA) [4], Chai2 [9], and Boltzdesign1 [10].
Boltzdesign1 is specifically trained on protein structures containing bound small molecules, DNA/RNA or PTMs. To generate a structure with the chosen ligand bound in a feasible way, the neural network encodes the protein and the ligand differently and calculates atomic/residue distances. With a graphical abstraction of these, called the distogram, it is able not only to generate a structure but to do so in a more computationally efficient way than other models. [10]
As input we could represent amanitin either as a modified peptide or as a small molecule. The PTMs would be easy to represent in the first case, but the two cyclizations would be problematic. We therefore chose to represent alpha-Amanitin as a small molecule in the form of a SMILES string. We obtained several strings from ChEMBL, PubChem, KNApSAcK, ChEBI and Wikipedia, compared them, and found that they gave identical 2D structures in a SMILES-to-structure tool [11]. We chose the ChEMBL string to go forward [12], got acquainted with input parameters and outputs, and started creating structures.
In total we created 50 designs over the span of a month for selection and further evaluation. During the creation process we gradually adjusted the input parameters to better fit our desired outcome and to improve quality metrics. Overall, the outputs were quite heterogeneous in the folds they presented while maintaining a length of 100 to 200 residues.
Initially we chose the default “small molecule binder” configuration provided by the model, but we had some concerns about the results we got from it.
At the beginning we saw a tendency toward purely helical secondary structure, quite superficial ligand binding and few of the interactions necessary for high-affinity binding. Additionally, the early structure outputs had a higher probability of failing to generate a confident protein, as seen for design #8 [see Figure 6].
We made certain changes to the configuration to get a more consistently confident output with some desired features:
Obtained designs were judged in consecutive phases:
Boltzdesign1 output consists of a structure of the protein-ligand complex and quality values. To get a first impression we tabulated the metrics and plotted them (see Figure 7).
Most of the designs satisfied the more relaxed target values of 0.7 for pLDDT (local “confidence”) and 0.8 for ipTM (interface confidence), but only a handful of designs went above 0.9 for both metrics. Designs of high quality in that regard also showed exceptional values in the other categories. The pair-wise interface confidence (“pair_chain_iptm”) was most restrictive in the first calculation, where the ligand prediction rarely went above 0.7. Overall, the ligand was always predicted with less confidence than the binder.
We also found complex_ipde to be a very good metric for distinguishing good-quality structures from bad designs. The predicted distance error for the interface between binder and ligand reached an acceptable value of under 1 only for the designs that performed exceptionally in all other quality measures. We therefore reasoned that “complex_ipde” could serve as a single measure to preliminarily identify good designs before checking confidence and other scores. To illustrate this we plotted the pde and ipde with the good-quality cutoff indicated in Figure 7, left. Designs reaching the goal are listed in the box. These did well in all other metrics.
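The two-stage screen described above (complex_ipde first, then the confidence scores) can be written down as a small filter. The following is only a sketch: the record type, the function names and the example values are hypothetical and do not reflect the Boltzdesign1 output format; only the cutoffs (pLDDT 0.7, ipTM 0.8, complex_ipde below 1) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    """Hypothetical container for the quality values of one design."""
    name: str
    plddt: float         # local confidence, 0..1
    iptm: float          # interface confidence, 0..1
    complex_ipde: float  # predicted distance error at the interface


def passes_ipde(design, max_ipde=1.0):
    """First screen: interface distance error must be under the cutoff."""
    return design.complex_ipde < max_ipde


def passes_relaxed(design, min_plddt=0.7, min_iptm=0.8):
    """Second screen: the relaxed confidence targets from the text."""
    return design.plddt >= min_plddt and design.iptm >= min_iptm


def preselect(designs):
    """Return the names of designs that pass both screens, in input order."""
    return [d.name for d in designs if passes_ipde(d) and passes_relaxed(d)]


# Made-up example values, for illustration only.
designs = [
    DesignMetrics("design_a", plddt=0.93, iptm=0.91, complex_ipde=0.8),
    DesignMetrics("design_b", plddt=0.75, iptm=0.82, complex_ipde=1.6),
    DesignMetrics("design_c", plddt=0.55, iptm=0.40, complex_ipde=3.2),
]
selected = preselect(designs)
```

In this toy set, `design_b` meets the relaxed confidence targets but is rejected by the stricter ipde screen, which mirrors the observation that only the best designs reached an ipde under 1.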
In addition to the quality values we looked at the distogram outputs, which are presented as the iterative steps of Boltzdesign1 [see an example in Figure 8]. When comparing them we noticed that the designs with good quality scores had more protein-ligand interactions visible in the bottom left quadrant of the distograms. Figure 9 shows two low-quality designs next to two high-quality ones, which show more spots in that quadrant. Visually inspecting the distogram could therefore be a means to evaluate outputs when looking for highly interacting binding proteins.
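The visual check of the bottom left quadrant could, in principle, be turned into a count. The sketch below makes several assumptions that are not from the source: the distogram is reduced to a square matrix of expected distances in angstroms, binder residues come first and ligand atoms last, and a contact is any pair closer than a hypothetical cutoff of 8 Å.

```python
def count_interface_contacts(distances, n_binder, cutoff=8.0):
    """Count binder-ligand pairs below the cutoff.

    distances: square matrix (list of lists) of expected distances.
    n_binder:  number of binder positions; the remaining rows are ligand.
    Rows of the ligand against columns of the binder form the bottom left
    block of the matrix.
    """
    n_total = len(distances)
    contacts = 0
    for ligand_index in range(n_binder, n_total):
        for binder_index in range(n_binder):
            if distances[ligand_index][binder_index] < cutoff:
                contacts += 1
    return contacts


# Toy matrix: three binder positions and one ligand position.
toy = [
    [0.0, 3.8, 6.0, 5.0],
    [3.8, 0.0, 3.8, 12.0],
    [6.0, 3.8, 0.0, 7.5],
    [5.0, 12.0, 7.5, 0.0],
]
n_contacts = count_interface_contacts(toy, n_binder=3)
```

Such a count is only a rough proxy for what the eye picks up in the plots; the cutoff in particular would need to be tuned against designs whose quality is already known.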
Consulting the metrics and the distogram we were able to pre-select 14 good quality designs for further processing.
The selected designs were evaluated in ChimeraX [7] with regard to protein-ligand complex criteria such as hydrogen bonds, clashes and cavities. Binding modes differed quite extensively from those of the low-quality designs, but the high-quality ones all had an acceptable number of interactions and no clashes between protein and ligand. Additionally, the ligand was more secluded from solvent in most of the high-quality designs.
Just as in binding to RNA polymerase II, the protein on which alpha-amanitin exerts its toxicity, the peptide’s polar groups were highly involved in the complexes. Binding proteins formed many hydrogen bonds with the hydroxyl groups of the modified amino acids and displayed hydrophobic interactions as well as pi-stacking [see Figures 11 and 12].
Regarding the heterogeneity of our output folds, we calculated the pairwise template modeling (TM) score between the high-quality designs and obtained a value of 0.34. The original Boltzdesign1 preprint reported an overall TM-score of 0.36 and compared it with 0.46 for RFdiffusionAA. [10] Since a lower pairwise TM-score means less structural similarity, our designs are quite diverse in their folds: in our use, Boltzdesign1 produced designs that were more diverse than those reported in the preprint for either Boltzdesign1 or RFdiffusionAA.
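For readers unfamiliar with the TM-score, the sketch below shows the standard scoring formula and how a pairwise average over a set of designs can be formed. It is a simplification under stated assumptions: the real TM-score is the maximum over all superpositions and is normally computed with a dedicated alignment tool, whereas this function scores one fixed superposition from given per-residue distances. The function names, the clamp for very short chains and the toy scores are illustrative only.

```python
from itertools import combinations


def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    if target_length <= 21:
        # Assumption: clamp for very short chains, where the formula
        # would otherwise give a tiny or undefined value.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances, target_length):
    """TM-score term for ONE given superposition.

    distances: distance (angstroms) of each aligned residue pair.
    target_length: length of the structure used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def mean_pairwise(names, score):
    """Average score(a, b) over all unordered pairs of names."""
    pairs = list(combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(score(a, b) for a, b in pairs) / len(pairs)


# Toy pairwise scores, standing in for the output of an alignment tool.
toy_scores = {("d1", "d2"): 0.3, ("d1", "d3"): 0.4, ("d2", "d3"): 0.5}
mean_tm = mean_pairwise(["d1", "d2", "d3"], lambda a, b: toy_scores[(a, b)])
```

Because the score is normalized by the length of one of the two structures, it is not symmetric for designs of different lengths; which normalization underlies a reported average should be stated alongside it.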
To verify the validity of our proteins and protein-ligand complexes we used AlphaFold3 [13] and Boltz-2 [14], two models capable of including ligands in their protein structure predictions. The aim was to look for homogeneous predictions of protein structure and ligand binding, both in comparison to the Boltzdesign1 output [Figure 13 A and B] and within the predictions of Boltz2 [Figure 13 C]. The more often a complex is predicted in the same arrangement, the higher our confidence that alpha-amanitin would interact with our protein in the way Boltzdesign1 initially predicted.
Consistency of structure prediction was high for the protein, as seen in the low backbone RMSD for most high-quality designs. The ligand RMSD, however, quite often exceeded the accepted limit of 2 Å; only three designs stayed below it, one of which instead had a very high backbone RMSD. Designs #15 and #40 also displayed low ligand and backbone RMSD when the Boltz2 predictions were compared with one another.
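The RMSD comparison against the 2 Å limit can be illustrated with a few lines of code. This is a minimal sketch, assuming the two structures have already been superimposed (for a ligand RMSD, typically on the protein backbone) and that atoms are listed in the same order in both; the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each list holds (x, y, z) tuples in angstroms. No superposition is
    performed here; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def within_limit(value, limit=2.0):
    """True if an RMSD stays at or below the 2 angstrom limit from the text."""
    return value <= limit


# Toy ligand with two atoms, shifted by 1 and by 3 angstroms along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
```

With these toy coordinates, `rmsd(reference, close_pose)` gives 1.0 and passes the limit, while `far_pose` gives 3.0 and does not.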
Next we compared the per-residue confidence (pLDDT) of the selected designs with each other [see Figure 14]. Prediction quality was mostly stable over the length of the protein, with a few small local decreases (compare design #24 in Fig. 14), but occasionally larger stretches had low confidence (compare design #37 in Fig. 14). Designs that exhibited such drops, as well as designs with low local ligand confidence (also design #37 in Fig. 14), were later dropped from the “high quality” category.
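Telling small local dips apart from longer low-confidence stretches can also be done programmatically. The sketch below is an assumption-laden illustration: the threshold of 0.7 reuses the relaxed pLDDT target mentioned earlier, the minimum stretch length of five residues is a hypothetical choice, and the profile is made up.

```python
def low_confidence_stretches(plddt, threshold=0.7, min_length=5):
    """Return (start, end) index pairs of runs with pLDDT below threshold.

    Indices are zero-based and inclusive. Runs shorter than min_length
    are ignored, so brief local dips are not reported.
    """
    stretches = []
    start = None
    for index, value in enumerate(plddt):
        if value < threshold:
            if start is None:
                start = index
        elif start is not None:
            if index - start >= min_length:
                stretches.append((start, index - 1))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(plddt) - start >= min_length:
        stretches.append((start, len(plddt) - 1))
    return stretches


# Made-up profile: one long drop (6 residues) and one short dip (2 residues).
profile = [0.9] * 10 + [0.5] * 6 + [0.9] * 4 + [0.6] * 2 + [0.9] * 3
long_drops = low_confidence_stretches(profile)
all_drops = low_confidence_stretches(profile, min_length=1)
```

Here only the six-residue drop is reported by default, while lowering `min_length` to 1 also reports the short dip.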
Design #15 showed consistency across the predictions of all models, little deviation between structures and, more importantly, a homogeneous ligand conformation in the protein binding pocket. To illustrate this consistency we superimposed the Boltzdesign1 output with 20 re-predicted models from Boltz2 [Figure 15].
With the 14 designs assessed for validity, we started to address their use as real-world proteins. For this we employed different AI models and programs to predict protein properties of the designs and compared the results in search of the best candidate.
Boltz2 predicts not only the structure of a protein-ligand complex of the user's choice but also the affinity. To back up the prediction, Boltz2 also assigns each one a probability score. All of the predicted affinities exceed a value of 8. Some designs exceed 9, but only a few do so in combination with a consistently high probability score. Design #15 has the highest probability assigned to its affinity. Although its affinity is not the highest, this gave us more confidence in the design [see Figure 16].
Next we predicted solubility and “usability”, a metric of how well the protein will hold up in recombinant expression and protein purification. We used NetSolP1.0 [15] and SoDoPe [16], both sequence-based predictive models. NetSolP has two modes working with different protein language models (ESM1b and ESM12). For comparison we used both [see Figure 17]. Solubility was high according to both ESM-based models, whereas SoDoPe assigned reduced values to five of the designs. Usability was high only for design #15 and design #40.
Another parameter we chose was the propensity to aggregate. This overlaps with solubility and usability, but the tool we chose, Aggrescan4D [17], is structure-based. Our rationale was that a complete picture only forms when both sequence and structure are integrated into the analysis. Aggrescan4D is not an AI model; rather, it uses experimentally derived algorithms to project a calculated aggregation score onto the protein's surface. For small globular proteins like our designs, the developers advise users to consider the total score over the whole protein when comparing to others [see Figure 18, right]. Only small differences in maximal and average score were noticeable, but the total score did vary, with a few high performers and design #24 being quite low.
With these evaluations we were quite confident that two of the 14 candidates, designs #15 and #40, potentially represent proteins with the desired characteristics, predicted with high confidence.
To improve upon this basis we chose to do sequence redesign, with the goal of keeping valuable features like tight ligand binding and reproducibility in prediction while improving our chances of successful expression.
For this we employed LigandMPNN [19], a neural network able to assign a sequence to an input protein structure containing a bound ligand.
We started with design #15 and did several LigandMPNN redesigns in search of high confidence scores.
We varied model parameters, sampling temperature and Gaussian noise, but refrained from biasing toward certain amino acids or excluding the ligand binding pocket.
From each redesign run we selected the highest-confidence sequences with the lowest sequence recovery, to introduce novelty and the potential for changes in biochemical properties. We predicted a selection of 12 sequences with Boltz2 for structural comparison to the initial Boltzdesign1 output.
LigandMPNN was able to reproduce nearly the same structure at a maximal sequence recovery of 40 percent, as the backbone RMSD was always under 2 Å, but the ligand never had comparable interactions with the protein [see Figure 19]. The ligand RMSD never fell below 5 Å and the predicted affinity was capped at 8 kcal/mol, both quite unsatisfactory.
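Sequence recovery, as used in the selection above, is the fraction of positions at which a redesign keeps the original residue. The sketch below shows one way to combine it with a confidence filter; the function names, the confidence cutoff and the toy sequences are hypothetical and are not taken from LigandMPNN or from our data.

```python
def sequence_recovery(original, redesign):
    """Fraction of positions where the redesign keeps the original residue."""
    if len(original) != len(redesign) or not original:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(original, redesign))
    return matches / len(original)


def rank_novel_confident(original, candidates, min_confidence):
    """Keep candidates at or above min_confidence, lowest recovery first.

    candidates: list of (sequence, confidence) tuples.
    """
    eligible = [(seq, conf) for seq, conf in candidates if conf >= min_confidence]
    return sorted(eligible, key=lambda item: sequence_recovery(original, item[0]))


# Toy data for illustration only.
original = "MKTAYIAKQR"
candidates = [
    ("MKTAYIAKQR", 0.90),  # identical to the original
    ("MKSAYLAKQR", 0.85),  # two substitutions
    ("AATAYIGGGG", 0.40),  # very different, but low confidence
]
ranked = rank_novel_confident(original, candidates, min_confidence=0.8)
```

In this toy example the confident sequence with two substitutions ranks first, the unchanged sequence second, and the low-confidence candidate is removed despite being the most novel.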
Predicting solubility and usability with NetSolP1.0 as before, but this time for 150 redesigned sequences of #15, we found increased solubility and usability in almost all of them [see Figure 20]. The SoDoPe prediction showed the same trend [see Figure 21].
We concluded this sequence redesign with little progress and took away that LigandMPNN is very well suited for improving biophysical parameters but prone to losing tight ligand interactions. We wanted to see whether the redesign information was valuable for us anyway, so we had a look at the output.
Comparing the amino acid probabilities of the LigandMPNN redesigns with the contacts of alpha-amanitin in the original Boltzdesign1 structure, we could see that many residues were kept fixed by the model [see Figure 22], probably because of their structural importance.
We tabulated the amino acid positions and their probability of showing a certain residue and compared them to the original sequence of design #15. Many direct contacts to alpha-amanitin, especially the hydrophobic ones, were unchanged.
At charged positions, however, LigandMPNN often changed the charge, either neutralizing it (e.g. Glu39Gln) or inverting it (e.g. Asp21Lys).
We found that LigandMPNN can be used to find amino acids important for ligand binding as well as for fold integrity. By comparison with the output structure of Boltzdesign1 and its important contacts, residues can be identified that are essential for upholding the overall fold.
| Residue # | Identity in #15 | Redesign | Probability |
|---|---|---|---|
| 4 | Phe | Phe | 0.91 |
| 6 | Pro | Pro | 0.73 |
| 10 | Asn | Asn | 0.83 |
| 20 | Leu | Ile | 0.72 |
| 21 | Asp | Lys | 0.37 |
| 24 | Trp | Ile | 0.23 |
| 29 | Gly | Gly | 0.78 |
| 35 | His | Pro | 0.33 |
| 39 | Glu | Gln | 0.31 |
| 40 | Ile | Leu or Ile | 0.66 |
| 43 | Phe | Phe or Tyr | 0.69 |
| 44 | Met | Leu | 0.7 |
| 45 | Asn | Asn | 0.81 |
| 46 | Lys | MIX | -- |
| 47 | Ile | Leu | 0.68 |
| 48 | Leu | Leu | 0.88 |
| 49 | Asn | Asn | 0.91 |
| 62 | Phe | Phe or Tyr | 0.89 |
| 65 | Lys | MIX | -- |
| 66 | Leu | Leu | 0.4 |
| 67 | Phe | Phe | 0.8 |
| 69 | His | MIX | -- |
| 70 | Tyr | Leu | 0.56 |
| 75 | Asp | Trp | 0.3 |
| 78 | Met | Met | 0.38 |
Table 1: Conserved residues according to LigandMPNN in comparison to original residue identity in design #15; contacts to alpha-amanitin identified in the output structure of Boltzdesign1 in yellow, other residues are probably important for structural integrity
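The substitutions in Table 1 can be sorted by what happens to the side-chain charge. The sketch below uses a handful of rows from the table; the classification labels are our own wording, and treating histidine as neutral is an assumption made for simplicity.

```python
# Formal side-chain charge at neutral pH; every residue not listed counts
# as neutral. Assumption: histidine is treated as neutral here.
CHARGE = {"Asp": -1, "Glu": -1, "Lys": 1, "Arg": 1}


def charge(residue):
    """Formal charge of a residue given as a three-letter code."""
    return CHARGE.get(residue, 0)


def classify(original, redesign):
    """Describe a substitution in terms of identity and charge."""
    if original == redesign:
        return "conserved"
    before, after = charge(original), charge(redesign)
    if before == after:
        return "substituted, charge kept"
    if before != 0 and after != 0:
        return "charge inverted"
    return "charge gained or lost"


# Selected rows from Table 1: (position, identity in #15, redesign).
rows = [
    (20, "Leu", "Ile"),
    (21, "Asp", "Lys"),
    (39, "Glu", "Gln"),
    (48, "Leu", "Leu"),
    (75, "Asp", "Trp"),
]
summary = {position: classify(old, new) for position, old, new in rows}
```

Positions listed as MIX in the table have no single redesigned residue and are left out of this illustration.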
There are many more steps to validating an in silico designed protein structure. Time limited us to the methods described, but we have also started docking alpha-amanitin to our structure with rigid-body, flexible and AI-based docking methods, in conjunction with molecular dynamics (MD) simulations. So far, ligand binding has been validated through cross- and intra-model consistency; docking and MD would represent another angle of validation. Additionally, we see the need for targeted improvements to design #15: the Aggrescan4D (A4D) scan revealed surface residues that may need to be changed to inhibit aggregation [see Figure 23].
We hope to have designed a feasible alpha-amanitin binding protein, evaluated by different means in an iterative process that is far from complete. As we submit the design as a part to the iGEM registry [25], we want to point out that more evaluation needs to be done. We intend the design to be available to future iGEM teams as a starting point for further design steps or as a guide to their own design process.
With that in mind we have to make users aware of the caveats of AI-based in silico protein design. Models like Boltzdesign1 and Boltz2 may build upon one another, which may introduce confirmation bias. We tried to avoid this by only comparing outputs within a group of predictions. Also, perfectly re-predicted protein-ligand complexes were the exception rather than the rule among our designs.
Additionally, many of the current models have not gone through peer review, and Boltzdesign1 in particular, because of its recency and novelty, has not yet been backed by wet-lab validation.
Lastly, we want to point out that many models are trained to predict or generate but lack a negative selection process to weed out bad outputs and hallucinations. To probe this with Boltz2, we predicted a GFP alpha-amanitin complex, expecting a very low binding affinity to compare with our own designs. The predicted affinity of this highly improbable binder was reported as higher than that of our own designs. With minimal interactions between GFP and the peptide, we concluded that the predicted affinity was a mistake in this instance.
The outputs of neural networks have many practical uses but come with a certain risk of being fictional. The evaluation and selection process, as we started it in our approach, becomes even more important when using models like Boltzdesign1, Boltz2 and even AlphaFold3.
Compiled resources on protein design, the outputs of Boltzdesign1 for design #15, the re-predictions with AF3 and Boltz2, and a table of all quality metrics can be found on our GitLab under https://gitlab.igem.org/2025/software-tools/hamburg