Dry Lab | iGEM Hamburg 2025

Project Introduction, Background and Goal

Team Hamburg set out to create a nanobody (NB) that binds the mushroom toxin alpha-Amanitin (amanitin), for intracellular delivery and rescue of the cells first affected in an intoxication, the hepatocytes (liver cells). While researching the topic, we came across several ways to obtain a high-affinity NB against a biomolecular target, one of which was in silico design with neural networks. The alternatives were llama immunization and library screening, and we chose computational design using artificial intelligence for several reasons:

Circumventing animal experiments
Preserving resources by omitting high throughput in vitro selection
Using innovative technology
Pursuing a novel direction in research
Paving the way for future iGEM teams

It may be an oversimplification, but artificial intelligence (AI) has ushered in a paradigm shift in the life sciences, and one of the areas where it caused the most uproar is structural biology. Over decades, the combined effort of thousands of scientists filled databases with structures of countless biological macromolecules, determined with labor- and time-intensive methods such as X-ray crystallography, NMR and electron microscopy.
These databases laid the foundation for the AI revolution in bioinformatics, as they provided the structural data on which neural networks were trained. These networks predicted unknown protein structures from sequence alone with tremendous success, as shown in CASP14, the 2020 round of the Critical Assessment of protein Structure Prediction. [1]
AlphaFold from Google DeepMind was the pioneering neural network, and its lead developers, Demis Hassabis and John Jumper, were jointly awarded one half of the 2024 Nobel Prize in Chemistry. [2] AlphaFold is hailed as having solved the protein folding problem and has inspired many other groups to build on the concept, for example by including other kinds of molecules or by reducing the required computing power.
The other half of the 2024 Nobel Prize in Chemistry was awarded to David Baker, a protein engineer whose team used their own neural network (RoseTTAFold) in reverse. By inverting the prediction process, they achieved de novo design of proteins for multiple purposes, such as protein, DNA and small-molecule binders. [3], [4]
Methods like this iteratively refine randomized structures into biologically meaningful proteins, offering unprecedented control over molecular engineering. The 2024 recognition of this research highlights the profound impact of AI on structural biology and the potential of customizable protein design in medicine, biotechnology and beyond.

Starting with limited knowledge of structural biology, we conducted thorough literature research and collected resources on protein design, both classical and AI-based.
In doing so, we came across a 2024 preprint describing an AI model specialized in generating antibody and NB structures for an epitope of choice. [5] It was released jointly by the Baker group and other Rosetta-affiliated scientists and introduced a complete pipeline that goes beyond the structure-generating model.
To use the model, we reached out to Nanohelix [6], an AI service provider that has integrated multiple models into PyMOL, a suite for displaying and analyzing biomolecular structures. They graciously provided us not only with access but also with free computing units. This was a major advantage, as we did not have to find server capacity and deploy the models ourselves.
The models chained together in the pipeline, called RFantibody, are RFdiffusion, ProteinMPNN and RoseTTAFold2 (RF2):

RFdiffusion generates the structure complementary to the epitope.
ProteinMPNN redesigns the initial output sequence to better match the structure and the complex with the epitope.
RF2 validates the complex via prediction.

RFantibody

As input, the RFantibody pipeline uses a single chain of a PDB entry of choice. Alpha-Amanitin is present in only a limited number of crystallography and cryo-EM structures, stemming from just five experiments in total (PDB entries sorted by year: 1K83, 2VUM, 3CQZ, 3EXV, 8WAK-8WAZ). We compared their conformations in ChimeraX [7] [see Figure 1]. As the structures differed little in conformation, we chose an exemplary one (8WAZ) as RFantibody input for our first NB structures. Structure selection in Nanohelix is quite intuitive, and the output is separated into RFdiffusion-, ProteinMPNN- and RF2-specific files.

Figure 1: Available alpha-Amanitin structures from 1K83, 2VUM, 3CQZ and 8WAZ superimposed

RFantibody - Results

The final RF2 predictions were used to evaluate the generated structures. As intended, the overall nanobody fold was kept intact, and only the hypervariable loops were modified to accommodate ligand binding, as seen in [Figure 2].

Figure 2: NB structures from ten different designs superimposed without ligand

*Figure 3: NB structure outputs by RFantibody* ; Top row: 5 different designs with bound alpha-Amanitin in red; from input PDB 8WAZ; Bottom row: Close ups of ligand binding

Several things are immediately apparent upon inspecting the exemplary output complexes in [Figure 3]:

The binding site of alpha-Amanitin is consistently on the lateral side of the NB
Only the largest of the hypervariable loops is involved in the binding mode
Alpha-Amanitin is never conserved in its intended form, see [Figure 4]
The ligand is often predicted in a chemically unfeasible position (clashing and overlapping with the NB structure)

Figure 4: Close-up of the ligand from one RFantibody output example ; incomplete alpha-amanitin structure, unfeasible constellation between the ligand and a tryptophan of the NB structure

In the output of every individual part of the pipeline, the ligand was represented incompletely. The modified amino acids (hydroxyproline HYP, dihydroxyisoleucine ILX, hydroxytryptophan TLX) were missing entirely. Additionally, the bicyclic peptide was opened up into a linear one.
To evaluate whether the problem was merely one of depiction, we performed docking with Attracting Cavities 2.0 on the SwissDock server [8], using alpha-Amanitin as a SMILES string input and defining the docking site either on the NB top with the hypervariable loops [Figure 5, left] or on the whole NB structure [Figure 5, right].

*Figure 5:* Left: Different binding poses of alpha-amanitin predicted by docking to the top of the NB, displayed all at once; Right: Binding poses predicted by docking to the whole NB structure

In both cases the docking did produce poses on top of the NB, but the estimated binding free energies (-ΔG) of the poses did not exceed 7, which indicates only weak interactions. H-bonds between NB and ligand were present in only 37 of 60 poses.
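To put such docking scores into perspective, a binding free energy can be converted into a dissociation constant via ΔG = RT ln(Kd). The sketch below assumes the SwissDock values are in kcal/mol and uses a temperature of 298.15 K; both are our assumptions, not settings taken from the docking run. Under them, -7 kcal/mol corresponds to a Kd of roughly 7 µM, whereas a 1 nM binder would need about -12.3 kcal/mol.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL_PER_MOL_K = 1.987204e-3


def kd_from_delta_g(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol.

    Rearranges Delta G = R * T * ln(Kd); a more negative Delta G means tighter binding.
    """
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Inverse conversion: binding free energy (kcal/mol) from Kd in mol/L."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


print(f"-7 kcal/mol -> Kd = {kd_from_delta_g(-7.0) * 1e6:.1f} uM")
print(f"Kd = 1 nM   -> Delta G = {delta_g_from_kd(1e-9):.1f} kcal/mol")
```

Because Kd depends exponentially on ΔG, each additional 1.4 kcal/mol of binding energy corresponds to roughly a tenfold tighter Kd at room temperature.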
Having assessed the RFantibody results only superficially, and with the outputs representing an incomplete alpha-Amanitin, we did not feel confident continuing with in silico validation of the designs or generating additional ones.
To get feedback on our approach so far and to identify the problem and possible solutions, we reached out to experts in de novo and in silico protein design and returned to the literature.
Clara Schöder (University of Leipzig, Medical Faculty, Institute for Drug Discovery; Germany) and Klara Kropivšek (University of Nova Gorica, Laboratory for Environmental and Life Sciences; Slovenia) agreed to discuss our approach and problem with us. What we had found ourselves, namely that RFantibody could not use our epitope input adequately and reduced it to an unmodified linear peptide, was also Prof. Schöder's main concern. The loss of epitope structure meant that the designed NB would probably not bind alpha-Amanitin properly, which is in line with our preliminary docking results. Both experts suggested, and we also found in the literature, that switching to an AI model able to consider all atoms, not just protein residues, was the way forward. For more information on how we implemented the experts' advice, see Human Practices.

RFantibody is based on three protein-only models that omit modifications and small molecules from their design process, owing to their training data, module setup and intended specificity. Initially we reasoned that alpha-Amanitin, being a peptide, could serve as a protein input for epitope selection in RFantibody. We did not anticipate the pipeline's problems with the post-translational modifications (PTMs) that are abundant in our target: several amino acids are hydroxylated, as mentioned before, and two cyclizations form the final structure.
All of these PTMs were problematic for the RFantibody pipeline.
Our next step was to attempt workarounds in this pipeline while establishing a different model with all-atom capabilities to reach our goal in the long term.
This salvage strategy did not lead to significantly improved output. For details, we refer to our Engineering page.

Model2 preparations

In searching for a different AI-based protein design model, we identified several candidates, among them RFdiffusion All-Atom (RFdAA) [4], Chai-2 [9] and Boltzdesign1 [10]. Boltzdesign1 inverts the structure prediction model Boltz-1, which was trained on protein structures containing bound small molecules, DNA/RNA or PTMs. To generate a structure with the chosen ligand bound in a feasible way, the network encodes protein and ligand differently and calculates atomic and residue distances. By working on a binned representation of these distances, called the distogram, it can not only generate a structure but do so more computationally efficiently than other models. [10] Its advantages for our project were:

Usable via Google Colab with a pre-existing notebook
Less resource demanding than other AI models with similar function
Better in silico evaluated outputs in comparison to RFdiffusionAA
Open Source under the MIT License
Unique ligand flexibility in the design process, mimicking “induced fit” [10]
Higher fold diversity compared to RFdiffusionAA [10]
Already deployed on the Maxwell high-performance computing cluster, to which DESY (Deutsches Elektronen-Synchrotron) gave us access
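For readers unfamiliar with distograms: for every pair of positions, the model predicts a probability distribution over distance bins. The sketch below builds the one-hot equivalent from known 3D coordinates. The bin count (64) and range (2 to 22 Å) are illustrative assumptions, not necessarily Boltzdesign1's internal settings.

```python
import numpy as np


def distogram(coords: np.ndarray, n_bins: int = 64, d_min: float = 2.0, d_max: float = 22.0) -> np.ndarray:
    """One-hot distogram of shape (N, N, n_bins) from (N, 3) coordinates in Angstrom.

    Bin 0 collects distances below d_min, the last bin collects distances at or
    above d_max, and the n_bins - 2 bins in between split [d_min, d_max) evenly.
    """
    coords = np.asarray(coords, dtype=float)
    # Full pairwise distance matrix
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # n_bins - 1 edges give n_bins bins including the two open-ended ones
    edges = np.linspace(d_min, d_max, n_bins - 1)
    bin_index = np.digitize(dist, edges)  # integers from 0 to n_bins - 1
    return np.eye(n_bins)[bin_index]
```

A predicted distogram replaces each one-hot vector with a probability distribution over the same bins, so sharp, confident peaks at short distances between binder and ligand indicate predicted contacts.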

As input, we could represent amanitin either as a modified peptide or as a small molecule. The PTMs would be easy to represent in the first category, but the two cyclizations would be problematic. We therefore represented alpha-Amanitin as a small molecule in the form of a SMILES string. We obtained multiple strings from ChEMBL, PubChem, KNApSAcK, ChEBI and Wikipedia, compared them and found that they yield identical 2D structures in a SMILES-to-structure tool [11]. We chose the ChEMBL string [12], familiarized ourselves with the input parameters and outputs, and started creating structures.

Boltzdesign1 - Results

In total, we created 50 designs over the span of a month for selection and further evaluation. During the process, we gradually adjusted the input parameters to better fit our desired outcome and to improve the quality metrics. Overall, the outputs were quite heterogeneous in fold, with lengths of 100 to 200 residues.
Initially we used the default “small molecule binder” configuration provided with the model, but the results raised some concerns.
At first we saw a tendency toward purely helical secondary structure, rather superficial ligand binding and few of the interactions necessary for high-affinity binding. Additionally, early outputs were more likely to fail to produce a confident protein, as seen for design #8 [see Figure 6].

*Figure 6:* Exemplary design outputs of Boltzdesign1; top row: Proteins with surface shown, bound alpha-Amanitin in dark red; bottom row: Fold representation of same designs, lengths of shown proteins: #20: 129; #8: 136; #33: 129; #15: 103

We made the following changes to the configuration to get more consistently confident outputs with some desired features:

Increase likelihood of beta sheets to inhibit purely helical bundles
Drastically increase number of contacts between the binder and the ligand to improve interactions
Optimize protein-ligand contacts per binder position
Lengthen the model's gradient phase, which finds a “more realistic sequence representation” [10], to get outputs with higher confidence
Increase final optimization steps called “semi_greedy_steps” to maximize final confidence

Evaluation of Results and Selection Process

Obtained designs were judged in consecutive phases:

Assessment by Boltzdesign1 output quality values
Structural evaluation of protein fold and protein-ligand interactions
Cross-Validation with structure prediction models Alphafold3 and Boltz2
Prediction of different protein characteristics with classical bioinformatic and AI based means

1. Assessment by Boltzdesign1 output quality values

The Boltzdesign1 output consists of a structure of the protein-ligand complex and quality values. To get a first impression, we tabulated the metrics and plotted them (see Figure 7).

*Figure 7:* Left: Internal quality metrics of Boltzdesign1; boxplots filled with dots represent a metric concerning the complex, except for iptm (always concerning interface); Right: Values of predicted distance error (pde) and pde for the protein/ligand interface (ipde) from generated Boltzdesign1 samples, log2 transformed and y-axis inverted

Most designs satisfied the more relaxed target values of 0.7 for plDDT (local “confidence”) and 0.8 for iptm (interface confidence), but only a handful exceeded 0.9 in both metrics. Designs of high quality in that regard also showed exceptional values in the other categories. The pairwise interface confidence (“pair_chain_iptm”) was the most restrictive in the first calculation, where the ligand prediction rarely exceeded 0.7. Overall, the ligand was always predicted with less confidence than the binder.
We also found complex_ipde to be a very good metric for separating good-quality structures from bad designs. The predicted distance error for the binder-ligand interface reached an acceptable value below 1 only for designs that performed exceptionally in all other quality measures. We therefore reasoned that “complex_ipde” could serve as a single measure to preliminarily identify good designs before checking confidence and other scores. To illustrate this, we plotted pde and ipde with the good-quality cutoff indicated in Figure 7, right. Designs reaching the goal are listed in the box; these also did well in all other metrics.

confidence_score: Overall assessment; the closer to 1, the better
ptm: Predicted template modeling; high confidence prediction above 0.8
iptm: Interface ptm; predicted relative positions of the subunits forming a complex; high confidence prediction above 0.8
complex_plddt: Per residue predicted local distance difference test; correctness of local distances, per residue confidence; goal above 0.7, ideally above 0.9
complex_iplddt: Interface plddt; goal above 0.7, ideally above 0.9
complex_pde: Predicted distance error; confidence in inter residue distances; low values desired; goal below 1
complex_ipde: Interface pde; low values desired; goal below 1
chains_ptm/iptm: Signifies if one chain (protein or ligand) is modelled better in respect to the other; goal above 0.8
RMSD: Root-mean-square deviation between two sets of corresponding atoms; goal is below 2 Å
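The goals listed above, together with our observation that complex_ipde alone separates good from bad designs, can be combined into a simple pre-selection filter. In this minimal sketch the metric names follow the list above, but the dictionary layout, the subset of metrics used and the example values are hypothetical; reading Boltzdesign1's actual output files is not shown.

```python
import operator
from typing import Dict, List, Tuple

# Goals from the metric list above: metric name -> (comparison, cutoff)
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "iptm": (">", 0.8),
    "complex_plddt": (">", 0.7),
    "complex_iplddt": (">", 0.7),
    "complex_pde": ("<", 1.0),
    "complex_ipde": ("<", 1.0),
}
COMPARE = {">": operator.gt, "<": operator.lt}


def passes(metrics: Dict[str, float]) -> bool:
    """True if a design meets every goal in THRESHOLDS."""
    return all(COMPARE[op](metrics[name], cutoff) for name, (op, cutoff) in THRESHOLDS.items())


def preselect(designs: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of passing designs, best (lowest) complex_ipde first."""
    ranked = sorted(designs, key=lambda name: designs[name]["complex_ipde"])
    return [name for name in ranked if passes(designs[name])]


# Hypothetical numbers for illustration only, not our measured results
example = {
    "design_a": {"iptm": 0.91, "complex_plddt": 0.92, "complex_iplddt": 0.90, "complex_pde": 0.6, "complex_ipde": 0.8},
    "design_b": {"iptm": 0.85, "complex_plddt": 0.78, "complex_iplddt": 0.74, "complex_pde": 0.9, "complex_ipde": 2.1},
    "design_c": {"iptm": 0.88, "complex_plddt": 0.81, "complex_iplddt": 0.79, "complex_pde": 0.7, "complex_ipde": 0.5},
}
print(preselect(example))
```

Ranking by complex_ipde before applying the other cutoffs mirrors how we used it as a first-pass signal; the visual distogram check and structural inspection still follow afterwards.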

In addition to the quality values, we examined the distogram outputs, which show the iterative steps of Boltzdesign1 [see an example in Figure 8]. Comparing them, we noticed that designs with good quality scores showed more protein-ligand interactions in the bottom-left quadrant of the distograms. Figure 9 shows two low-quality designs next to two high-quality ones with more spots in that quadrant. Visually inspecting the distogram could therefore help evaluate outputs when looking for strongly interacting binding proteins.
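A rough numerical counterpart to "more spots in the bottom-left quadrant" is to count close protein-ligand pairs in the rectangular protein × ligand block of a distance matrix computed from the final structure. The 4 Å cutoff and the atom-level granularity are assumptions for this sketch; the distogram itself is a predicted distribution, not a measured contact map. The second helper lists which residues touch the ligand.

```python
import numpy as np


def _protein_ligand_distances(protein_xyz, ligand_xyz) -> np.ndarray:
    """Rectangular protein x ligand block of the full pairwise distance matrix."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    return np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)


def interface_contacts(protein_xyz, ligand_xyz, cutoff: float = 4.0) -> int:
    """Number of protein-ligand atom pairs closer than cutoff (Angstrom)."""
    return int((_protein_ligand_distances(protein_xyz, ligand_xyz) < cutoff).sum())


def contacting_residues(protein_xyz, residue_ids, ligand_xyz, cutoff: float = 4.0):
    """Sorted residue ids with at least one atom within cutoff of any ligand atom."""
    close = (_protein_ligand_distances(protein_xyz, ligand_xyz) < cutoff).any(axis=1)
    return sorted({rid for rid, hit in zip(residue_ids, close) if hit})
```

Counting pairs rewards buried ligands with many close neighbors, which is in line with our observation that high-quality designs shield the ligand more from solvent.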

*Figure 8:* Exemplary distogram evolution of high quality design #40; epochs signify the design steps the model undertakes

*Figure 9:* A) Example distogram endpoints for four designs (low quality: #9, #35; high quality: #15, #40); B) Reference to evaluate distograms adapted from Boltzdesign1 preprint [10]; C) Final sequence generated by Boltzdesign1 for design #15 depicted as amino acid probability per residue position

Consulting the metrics and the distogram we were able to pre-select 14 good quality designs for further processing.

2. Structural evaluation of protein fold and protein-ligand interactions

The selected designs were evaluated in ChimeraX [7] with respect to protein-ligand complex criteria such as hydrogen bonds, clashes and cavities. Binding modes differed considerably from those of low-quality designs, but the high-quality designs all had an acceptable number of interactions and no clashes between protein and ligand. Additionally, in most high-quality designs the ligand was more shielded from solvent.

*Figure 10:* Exemplary binding conformations of Boltzdesign1 output; Designs #9 and #35 as examples of “low quality”; Designs #15 and #40 as “high quality”; top row shows protein surface, bottom row shows contacting residues

As in alpha-Amanitin's binding to RNA polymerase II, the protein through which it exerts its toxicity, the peptide's polar groups were heavily involved in the complexes. The binding proteins formed many hydrogen bonds with the hydroxyl groups of the modified amino acids and also displayed hydrophobic interactions and pi-stacking [see Figures 11 and 12].

Figure 11: Close-up on alpha-Amanitin in the structure of design #40, hydrogen bonds with binder displayed

Figure 12: Close-up on alpha-Amanitin in the structure of design #15, hydrogen bonds with binder residues displayed

To quantify the heterogeneity of our output folds, we calculated the pairwise template modeling (TM) score between the high-quality designs and obtained an average value of 0.34. The original Boltzdesign1 preprint reported an overall TM-score of 0.36, compared with 0.46 for RFdiffusionAA [10]. Lower pairwise TM-scores indicate greater structural diversity, so our designs are quite diverse in their folds: in our hands, Boltzdesign1 produced slightly more diverse designs than reported in the preprint and clearly more diverse ones than RFdAA.
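For reference, the TM-score of an alignment is TM = (1/L) Σ 1/(1 + (d_i/d_0)²), summed over aligned residue pairs with distance d_i after superposition, where L is the target length and d_0 = 1.24 (L − 15)^(1/3) − 1.8 Å. Tools such as TM-align search for the alignment and superposition that maximize this value; the sketch below only evaluates the formula for a given superposition and averages a precomputed pairwise matrix. The clamp of d_0 for very short chains is our own choice.

```python
from itertools import combinations
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (Angstrom) of the TM-score."""
    if target_length <= 15:
        return 0.5
    # Clamped to 0.5 A so the scale stays positive for very short chains (our choice)
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances: Sequence[float], target_length: int) -> float:
    """TM-score from distances of aligned residue pairs after superposition.

    Unaligned residues contribute nothing but still count in target_length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances) / target_length


def mean_pairwise(score_matrix: Sequence[Sequence[float]]) -> float:
    """Average over all design pairs i < j, e.g. of a TM-score matrix."""
    pairs = list(combinations(range(len(score_matrix)), 2))
    return sum(score_matrix[i][j] for i, j in pairs) / len(pairs)
```

Because the TM-score is normalized by the target length, scores between designs of different lengths are asymmetric; which normalization is used for a diversity average should therefore be stated alongside the value.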

3. Cross-Validation with structure prediction models Alphafold3 and Boltz2

To verify the validity of our protein and protein-ligand complexes, we used AlphaFold3 [13] and Boltz-2 [14], two models capable of including ligands in their structure predictions. The aim was to look for consistent predictions of protein structure and ligand binding compared with the Boltzdesign1 output [Figure 13 A) and B)], as well as among the Boltz2 predictions themselves [Figure 13 C)]. The more often a complex is predicted in the same configuration, the higher our confidence that alpha-Amanitin would interact with our protein as Boltzdesign1 initially predicted.

Structure prediction was highly consistent for the protein, as seen in low backbone RMSD for most high-quality designs. The ligand RMSD, however, often exceeded the accepted limit of 2 Å, except for three designs, one of which instead had a very high backbone RMSD. Designs #15 and #40 also displayed low ligand and backbone RMSD in the intra-model validation of Boltz2 predictions.
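The distinction between backbone RMSD and ligand RMSD depends on how the structures are superimposed. A common approach, sketched here under the assumption that predicted and reference atoms are already matched one to one in the same order, is to superimpose on the protein backbone with the Kabsch algorithm and then compute the ligand RMSD in that frame without refitting the ligand. This is an illustration of the idea, not necessarily how the tools we used compute their values.

```python
import numpy as np


def kabsch(mobile: np.ndarray, reference: np.ndarray):
    """Least-squares rotation and translation superimposing mobile onto reference.

    Both arrays have shape (N, 3) with atoms in the same order.
    Apply the result as: mobile @ rot.T + trans
    """
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    cov = (mobile - mob_center).T @ (reference - ref_center)
    u, _, vt = np.linalg.svd(cov)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    sign = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    trans = ref_center - mob_center @ rot.T
    return rot, trans


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation of two equally ordered (N, 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def backbone_and_ligand_rmsd(bb_pred, lig_pred, bb_ref, lig_ref):
    """Superimpose on the backbone only, then measure the ligand in that frame."""
    rot, trans = kabsch(bb_pred, bb_ref)
    bb_rmsd = rmsd(bb_pred @ rot.T + trans, bb_ref)
    # The ligand moves with the protein but is not fitted separately, so a
    # shifted or flipped pose in the pocket shows up as a high ligand RMSD
    lig_rmsd = rmsd(lig_pred @ rot.T + trans, lig_ref)
    return bb_rmsd, lig_rmsd
```

This explains how a design can combine a low backbone RMSD with a high ligand RMSD: the fold is reproduced while the ligand sits elsewhere in, or outside, the pocket.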

*Figure 13:* A) Protein cross-model validation: Structural comparison by backbone RMSD between predictions via AF3 and Boltz2 and Boltzdesign1 output; Boltzdesign RMSD represents internal evaluation of designs from the model itself; displayed values are the mean of five model outputs; distribution in the backbone RMSD was quite homogeneous, therefore only mean is shown; B) Ligand cross-model validation: Structural comparison of the ligand coordinates by RMSD between Boltzdesign1 output and Boltz2 prediction as cross-model validation; C) Boltz2 intra-model validation: Ligand and Backbone RMSD between all Boltz2 outputs for internal validation, displayed values are the mean of five model outputs. Goal is below 2Å, designs that fall below that in LigandRMSD: 15, 18, 25; missed closely: 30, 33, 40

Next, we compared the per-residue confidence (plDDT) of the selected designs [see Figure 14]. Prediction quality was mostly stable along the protein, with a few small local decreases (compare design #24 in Figure 14), but occasionally larger stretches had low confidence (compare design #37 in Figure 14). Designs with such drops, as well as designs with low local ligand confidence (also design #37 in Figure 14), were later removed from the “high quality” category.

*Figure 14:* Boltz2 per residue prediction confidence plDDT for four exemplary good quality designs, five models each, vertical line marks the transition from protein residues to ligand atoms, adapted from Neurosnap [20]

Design #15 showed consistent predictions across all models, little structural deviation and, more importantly, a homogeneous ligand conformation in the protein binding pocket. To illustrate this consistency, we superimposed the Boltzdesign1 output with 20 re-predicted models from Boltz2 in [Figure 15].

Figure 15: Boltz2 prediction of design #15 sequence with 20 models to look for homogeneity of ligand conformation and placement

4. Prediction of different protein characteristics with classical bioinformatic and AI based means

With the 14 designs assessed for validity, we turned to their usability as real-world proteins. For this, we employed different AI models and programs to predict protein properties of the designs and compared the results in search of the best candidate.
Boltz2 predicts not only the structure of a protein-ligand complex of the user's choice but also its binding affinity, and it assigns each prediction a probability score to back it up. All predicted affinities exceed a value of 8. Some designs exceed 9, but few of these combine this with a consistently high probability score. Although its predicted affinity is not the highest, design #15 has the highest probability assigned to its affinity, which gave us more confidence in this design [see Figure 16].
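As we read the Boltz-2 documentation, its affinity_pred_value y is reported as log10(IC50) with the IC50 in µM, and it can be converted to pIC50 as 6 − y and to a kcal/mol scale as (6 − y) × 1.364. Assuming the values above are on that kcal/mol scale (an assumption, not stated in the raw output), the helpers below convert them back to an approximate IC50: 8 corresponds to about 1.4 µM and 9 to about 0.25 µM.

```python
# Assumption: y = Boltz-2 affinity_pred_value = log10(IC50 / uM), as we read the Boltz-2 docs
KCAL_PER_LOG10_UNIT = 1.364  # about R * T * ln(10) near room temperature, in kcal/mol


def pic50_from_boltz2(y: float) -> float:
    """pIC50 (molar) from log10(IC50 in micromolar)."""
    return 6.0 - y


def kcal_from_boltz2(y: float) -> float:
    """Affinity on a kcal/mol scale; larger values mean tighter predicted binding."""
    return pic50_from_boltz2(y) * KCAL_PER_LOG10_UNIT


def ic50_um_from_kcal(kcal: float) -> float:
    """Invert kcal_from_boltz2: approximate IC50 in micromolar."""
    return 10 ** (6.0 - kcal / KCAL_PER_LOG10_UNIT)


for value in (8.0, 9.0):
    print(f"{value} kcal/mol -> IC50 ~ {ic50_um_from_kcal(value):.2f} uM")
```

On this scale, one additional kcal/mol corresponds to roughly a fivefold lower IC50, which puts the spread between our designs into perspective.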

*Figure 16:* Boltz2 affinity prediction and probability for all of the high quality designs

Next, we predicted solubility and “usability”, a metric of how well a protein will hold up during recombinant expression and purification. We used NetSolP1.0 [15] and SoDoPe [16], both sequence-based predictive models. NetSolP has two modes working with different protein language models (ESM1b and ESM12); for comparison we used both [see Figure 17]. Solubility was high with both ESM models, whereas SoDoPe assigned reduced values to five of the designs. Usability was high only for designs #15 and #40.

*Figure 17:* Left: Prediction of protein usability by ESM1b and ESM12; Right: Prediction of protein solubility of high quality designs by ESM1b, ESM12 (boxplots) and SoDoPe (diamonds)

Another parameter we chose was aggregation propensity. It overlaps with solubility and usability, but the tool we chose, Aggrescan4D [17], is structure-based. Our rationale was that a complete picture only emerges when both sequence and structure are integrated into the analysis. Aggrescan4D is not an AI model; rather, it uses experimentally derived algorithms to project a calculated scale onto the protein's surface. For small globular proteins like our designs, the developers advise users to consider the total score over the whole protein when comparing proteins [see Figure 18, right]. Differences in maximal and average scores were small, but the total score varied, with a few high performers and design #24 being quite low.
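The three summaries compared above answer different questions: the maximum flags the single most aggregation-prone residue, the average is independent of protein length, and the total also scales with the number of residues, so two designs with similar averages can still differ in total. The sketch assumes a plain list of per-residue scores, for example copied from the Aggrescan4D results table; it does not reproduce the server's own calculation.

```python
from statistics import mean
from typing import Dict, Sequence


def aggregation_summary(residue_scores: Sequence[float]) -> Dict[str, float]:
    """Max, average and total of per-residue aggregation scores."""
    return {
        "max": max(residue_scores),
        "average": mean(residue_scores),
        "total": sum(residue_scores),
    }
```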

*Figure 18:* Aggrescan4D server results for high quality designs; left: values for single residues and average; right: values for total structure

With these evaluations, we were fairly confident that two of the 14 candidates, designs #15 and #40, potentially represent proteins with the desired characteristics, predicted with high confidence.

LigandMPNN

To improve on this basis, we chose to redesign the sequence, aiming to keep valuable features such as tight ligand binding and reproducible predictions while improving our chances of successful expression.
For this we employed LigandMPNN [19], a neural network that assigns a sequence to an input protein structure containing a bound ligand.
We started with design #15 and performed several LigandMPNN redesigns in search of high confidence scores.
We varied the model parameters sampling temperature and Gaussian noise, but refrained from biasing toward certain amino acids or excluding the ligand binding pocket from redesign. From each redesign run, we selected the highest-confidence sequences with the lowest sequence recovery to introduce novelty and the potential for changes in biochemical properties. We predicted a selection of 12 sequences with Boltz2 for structural comparison with the initial Boltzdesign1 output. LigandMPNN reproduced the original fold with a sequence recovery of at most 40 percent, as the backbone RMSD was always below 2 Å, but the ligand never formed comparable interactions with the protein [see Figure 19]. The ligand RMSD never fell below 5 Å, and the predicted affinity never exceeded 8 kcal/mol, both quite unsatisfactory.
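Sequence recovery is the fraction of positions at which a redesign keeps the original residue. The sketch below computes it and applies our selection idea (confident sequences first filtered, then least recovery first). The (sequence, confidence) input format and the min_confidence cutoff are hypothetical, and parsing LigandMPNN's actual output files is not shown.

```python
from typing import List, Tuple


def sequence_recovery(original: str, redesign: str) -> float:
    """Fraction of positions where the redesign keeps the original residue."""
    if len(original) != len(redesign):
        raise ValueError("fixed-backbone redesigns must have the original length")
    return sum(a == b for a, b in zip(original, redesign)) / len(original)


def select_novel(original: str, candidates: List[Tuple[str, float]], min_confidence: float) -> List[str]:
    """Keep candidates at or above min_confidence, lowest sequence recovery first.

    candidates: hypothetical (sequence, confidence) pairs from one redesign run.
    """
    confident = [seq for seq, conf in candidates if conf >= min_confidence]
    return sorted(confident, key=lambda seq: sequence_recovery(original, seq))
```

Low recovery maximizes the chance of changing biophysical properties, but, as our results show, it also raises the risk of losing the specific ligand contacts.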

*Figure 19:* RMSD of backbone and ligand atoms between original Boltzdesign1 output and Boltz2 prediction of 12 sequence redesigns by ligandMPNN

Predicting solubility and usability with NetSolP1.0 as before, but this time for 150 redesigned sequences of #15, we found increased solubility and usability in almost all of them [see Figure 20]. SoDoPe predictions showed the same trend [see Figure 21].

*Figure 20:* Prediction of solubility (left) and usability (right) of 150 LigandMPNN sequence redesigns; leftmost boxplot represents original Boltzdesign1 data for comparison

*Figure 21:* Prediction of flexibility, solubility and hydrophobicity changes of 150 LigandMPNN sequence redesigns in comparison to the original Boltzdesign output sequence

We concluded this sequence redesign having made little progress, but with the insight that LigandMPNN is very well suited for improving biophysical parameters, although prone to losing tight ligand interactions. To see whether the redesign information was nevertheless valuable, we examined the output more closely.
Comparing the amino acid probabilities of the LigandMPNN redesigns with the alpha-Amanitin contacts in the original Boltzdesign1 structure, we saw that many residues were kept fixed by the model [see Figure 22], probably because of their structural importance.
We tabulated the amino acid positions and the probability of each showing a certain residue, and compared them with the original sequence of #15. Many direct contacts to alpha-Amanitin, especially the hydrophobic ones, were unchanged. For charged residues, however, LigandMPNN often altered the charge, either neutralizing it (e.g., Glu39Gln) or reversing it (e.g., Asp21Lys).
We found that LigandMPNN can be used to identify amino acids important for ligand binding as well as for fold integrity: by comparison with the Boltzdesign1 output structure and its important contacts, residues essential for upholding the overall fold can be identified.
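Per-position probabilities like those in Table 1 can be approximated by counting residue frequencies across the redesigns, which all have the original length because the backbone is fixed. Figure 22 was generated with Neurosnap; the sketch below is only an illustrative way to obtain similar empirical frequencies, and the min_freq cutoff is a hypothetical parameter.

```python
from collections import Counter
from typing import List, Tuple


def position_profile(redesigns: List[str]) -> List[Tuple[str, float]]:
    """Most frequent residue and its frequency at each position of equal-length sequences."""
    profile = []
    for column in zip(*redesigns):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(redesigns)))
    return profile


def compare_to_original(original: str, redesigns: List[str], min_freq: float = 0.5):
    """(1-based position, original residue, top redesign residue, frequency) for positions >= min_freq."""
    rows = []
    for pos, (orig_res, (top_res, freq)) in enumerate(zip(original, position_profile(redesigns)), start=1):
        if freq >= min_freq:
            rows.append((pos, orig_res, top_res, freq))
    return rows
```

Rows where the original and top residue agree correspond to conserved positions; rows where they differ with high frequency (like Leu20Ile in Table 1) point to consistent substitutions.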

*Figure 22:* Amino acid probability distribution per residue of 150 LigandMPNN sequence redesigns for design #15; amino acids equal in all redesigns shown in yellow; adapted from Neurosnap

Amino acids conserved by LigandMPNN in comparison with original Boltzdesign1 sequence

Residue #	Identity in #15	Redesign	Probability
4	Phe	Phe	0.91
6	Pro	Pro	0.73
10	Asn	Asn	0.83
20	Leu	Ile	0.72
21	Asp	Lys	0.37
24	Trp	Ile	0.23
29	Gly	Gly	0.78
35	His	Pro	0.33
39	Glu	Gln	0.31
40	Ile	Leu or Ile	0.66
43	Phe	Phe or Tyr	0.69
44	Met	Leu	0.7
45	Asn	Asn	0.81
46	Lys	MIX	--
47	Ile	Leu	0.68
48	Leu	Leu	0.88
49	Asn	Asn	0.91
62	Phe	Phe or Tyr	0.89
65	Lys	MIX	--
66	Leu	Leu	0.4
67	Phe	Phe	0.8
69	His	MIX	--
70	Tyr	Leu	0.56
75	Asp	Trp	0.3
78	Met	Met	0.38

Table 1: Conserved residues according to LigandMPNN in comparison to original residue identity in design #15; contacts to alpha-amanitin identified in the output structure of Boltzdesign1 in yellow, other residues are probably important for structural integrity

Future aspirations

There are many more steps to validate an in silico designed protein structure. Time limited us to the methods described, but we also began docking alpha-Amanitin to our structure with rigid-body, flexible and AI-based docking methods, in conjunction with molecular dynamics (MD) simulations. So far, ligand binding has been validated by cross- and intra-model consistency; docking and MD would provide another angle of validation. Additionally, we see the need for targeted improvements to design #15: the Aggrescan4D (A4D) scan revealed surface residues that may need to be changed to reduce aggregation [see Figure 23].

*Figure 23:* Top left: Hydrophobic surface of design #15; Top right: concerning residues identified by A4D depicted; Bottom: Results of the Aggrescan4D on design #15 with concerning residues above the line

Concluding Remarks

We hope to have designed a feasible alpha-Amanitin binding protein, evaluated by different means in an iterative process that is far from complete. As we submit the design as a part to the iGEM Registry [25], we want to point out that more evaluation is needed. We intend the design to be available to future iGEM teams as a starting point for further design steps or as a guide for their own design process.
With that in mind, we want to make users aware of the caveats of AI-based in silico protein design. Models like Boltzdesign1 and Boltz2 may build upon one another, which may introduce confirmation bias. We tried to avoid this by only comparing outputs within a group of predictions. Also, not all designs had perfectly re-predicted protein-ligand complexes; this was rather the exception.
Additionally, many of the current models have not gone through peer review, and Boltzdesign1 in particular, owing to its recency and novelty, has not yet been backed by wet-lab validation.
Lastly, we want to point out that many models are trained to predict or generate but lack a negative selection process to weed out bad outputs and hallucinations. With this in mind, we used Boltz2 to predict a GFP-alpha-Amanitin complex as a negative control with presumably very low binding affinity, for comparison with our own designs. The predicted affinity of this highly improbable binder was higher than that of our own designs. Given the minimal interactions between GFP and the peptide, we concluded that the predicted affinity was a mistake in this instance. The outputs of neural networks have many practical uses but carry a certain risk of being fictional. The evaluation and selection process, as we started it in our approach, therefore becomes even more important when using models like Boltzdesign1, Boltz2 and even AlphaFold3.

Compiled resources on protein design, as well as the Boltzdesign1 outputs for design #15, re-predictions with AF3 and Boltz2, and a table of all quality metrics, can be found on our GitLab at https://gitlab.igem.org/2025/software-tools/hamburg
