As the WetLab and DryLab parts of our project ran independently and in parallel, each went through its own cycles of challenges and learnings. We present some of them here. Additionally, we show the process of writing a children’s book about nanobodies, as we could apply the engineering cycle to this part of the project as well.
We designed different plasmid constructs for an anti-GFP nanobody and GFP, carrying a secretion signal for E. coli as well as a nuclear localization signal and a 6x His-tag for purification. We expressed two of those constructs in BL21(DE3) E. coli and validated the expression through a Western Blot. Additionally, we broadened our methodological skills by learning different protein production, purification and isolation procedures such as growing and inducing E. coli liquid cultures, Ni-NTA purification, protein precipitation, sonication and periplasmic extraction, as well as analyses such as SDS-PAGE and Western Blot.
Design: Our aim in the WetLab was to set up a production pipeline for a recombinant nanobody construct in E. coli. The construct design was informed by extensive literature research. We chose a strong RBS (BBa_B0030) for increased protein expression. The reading frame starts with a pelB sequence (derived from BBa_K2114998) for periplasmic secretion, followed by a 5x aspartate linker for subsequent extracellular secretion [1]. We then added an HRV 3C protease cleavage site to be able to remove the 5x aspartate linker. This is followed by the CDS for a nanobody against GFP (derived from BBa_K2114998), which we use as a placeholder for our own nanobody. Additionally, we designed constructs containing the CDS for GFP as a control for the production of the nanobody. Directly downstream of the CDS, we added a linker and a 3x NLS for nuclear localization [2]. After this, we added a second protease cleavage site, this time for the TEV protease, to cut off the 6x histidine tag or Strep-tag II.
We named our constructs GFP_his and aGFP_his. For the expression, we used the widely used E. coli strain BL21(DE3), which is a T7 expression strain, and optimized the DNA sequence for the strain’s codon usage.
Build: We ordered the constructs for GFP and the anti-GFP nanobody from our sponsor, Twist Bioscience, who synthesized the insert and cloned it into a pET blank vector lacking the RBS. Then we introduced the constructs into BL21(DE3) E. coli through heat shock transformation.
Test: We induced the production of our protein constructs with 0.5 mM IPTG, incubated the cultures for 3 h at 37 °C and tested different liquid culture volumes. As we had designed our construct with a signal sequence and a linker for secretion into the periplasm and subsequently into the extracellular space, we expected our proteins to appear in the culture supernatant. Thus, in several repetitions of the experiment, we used protein precipitation with trichloroacetic acid (TCA) and Ni-NTA purification to concentrate and isolate our protein from the supernatant. We analyzed the protein production using SDS-PAGE and Coomassie Blue staining (Fig. 1-3). To compare native and recombinant protein production, we included an uninduced liquid culture as a negative control. In addition, we used a His-tagged protein from the lab, expressed intracellularly under the control of the lac operon and T7 promoter, as a positive control for expression. The positive control is expected to remain in the cytosol, as it does not carry any secretion signal. Unfortunately, we were not able to detect any strong or distinguishable bands at the expected sizes of our protein constructs (20 kDa for the nanobody construct and 35 kDa for the GFP construct) with either method. We were able to detect our positive control (a phospholipase with a molecular weight of approximately 79 kDa), which produced a strong band at around 70 kDa as expected. For our nanobody construct, we could not detect any additional bands in the cell pellet or precipitated supernatant of induced cultures compared to uninduced cultures (Fig. 1). Some gels showed a band at 35 kDa in the precipitated supernatant (Fig. 2), which would match the expected size of the GFP construct, but it appeared both before and after induction. We were not able to detect any bands in the different steps of the Ni-NTA purification of the supernatants.
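The expected band sizes quoted above can be sanity-checked from the amino acid sequence of a construct. The following is a minimal sketch, assuming standard average residue masses; the function names are our own for illustration, and the only sequence used is the 6x His-tag itself, not one of our full constructs.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-terminus


def protein_mass_kda(sequence: str) -> float:
    """Average molecular mass of an unmodified peptide chain in kDa."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unknown residue codes: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def approx_mass_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Rule-of-thumb estimate: roughly 110 Da per residue."""
    return n_residues * mean_residue_da / 1000.0


if __name__ == "__main__":
    print(f"6x His-tag: {protein_mass_kda('HHHHHH'):.3f} kDa")
    print(f"182 residues (rule of thumb): {approx_mass_kda(182):.2f} kDa")
```

Such an estimate ignores signal peptide cleavage and gel migration artefacts, so it only indicates where on the gel a band is worth looking for.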
Design: To rule out any issues with the RBS regarding translation initiation, we redesigned our initial constructs and created two new variations. In the first (RBS1_GFP_his and RBS1_aGFP_his), we reduced the spacing between RBS and CDS from seven base pairs to one base pair to match an existing iGEM part (BBa_K5299200), which had been successfully expressed in BL21(DE3) E. coli before. In the second (RBSp_GFP_his and RBSp_aGFP_his), we replaced the previous RBS (BBa_B0030) with the RBS commonly used in pET vectors.
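The spacing between RBS and CDS can be checked programmatically on a construct sequence before ordering. The sketch below is only an illustration: the function name is hypothetical, and the example uses a generic Shine-Dalgarno core (AGGAGG) with placeholder spacers rather than the actual sequences of our parts.

```python
def rbs_to_start_spacing(sequence: str, rbs: str, start_codon: str = "ATG") -> int:
    """Number of bases between the end of the RBS motif and the first
    start codon found downstream of it.

    Assumption: the first ATG after the RBS is the intended start codon,
    which has to be verified for a real construct.
    """
    seq = sequence.upper()
    motif = rbs.upper()
    rbs_pos = seq.find(motif)
    if rbs_pos == -1:
        raise ValueError("RBS motif not found in sequence")
    rbs_end = rbs_pos + len(motif)
    start_pos = seq.find(start_codon.upper(), rbs_end)
    if start_pos == -1:
        raise ValueError("No start codon found downstream of the RBS")
    return start_pos - rbs_end


if __name__ == "__main__":
    # Placeholder sequences: generic RBS core + spacer + start of a CDS.
    seven_bp = "GGGAGGAGG" + "TACTAGA" + "ATGAAATAC"
    one_bp = "GGGAGGAGG" + "T" + "ATGAAATAC"
    print(rbs_to_start_spacing(seven_bp, "AGGAGG"))
    print(rbs_to_start_spacing(one_bp, "AGGAGG"))
```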
Additionally, we decided to use the recognition sites for the restriction enzyme EcoRI that we had introduced up- and downstream of the nuclear localization signal (3x NLS). By removing the NLS, we wanted to rule out any issues regarding translation or folding caused by this component. We named the new constructs GFP_woNLS_his and aGFP_woNLS_his.
For our experimental design, we decided to try a higher IPTG concentration of 1 mM for the induction of protein production. We also extended the incubation to overnight and decreased the incubation temperature to 25 °C to obtain higher amounts of recombinant protein. Additionally, we decided to sonicate the cell pellets to better access the protein content.
Build: We ordered the different constructs with changed RBS (RBS1- and RBSp-constructs) from Twist Bioscience again and introduced them into BL21(DE3) E. coli through heat shock transformation. To create the constructs without 3x NLS, we used the restriction enzyme EcoRI and performed a restriction digestion. We isolated and purified the linearized plasmid through gel electrophoresis and gel extraction and ligated it. Sequencing analysis verified correct restriction and ligation (Fig. 4). These constructs were introduced into BL21(DE3) E. coli as well through heat shock transformation.
Test: Using our new experimental conditions, we repeated the protein production, precipitation of the supernatant and analysis by SDS-PAGE (Fig. 5 and 6). Again, we detected no distinguishable bands at the expected sizes for either RBS1-construct, apart from strong bands at 35 kDa that were found in the precipitated supernatants of both protein constructs. For the constructs without NLS, we could not detect any strong bands at the expected sizes (Fig. 6).
In a subsequent repetition, we additionally performed Ni-NTA purification and analyzed the results via SDS-PAGE and Coomassie Blue staining. Again, we detected bands at 35 kDa in precipitated supernatant samples and no bands at all in the Ni-NTA purified supernatant samples (results not shown). To investigate whether the bands at 35 kDa consisted of our recombinant protein constructs, we repeated the SDS-PAGE and performed a Western Blot to stain any His-tagged proteins in the samples. The Western Blot finally showed bands at the expected sizes of 20 kDa for the nanobody and 35 kDa for GFP in the cell pellet samples of the RBSp-constructs. Additionally, we saw one band of each of those sizes in the sample from RBS1_GFP_his (Fig. 7).
Learn: We discovered that the Western Blot was a sensitive and specific method to detect our protein constructs. Apparently, our proteins had not been secreted extracellularly as expected, as the precipitated supernatants did not show any signals. However, the secreted amounts might simply have been too small to be detected. We also researched native proteins that are secreted by BL21(DE3) E. coli and found that outer membrane proteins like OmpA or OmpF can be released as well [3]. These two proteins, with molecular weights of 37 kDa and 39 kDa, respectively, would match the band at 35 kDa that we saw repeatedly. When we reached out to Dr. Schäfer regarding his experience with nanobodies, he explained that the observed band might also represent the lacI protein with an approximate molecular weight of 38.6 kDa. This protein is part of the lac operon present in BL21(DE3) E. coli as well as on our introduced plasmid and might be produced in greater amounts than other native proteins.
Design: Before we got the results of the Western Blot, we had assumed that we could not detect our protein because it was simply not there. We discussed problems occurring on the RNA level, such as secondary RNA structures inhibiting translation initiation or disrupting translation, as well as issues regarding protein folding and subsequent degradation. As the iGEM part consisting of a pelB and the nanobody against GFP (derived from BBa_K2114998) had been successfully expressed, we decided to go back to the basics and remove the 5x aspartate linker from our design, as well as the recognition sites for protease cleavage. This left us with constructs consisting only of a pelB, the CDS for the respective protein, and a 6x His-tag (part_bas_GFP_his and part_bas_aGFP_his). At this point in our project, we realized that we had previously included restriction enzyme recognition sites in our designs that are not compatible with the iGEM BioBrick RFC[10] and the Type IIS RFC[1000] system. To adhere to these standards, we included only compatible sites in our new design for the reduced constructs and also redesigned our original construct in a compatible way, so that we could use it as a basis for future cloning steps (part_GFP_his and part_aGFP_his).
For the experimental design, we decided to perform a periplasmic extraction using osmotic shock after protein production, as our protein constructs all contain a pelB sequence for import into the periplasm. This way, we wanted to determine whether our protein was transported to the periplasm and remained stuck there, or whether it stayed in the cytosol of the cells instead. We decided to analyze RBSp_aGFP_his again to validate the results of the previous Western Blot.
Build: Again, we ordered the redesigned constructs from Twist Bioscience. This time, both inserts were designed without an RBS and cloned into a pET blank vector containing an RBS. We introduced the constructs into BL21(DE3) E. coli through heat shock transformation. However, we only obtained very few colonies for the nanobody constructs, and none for the GFP constructs.
Test: After we had prepared overnight cultures for our new nanobody constructs, RBSp_aGFP_his and the positive control, we discovered that the two cultures prepared from cryostocks, RBSp_aGFP_his and the positive control, had not grown. We tried to grow those two again in 200 mL over the day, but were unsuccessful. For each of the remaining liquid cultures, we used 100 mL for sonication of the cell pellet and the other 100 mL for periplasmic extraction through osmotic shock. We analyzed the sonicated cell pellets as well as the cytosolic and periplasmic fractions after the periplasmic extraction using SDS-PAGE and Western Blot. However, we were not able to detect any bands, even when we increased the exposure time during imaging (Fig. 8).
Learn: In our last week in the lab, we learned that sometimes not everything goes as planned. As we only obtained very few colonies after transformation using our competent BL21(DE3) E. coli, we suspect that they might have lost competency over time. From this we conclude that competent cells should be prepared regularly to ensure usability. When we took our cryostocks out of the -80 °C freezer for the previous experiment, we noticed that they had started to thaw. Multiple rounds of thawing and freezing might have damaged the cells to the point where they died.
Since we did not have any positive control for our Western Blot, we have different hypotheses on why we did not see any bands in our samples. First, the antibody used for staining might have degraded, as it was stored at 4 °C for one week and not at -20 °C. Second, buffer volumes ranging between 3 and 10 mL were added during the periplasmic extraction and sonication of the cell pellet. It is possible that our protein was present in those supernatants, but that the dilution was too high for detection. Third, the new constructs might simply not have been expressed, for the reasons already mentioned in the design part of the third cycle. This is unlikely though, as the construct part_aGFP_his only differs from RBSp_aGFP_his in terms of restriction enzyme sites and a distance between lac operon and RBS that is six base pairs shorter in the region upstream of the coding sequence. However, we cannot rule out any issues arising from these differences. Due to time constraints, we did not analyze the protein production further.
We conclude that our RBSp-constructs for the anti-GFP nanobody and for GFP are expressed in BL21(DE3) E. coli. However, we suspect that the expression is not optimal, as only small amounts of protein could be detected in a Western Blot. More thorough optimization with detailed analyses of expression conditions such as temperature, duration of induction with IPTG and IPTG concentration would be needed to increase protein production. Additionally, different kinds of extracellular secretion signals should be tested to facilitate secretion of the constructs into the medium.
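The dilution hypothesis mentioned above can be made concrete with a short calculation of how much of the original culture ends up in a single gel lane. This is only a sketch: the function name is hypothetical, the 20 µL loading volume is an assumed example value rather than our protocol, and dilution by sample buffer is ignored.

```python
def culture_equivalent_per_lane_ml(culture_ml: float, buffer_ml: float,
                                   load_ul: float) -> float:
    """Volume of original culture (in mL) represented by one gel lane.

    Assumes the complete pellet or extract from `culture_ml` of culture is
    resuspended in `buffer_ml` of buffer and `load_ul` of it is loaded.
    """
    if buffer_ml <= 0:
        raise ValueError("buffer volume must be positive")
    fraction_loaded = (load_ul / 1000.0) / buffer_ml
    return culture_ml * fraction_loaded


if __name__ == "__main__":
    # 100 mL of culture, buffer volumes at both ends of the 3 to 10 mL range.
    for buffer_ml in (3.0, 10.0):
        eq = culture_equivalent_per_lane_ml(100.0, buffer_ml, load_ul=20.0)
        print(f"{buffer_ml:4.1f} mL buffer: {eq:.2f} mL culture per lane")
```

Under these assumptions, a lane from a 10 mL resuspension carries less than a third of the material of a lane from a 3 mL resuspension, which matters when the protein is already close to the detection limit.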
The goal was initially to design an α-amanitin-binding nanobody (NB) for intracellular detoxification. We chose to design the nanobody in silico. We started out with very little knowledge on AI-assisted protein design. However, we were aware of the many possibilities because of the Nobel Prize in Chemistry 2024.
We created 50 binding proteins for α-amanitin with AI models for protein design and evaluated them using internal quality metrics and structural comparison of the output. We then selected a few high-quality designs for further in silico assessment. We used different structure-prediction models to validate and filter for the best designs. As an additional selection step, we predicted solubility, usability, propensity to aggregate and binding affinity of our designs with multiple models. We found 14 designs to be of good quality and, depending on the assessment type, three to six to be of exceptional quality. We registered one of the best designs as a part, so that future teams can take it as a starting point for design, use our structure as a scaffold or perform WetLab validation.
Design: Through thorough literature research followed by a deep dive into the Rosetta community's YouTube channel, we attained a superficial understanding, and the Dunning-Kruger effect took off.
We investigated NB in silico design and found a promising preprint about RFantibody, an AI-based pipeline to design antibodies and NBs against epitopes of choice. To implement the software, we approached an AI service provider called Nanohelix and were granted free use of the established models.
We prepared the design process by conducting a database search for high-resolution structures of our epitope and were successful, as α-amanitin was present in multiple structures in the PDB.
We assessed our epitope’s structure to see if there were different known conformations, but as the number of experiments was limited and α-amanitin was always in complex with RNA polymerase II in the documented structures, the conformations did not differ significantly. At this point, we were confident that we could use the obtained PDB files as inputs for our AI pipeline of choice.
Build: We started to generate multiple NB structures with different lengths of hypervariable loops and different input PDB structures to produce varied binding modalities in the hope of finding a high-affinity binder.
Learn: The results of the RFantibody model looked promising at first glance, but we quickly realized that multiple things were wrong:
We had the same problems with all of our designs. We analyzed the generated structures with ChimeraX and performed docking on a few selected designs to see whether they would still bind α-amanitin even though the output complex did not contain an intact epitope.
Our findings discouraged us from continuing in this direction, and we sought help from experts and the literature.
Design: Before we got the opportunity to talk to professional protein engineers and in silico experts, we tried to implement a salvage strategy for our model pipeline. First, we tried to run RFantibody with a modified input file. We deleted most post-translational modifications (PTMs) of the peptide from the PDB input file and converted the sequence to an unmodified string [IWGIGCNP]. This way, we hoped that RFantibody would create a structurally compatible nanobody that could later be modified to fit the actual α-amanitin. We planned post-generation processing of the structure with LigandMPNN [5]. This neural network is capable of generating adequate sequences based on an input structure in the context of a non-protein binder (small molecule, DNA/RNA, modification). Our hope was to generate a nanobody structure binding α-amanitin without PTMs and to assign a sequence to the output that is optimized to bind the native α-amanitin.
Build: For this approach, we removed all hydroxyl groups added to the peptide after translation, more precisely from hydroxy-proline (HYP), di-hydroxy-isoleucine (ILX) and hydroxy-tryptophan (TLX). We kept the bicyclic nature of the toxin to maintain the overall structure and conformation of the peptide and started generating binding structures for this input with RFantibody.
Test: The results were sobering. Instead of output designs we got nothing. Our modified input did not generate any structures. We assumed the reason for this to be in our file processing.
Learn: Simply removing atoms in a PDB file, be it using ChimeraX or manipulating the file in a text editor, might have had unforeseen consequences that made the file unusable for the model.
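One way in which hand-editing can leave a PDB file inconsistent is that atom serial numbers and CONECT records still refer to the deleted atoms. The sketch below shows a more careful removal under stated assumptions: the function name is hypothetical, it relies on the fixed-column PDB layout, and it only handles ATOM, HETATM and CONECT records (TER, ANISOU and multi-model files are not covered). Whether this would have made our input usable for RFantibody is untested.

```python
def strip_atoms(pdb_lines, atoms_to_remove):
    """Remove atoms given as (residue_name, atom_name) pairs from PDB lines.

    Surviving atoms are renumbered consecutively and CONECT records are
    remapped so that they no longer point at deleted atoms.
    `pdb_lines` is a list of lines without trailing newlines.
    """
    # Pass 1: decide which atoms survive and assign new serial numbers.
    serial_map = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[17:20].strip(), line[12:16].strip())
            if key not in atoms_to_remove:
                serial_map[int(line[6:11])] = len(serial_map) + 1

    # Pass 2: rewrite the records in their original order.
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            old_serial = int(line[6:11])
            if old_serial in serial_map:
                out.append(f"{line[:6]}{serial_map[old_serial]:>5}{line[11:]}")
        elif line.startswith("CONECT"):
            fields = [line[i:i + 5] for i in range(6, len(line.rstrip()), 5)]
            serials = [int(f) for f in fields if f.strip()]
            if not serials or serials[0] not in serial_map:
                continue  # the central atom itself was removed
            partners = [serial_map[s] for s in serials[1:] if s in serial_map]
            if partners:
                new = [serial_map[serials[0]]] + partners
                out.append("CONECT" + "".join(f"{s:>5}" for s in new))
        else:
            out.append(line)
    return out
```

A call such as `strip_atoms(lines, {("HYP", "OD1")})` would drop the hydroxyl oxygen of hydroxy-proline, assuming that atom name matches the input file. The residue name stays HYP in this sketch, so a model expecting canonical residues may need an additional renaming step.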
Additionally, we tried running LigandMPNN on our initial outputs but could not set it up correctly at this point. Adding a ligand instead of the broken α-amanitin turned out to be above our capabilities. Preliminary docking results from the outputs were not usable by LigandMPNN because the poses created by SwissDock’s Attracting Cavities 2.0 [6] did not have the adequate PDB coordinates we needed as an input. We ceased further salvage attempts in favour of a new approach.
Design: Our consultations with Clara Schöder and Klara Kropivsek (see here), two computational biologists, confirmed our concerns. The model pipeline we used was not suited to our needs because it was trained purely on protein data and was not capable of handling targets as small and modified as our α-amanitin. As seen in the structures, the models in the RFantibody pipeline, both for structure generation and for prediction, omitted all of the PTMs of our peptide and represented only the few canonical amino acids as an epitope. This resulted in improbable binders.
Having identified the problem together with us, the experts suggested a way forward: all-atom-capable AI models that generate small-molecule-binding proteins. We updated our goal to designing a protein that binds α-amanitin, without regard to the overall NB scaffold, with the following characteristics:
With research into recent literature and preprints, we decided on a suitable AI model for our purposes: Boltzdesign1 is an all-atom protein design model, usable on Google Colab through a Jupyter notebook and more efficient than comparable models in regard to resource usage and output quality. [4]
It has multiple advantages over, for example, RFdiffusionAllAtom, a model previously used to design small molecule binders (for more detailed information see here).
Build: Our epitope from PDB entries was unfit for our initial pipeline, so we changed our approach for our second model. α-Amanitin would be treated like a small molecule ligand instead of a highly modified peptide. The difference lies in the model input. For a peptide or protein, the input consists of sequence information and coordinates obtained from PDB files. For a small molecule, we did not have to provide structural data but only a chemical identifier such as a SMILES string, which the AI model can translate into a set of atoms and bonds. With this processing, we supplied less initial information that could obscure the inference and lead to suboptimal outputs.
Boltzdesign1 uses a distogram, a graphical representation of distances between the binding protein and the ligand, as well as a confidence module to iteratively find a structure. The process is separated into four phases that each have their unique purpose. [4]
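To illustrate the idea of binned distances behind a distogram, the following sketch computes pairwise distances between binder and ligand coordinates and assigns each to a distance bin. It is not Boltzdesign1's implementation: the function names are our own, the bin range (2 to 22 Å in 64 bins) is an assumption, and a model's distogram holds a probability for every bin rather than a single bin index per pair.

```python
import math


def pairwise_distances(binder_xyz, ligand_xyz):
    """Distance matrix (rows: binder atoms, columns: ligand atoms) in Å."""
    return [[math.dist(b, l) for l in ligand_xyz] for b in binder_xyz]


def bin_index(distance, d_min=2.0, d_max=22.0, n_bins=64):
    """Index of the distance bin; values outside the range are clamped."""
    if distance <= d_min:
        return 0
    if distance >= d_max:
        return n_bins - 1
    width = (d_max - d_min) / n_bins
    return min(int((distance - d_min) / width), n_bins - 1)


def distance_bins(binder_xyz, ligand_xyz, **bin_kwargs):
    """Binned version of the pairwise distance matrix."""
    return [[bin_index(d, **bin_kwargs) for d in row]
            for row in pairwise_distances(binder_xyz, ligand_xyz)]


if __name__ == "__main__":
    binder = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # made-up coordinates
    ligand = [(3.0, 4.0, 0.0)]
    print(pairwise_distances(binder, ligand))
    print(distance_bins(binder, ligand))
```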
Among many variables, the user can adjust different metrics to influence the extent of the search, the propensity of forming helices and the amount of interactions between binder and ligand.
At first we used the default metrics for small molecule binder design provided in the Boltzdesign repository [source boltzdesign github], but over time we found adjustments that better fit our goal:
As the paper suggested, we refrained from using recycle steps to avoid introducing effects that could worsen our design and to save computing resources. Configurations used for our high-quality designs are deposited in our GitLab. [7]
Test: We created 50 designs in the course of optimizing the configuration and compared them to one another at different stages. Exemplary design outputs can be seen in Figure 10.
Most designs showed good internal values, with pLDDT above the goal threshold of 0.7 and pTM above 0.8, but only 14 of the designs had above-threshold metrics for complex and interface modelling. Even fewer designs had a predicted distance error of the interface (ipde) below 1, which signifies high-confidence ligand binding.
Based on these quality metrics alone, we selected 14 designs for further evaluation, of which six performed well in most assessments. Most importantly, the re-predicted ligands of three designs showed very homogeneous conformations and binding modes.
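The threshold-based selection described above can be expressed as a simple filter. The sketch below uses only the three thresholds stated in the text (pLDDT above 0.7, pTM above 0.8, ipde below 1); the class, the function and all metric values in the example are hypothetical and are not our real design data.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    name: str
    plddt: float  # predicted local confidence, scaled 0 to 1
    ptm: float    # predicted TM-score
    ipde: float   # predicted distance error of the interface


def select_designs(designs, min_plddt=0.7, min_ptm=0.8, max_ipde=1.0):
    """Keep the designs that pass all three thresholds."""
    return [d for d in designs
            if d.plddt > min_plddt and d.ptm > min_ptm and d.ipde < max_ipde]


if __name__ == "__main__":
    # Made-up example values for illustration only.
    designs = [
        DesignMetrics("design_A", plddt=0.85, ptm=0.90, ipde=0.6),
        DesignMetrics("design_B", plddt=0.75, ptm=0.85, ipde=1.4),
        DesignMetrics("design_C", plddt=0.65, ptm=0.90, ipde=0.5),
    ]
    for d in select_designs(designs):
        print(d.name)
```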
We finally settled on design #15 as our best candidate. In re-predictions, the protein structure was very homogeneous and the ligand displayed uniform binding conformations (see Fig. 11). The ligand RMSD is under 1 Å between Boltz2 and Boltzdesign (cross-model) and 1.3 Å between the Boltz2 predictions (intra-model), both below the generally accepted threshold of 2 Å.
Design #15 also displays high solubility as predicted by multiple AI models, high usability and a low propensity to aggregate in comparison to the other designs. Its predicted binding affinity is among the highest of all designs, and structural evaluation of the binding modality reveals multiple hydrogen bonds, no clashes and potential hydrophobic interactions.
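For reference, a ligand RMSD of this kind can be computed from two sets of ligand coordinates as sketched below. The sketch assumes that both predictions were already superposed on the protein and that the ligand atoms are listed in the same order; symmetry-equivalent atoms are not corrected for, and the function name is our own.

```python
import math


def ligand_rmsd(coords_a, coords_b):
    """Root-mean-square deviation in Å between two equally ordered
    lists of (x, y, z) ligand atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates: every atom is shifted by 1 Å along z.
    pose_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(f"{ligand_rmsd(pose_1, pose_2):.2f} Å")
```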
Learn: What did we achieve when comparing our updated goal with its characteristics with our evaluated design #15?
Design: We started with a brainstorming meeting to develop the concept and story for our children's book. After a few productive hours, we had our first story draft, accompanied by a moodboard with images we found on the internet to convey the ideas for pictures that we had.
Build: After conceptualisation, the painting began. Mariia started sketching all the images digitally. Next, all the lines were carefully drawn so that we had a clean draft to be filled with colour.
Test: The pictures were reviewed by the team and non-scientific test readers. Additionally, we conducted an interview with Julia Offe on science communication (see here), target audiences, and possible ways to distribute the book.
Learn: We integrated the scientific and general feedback on pictures and story into our book, and reached out to various initiatives and companies to ask for cooperation in sharing our book with its audience.
Design: We decided on a square-shaped format of the book, similar to a popular German children’s book series, the Pixi book. Additionally, we decided on a total number of 24 pages to keep the story short and precise.
Build: Once all the images were finished, they were combined with the text of the story. We continued to work on the general layout of the pages, moving pictures and text around to see what looks best.
Test: The digital version of the finished book was sent to Alejandro Rojas-Fernandez and Waldemar Schäfer to gather scientific feedback from nanobody experts. It was also reviewed again by our team, our PI and external test readers.
Learn: We made final visual changes, such as adjusting the distance of pictures and text from the page edges. After several requests, we also created a German version of the text for non-English speakers. Afterwards, our book was ready for printing.