Explain your model's assumptions, data, parameters, and results in a way that anyone could understand.
The human gastrointestinal tract is a complex system with numerous biochemical components that is typically studied and understood through wet-lab and clinical experimentation. The process of understanding our experimental environment and the effects of a novel therapeutic can take multiple months to years with solely wet-lab work. To remedy this time and resource constraint, our modeling team developed numerous mathematical, computational, and 3D models to understand our peptide-E. coli system in various experimental, clinical, and therapeutic conditions.
The basis of our project was developing a peptide which could bind to polystyrene (PS) microplastics and would then be expressed on the external membrane of an E. coli which could be taken as an oral PS microplastic removal therapeutic. To model the structure, binding affinity, physiochemical properties, and binding dynamics of our PS-binding peptide and binding units, we utilized structure prediction software, molecular docking simulations, molecular dynamics simulations, math modeling, and spatial modeling.
On the E. coli scale, we were interested in modeling the to-scale sizes of an average micro/nanoplastic and our peptide-E. coli system which spatial modeling was used to do so. We also wanted to understand the potential influence of proximal peptides on the E. coli membrane surface affecting each others' binding to a PS microsphere which we used binding interface Markov State Machine math model for. On the organ system scale, we wanted to understand the effectiveness of our peptide-E. coli system as it passes thorough the human gastrointestinal tract when taken as an oral therapeutic. We used an Ordinary Differential Equation (ODE) as the base structure of the model describing mathematical modeling methods to predict the concentrations changes of polystyrene, our peptide-E. coli system, and the complex of bound polystyrene and our system as it passes through the gastrointestinal tract.
To gain a better understanding of and generate a 3D PDB model of our units, our team used AlphaFold 3 since no experimental structures are available (DeepMind and Isomorphic Labs[1], 2024). These predictions were then used in downstream molecular docking and molecular dynamics models as well.
Since pTM values that are usually used to quantify the quality of AlphaFold’s folding do not hold meaning for short peptides (AlphaFold Server[2], 2025a), we assessed folding quality using AlphaFold’s local pLDDT of the binding helix (AlphaFold Server[3], 2025b; DeepMind and Isomorphic Labs[1], 2024). Each section of the predicted structure is colored according to the pLDDT confidence of the model, as explained in the heatmap color legend. The spin animation of the default cartoon with the molecular surface in green is shown below.
To create the Binding Units our team obtained the plastic binding peptide sequences from the literature (Alsheri[23], 2025; Dhoriyani[24], 2025). A linker (BBa_K4586019) and a 6‑His tag (BBa_K128005) were then added. These peptides as well as their composite parts were uploaded to the parts registry and the corresponding parts numbers are available in the table below as well as the parts page on our wiki.
Plasmid | Part # | Name | Source | Class | Length (aa) | Sequence |
---|---|---|---|---|---|---|
pFSUIGEM25-1 | BBa_25OGTX4L | PS Unit A | Alsheri[23], 2025 | With linker | 24 | MHHHHHHGGGGSWWMRHMFAWRIF |
pFSUIGEM25-12 | BBa_25G2X4SD | PS Unit B | Alsheri[23], 2025 | With linker | 24 | MHHHHHHGGGGSFWWRTIVWRHIR |
pFSUIGEM25-3 | BBa_25OUAT12 | PS Unit C | Alsheri[23], 2025 | With linker | 24 | MHHHHHHGGGGSYFIWWWRMFFFR |
pFSUIGEM25-4 | BBa_259WMV9Q | PS Unit D | Dhoriyani[24], 2025 | With linker | 24 | MHHHHHHGGGGSRRLWWRWVFRNR |
pFSUIGEM25-5 | BBa_25A4JLUO | PS Unit E | Dhoriyani[24], 2025 | With linker | 24 | MHHHHHHGGGGSERNMRRMLWKWR |
pFSUIGEM25-6 | BBa_256BX3HN | PS Unit F | Alsheri[23], 2025 | No linker | 19 | MHHHHHHWWMRHMFAWRIF |
pFSUIGEM25-7 | BBa_25AO4Q8S | PS Unit G | Alsheri[23], 2025 | No linker | 19 | MHHHHHHFWWRTIVWRHIR |
pFSUIGEM25-8 | BBa_25XQA0OH | PS Unit H | Alsheri[23], 2025 | No linker | 19 | MHHHHHHYFIWWWRMFFFR |
pFSUIGEM25-9 | BBa_2524CGR9 | PS Unit I | Dhoriyani[24], 2025 | No linker | 19 | MHHHHHHRRLWWRWVFRNR |
pFSUIGEM25-10 | BBa_254IF9TX | PS Unit J | Dhoriyani[24], 2025 | No linker | 19 | MHHHHHHERNMRRMLWKWR |
Plasmid | Part # | Name | Source | Class | Length (aa) | Sequence |
---|---|---|---|---|---|---|
– | — | PET Unit A | Dhoriyani[24], 2025 | With linker | 24 | MHHHHHHGGGGSEWQRHYHLWWFR |
pFSUIGEM25-2 | BBa_25MRI99K | PET Unit B | Dhoriyani[24], 2025 | With linker | 24 | MHHHHHHGGGGSWHRMQWAHIELW |
– | — | PET Unit C | Alsheri[23], 2025 | With linker | 24 | MHHHHHHGGGGSFHVWWINIFWFF |
Plasmids for PET Unit A and PET Unit C were not received from Twist Bioscience due to synthesis issues on their end.
To quickly rank and screen our constructs by binding affinity to various plastics, we created a high-throughput AutoDock Vina‑based molecular docking screening pipeline (Trott and Olson[5], 2010). We designed it for SLURM batching HPC servers and called the software Sito (Sito translates to “sieve” in Ukrainian).
Our model is limited by the lack of peptide flexibility as well as the plastics being small monomers instead of plastic surfaces because of underlying AutoDock Vina limitations which is why the ∆G produced by docking should be treated as a relative score.
Despite these limitations, molecular docking is widely used in research and drug design to screen constructs based on binding, as other methodologies can be too time consuming and costly (Blanes‑Mira et al.[35], 2022).
That is why we believe this model fits our use case and is a good proxy to assess the efficacy of binding of our peptides to plastics as an initial virtual screening pass before moving on to studying the constructs with experiments and more complex models. You can learn more about using our software on our contribution and software pages (FSU iGEM Team[6], 2025; FSU iGEM Team[7], 2025).
SLURM is a job scheduler used on High Performance Computing (HPC) clusters to queue and run many tasks in parallel across hundreds or thousands of CPU cores. Universities typically provide HPC access to researchers, faculty labs, and the students who work with them. iGEM teams often qualify for access through their host institutions, enabling fair share use of cluster resources for compute intensive tasks like large scale docking screens and molecular simulations. In practice, you submit a batch of independent jobs to SLURM and the scheduler dispatches them to available nodes automatically, which is perfect for parallel workloads like docking.
Sito works by fully automating and parallelizing AutoDock Vina dockings across separate SLURM jobs on HPC servers (Trott and Olson[5], 2010). So the speedup factor attained by our method is precisely the number of parallel jobs allowed on the HPC for that SLURM user. Since on our servers the default is 100 we attained a 10000% speedup, but if we needed more we could have asked the HPC administrators to raise that limit. So with a trivial permission to have more parallel jobs the speed up can comfortably reach factors of thousands or even tens of thousands compared to regular AutoDock Vina Screening. (Trott and Olson[5], 2010)
Thanks to Sito parallelizing our docking we were able to run over 810 AutoDock Vina docking simulations on the HPC to better understand how our engineered peptides interact with various plastics. (Trott and Olson[5], 2010)
We created Python generator scripts that make plastic ligands as SMILES string representations. These are then converted with Open Babel into SDFs which is the 3D file format needed for our docking software. The source code for all python generators is provided below as well as in the software repository.
The table reports predicted binding free energy (kcal mol⁻¹). More negative values indicate more favorable binding within the model (Trott and Olson[5], 2010; AutoDock Vina Manual[31], 2020). Because binding free energy scales logarithmically with the dissociation constant (∆G = RT ln KD), a change of ~1.36 kcal mol⁻¹ at 298 K corresponds to about a tenfold change in affinity, so large gaps are meaningful while sub‑kilocalorie differences are not (Hu et al.[30], 2021; Kenny[32], 2019). Docking scores are approximate and most reliable for ranking candidates generated with the same receptor model and identical docking settings, not for reading out absolute affinities (Cosconati et al.[33], 2010; Meng et al.[34], 2011). This relative interpretation is appropriate here because our goal is rapid triage of constructs for follow‑up.
Within this screen we confirm plausible binding of our peptides to polystyrene (PS) and polyethylene terephthalate (PET). For example, PS Unit C reaches −7.0 kcal mol⁻¹ on PS, which is at the favorable end, while the non-aromatic plastics cluster around −3 to −5 kcal mol⁻¹ which still indicates plausible binding in this model validating the possibility of our peptides sticking to labware made out of plastic and steering the team toward the next set of designs. Overall we reached the intended goal with this model which was to confirm that our constructs have a chance at binding to aromatic plastics and informing the rest of the team on how to move forward with experiments, design, and modeling.
Based on the build subteam’s lab results which indicated that we failed to detect expression of our peptides in the lab, the modeling team updated the docking model to include other non-aromatic plastics, due to concerns that our peptides may have bound to the plastic labware which the protocols were conducted with. Since the docking results showed values that indicate possible binding for the non-aromatic plastics, our concern about peptides possibly getting stuck to plastic labware was validated. This influenced the build sub-team to be more aware of the peptides coming into contact with plastic labware and it was also one of the modeling findings that influenced the motivation behind shielding the peptide with two other bigger proteins in the fusion protein design.
Receptor |
---|
ke represents the rate of PS leaving the stomach (hr-1); kt1 represents the rate of PS leaving intestinal compartments (hr-1); kon represents the rate of PS-E. coli association (M-1*hr-1); koff represents the rate of PS-E. coli dissociation (mol-1); kons represents the rate of PS association to E. coli adhered to the intestinal wall (M-1*hr-1); ka represents the rate of PS absorption into the intestinal wall (hr-1); ks represents the rate of E. coli adherence to the intestinal wall (hr-1); ku represents the rate of E. coli dissociation from the intestinal wall (hr-1); kex represents the rate of colon excretion (hr-1).
PS Concentration in Stomach (M)
PS Concentration in Intestines (M)
PS Concentration in Colon (M)
To more accurately predict the ∆G of binding of the peptides to various plastic surfaces as well as gain intuition for the mechanism of binding and the kinetic rates of binding we ran molecular dynamics simulations of our system.
We ran all our molecular dynamics simulations using GROMACS (Abraham et al.[10], 2015). The systems we simulated consisted of the engineered peptides, custom GAFF 8 nanometer by 8 nanometer plastic surfaces provided to us by Micheal T. Bergman (Bergman et al.[11], 2023), and explicit water. The total atom count of the systems was around 86000 atoms. The systems were energy minimized, NPT, and NVT equalized to < 1000 kJ/mol, 300K, and 1000 g/L. We opted for no margin in the xy direction guided by the requirements for the plastic surface, a margin of 1 nm on the bottom and 7 nm on the top relative to the plastic surface to be able to completely energetically explore the unbound region in later metadynamics simulations. We used the 2014 Amber SB forcefield which is commonly used to simulate biological systems and (Maier et al.[12], 2015).
Atoms are treated as classical particles that move according to Newton’s second law under forces derived from a potential energy function. At each time step the forces are computed from the current coordinates, the positions and velocities are updated, and the system evolves in time. Thermostats and barostats control temperature and pressure to sample the desired ensemble. Observables are then obtained as time averages or ensemble averages from the trajectory (Dror et al.[8], 2012).
Time steps must be very small to resolve fast bond and angle vibrations, typically on the order of 1 to 2 femtoseconds. Many biologically relevant events occur on microsecond to millisecond scales, so reaching them requires millions to billions of integration steps. Large systems also increase computational cost because many nonbonded interactions must be evaluated at each step despite neighbor list and cutoff optimizations (Dror et al.[8], 2012).
We used cMD to get a feel of the peptide-plastic system we only ran one production simulation for 20 ns, and chose to run the rest of the simulations using adaptive metadynamic bias. We were able to see binding of the peptide to the PS surface through aromatic-ring π-stacking and concluded that the Kon rate of binding should be fairly fast to allow binding in such a short simulation time with no bias. Approximately understanding the rate of binding is important for understanding how the peptide will behave in experiments as well as infroms downstream math models to have more accurate predictions.
To better predict ∆G of binding, our team chose to employ Well-Tempered Metadynamics (Barducci, Bussi and Parrinello[13], 2008). It is often used for this purpose and yields good predictions often comparable to experimental values (Tiwary and Parrinello[14], 2015). Well-Tempered Metadynamics is an enhanced sampling molecular dynamics technique to explore systems with high energy barriers. Since binding also has high energy barriers and has deep energy basins, metadynamics is perfect for exploring it.
The guiding principle behind metadynamics simulations is filling deep energy wells with periodically deposited extra energy called bias. This is needed to make it easier for the system to get out of that well, since the probability of that happening without bias is low. Mathematically this is described by the Boltzmann distribution.
Periodically depositing small gaussian hills of energy as a function of the state of the system makes the states it visits often (or gets stuck on) more thermodynamically unfavourable and forces the simulation to escape them and explore the whole energy space.
Over time the deposited energy otherwise known as bias of the simulation fills all the basins making every state equally favourable. By recording the deposited bias one can easily recover the free energy landscape as a function of the reaction progress or more precisely the collective variable (CV), since over time the bias deposited converges to the negative of the free energy plus a constant as can be seen in the animation above (Laio and Parrinello[16], 2002).
To then accurately calculate the ∆G from the free energy surface we first take Boltzmann weighted sums of the bound and unbound states.
Next, by taking the log of the weighted sum we may find the free energy of the macrostate and by subtracting the two we finally get ∆G.
To get a ∆Gº comparable to experimental values we must also rescale the unbound volume to the standard molar volume to account for the fact that experiments calculate the ∆G in terms of 1 molar where in the unbound state the ligands get to freely explore a volume of Cº.
Regular metadynamics is somewhat unstable which is why the Well-Tempered variant is usually used (Barducci, Bussi and Parrinello, 2008). In it the height or strength of bias deposited is scaled down as time goes on. The bias also converges to a simple function of free energy scaled by the bias factor.
Since our peptide has many amino acids and is flexible, to make the simulation free energy surface converge we ran the simulation with bias on each amino acid as well as the radius of gyration which is how extended the peptide in parallel using the PLUMED plugin for GROMACS (Tribello et al.[15], 2014). This also meant that the free energy had to be rescaled in terms of a different collective variable.
Due to the nature of metadynamic simulations and how computationally taxing they are, it takes multiple weeks to complete a simulation. Therefore, the final result of this simulation will be presented at the Jamboree.
After the build team failed to confirm expression of the constructs on the Tricine gel with various types of staining and a western blot we decided to model some interactions that we suspected may have caused issues for detection and purification to better infrom the design team about how they should approach the next design cycle. Our team came up with multiple theories of what might have happened. One of them being that the peptide binds to something in the cell shortly after it's expressed. That may be the case because the way our peptides bind through π‑stacking may not be specific to plastics. Which is why we suspected they may aggregate by binding to themselves or bind to other proteins in the cell. This aggregation or unwanted binding may have caused it to show up as a band of a larger size on the gels and hindered its purification and immunodetection because the his‑tag may have been obstructed.
To gain quick insight into this, we decided to run a joint structure prediction with AlphaFold 3 which is capable of predicting the biomolecular interactions between the structures that are folded. Unfortunately, this model does not quantify the strength of binding and we did not have adequate time to quantify it with other models.
To study aggregation we ran the joint structure prediction of five copies of the same peptide and saw AlphaFold predict the Binding Unit to bind to itself through hydrogen bonds. This suggests that the peptides may aggregate, hindering purification and detection.
To see whether our Binding Unit may bind to other cellular proteins, we decided to run a joint folding with EF‑Tu (EFTU1), which is often reported as one of the most abundant non‑ribosomal proteins in E. coli (Ishihama et al.[25], 2008). AlphaFold predicted the peptide will bind to EF‑Tu through hydrogen bonds, π‑bonds, and even an ester covalent bond. This suggests that the peptide may bind to common cellular proteins as well which would hinder purification and detection.
Based on this prediction as well as the predicted ∆G values of binding to many plastics, we advised the design team to put the peptide between two larger proteins in the second cycle to shield it from interacting with other proteins and peptides.
Since our end-goal system is an E.Coli with many peptides expressed on its surface binding to a plastic surface we created a conceptual model to show how the efficacy of binding improves given more peptides in the binding interface. We created a Markov State Machine (MSM) model that computes the converged probability of the number of peptides bound at the surface interface given unfavourable probabilities of binding and unbinding and the total number of peptides.
A Markov state machine is a memoryless stochastic model where the future depends only on the current state. In our case, the states are the counts k of peptides that are simultaneously bound at the cell–plastic interface, with transitions only to neighbouring counts: binding k → k+1 and unbinding k → k−1. This defines a birth–death chain whose time evolution is governed by a chemical master equation (van Kampen[20], 2007).
We generated converged probabilities by simulating many trajectories with the Gillespie stochastic simulation algorithm using the Python library gillespie2 (Gillespie[21], 1976; Gillespie[22], 1977; gillespie2[16], 2025). To stress test the interface, we deliberately chose unfavourable rates with kon = 0.1 and koff = 0.2 and ran N = 100 peptides up to t = 100. Even in this worst case, the probability of the unbound state k = 0 is very low because so many independent binders act in parallel, supporting our intuition that an array of peptides is substantially more robust than a single binder.
We created a to-scale spatial model for better visual understanding of the E. Coli and Plastic bead system using Blender (Blender Foundation[17], 2025). The peptides on the surface of the E. Coli are visualized with rectangular prisms of 3 nm, the diameter of the plastic bead is 500 nm, the visualized E. Coli is 500 nm in diameter and 2500 nm in length. There are 2000 peptides visualized on the surface of the E. Coli which is in normal range for surface proteins (BioNumbers[18], 2025).
This visualization highlights that there will be many peptides simultaneously binding to the plastic surface at once. Supporting our notion of parallel binding for robust adhesion.
We used PROPKA to estimate the pKa values of all amino acids in our constructs that are sensitive to pH, in order to guide the Build team’s buffer selection (Olsson et al.[19], 2011). Based on the ensemble of predicted pKa values shown in our plot, we chose 7.4 as a safe operating pH.
We applied the Henderson-Hasselbalch relation to convert each residue’s pKa into the fraction of the residue in its required protonation state across pH (Henderson[27], 1908; Hasselbalch[28], 1916):
We evaluated the sensitive groups GLU, HIS, TYR, ARG and the terminal groups (C−, N+). The resulting aggregate curves (plotted in Desmos Studio[29], 2025) show high occupancy of the desired protonation forms near pH 7.4, supporting our buffer choice and standard His‑tag practice.
Plasmid | Variant | Length | Sequence | MW (kDa) |
---|---|---|---|---|
pFSUIGEM25-1 | PS Unit A | 24 | MHHHHHHGGGGSWWMRHMFAWRIF |
3.036 |
pFSUIGEM25-12 | PS Unit B | 24 | MHHHHHHGGGGSFWWRTIVWRHIR |
3.025 |
pFSUIGEM25-3 | PS Unit C | 24 | MHHHHHHGGGGSYFIWWWRMFFFR |
3.155 |
pFSUIGEM25-4 | PS Unit D | 24 | MHHHHHHGGGGSRRLWWRWVFRNR |
3.100 |
pFSUIGEM25-5 | PS Unit E | 24 | MHHHHHHGGGGSERNMRRMLWKWR |
3.031 |
Plasmid | Variant | Length | Sequence | MW (kDa) |
---|---|---|---|---|
pFSUIGEM25-6 | PS Unit F | 19 | MHHHHHHWWMRHMFAWRIF |
2.721 |
pFSUIGEM25-7 | PS Unit G | 19 | MHHHHHHFWWRTIVWRHIR |
2.710 |
pFSUIGEM25-8 | PS Unit H | 19 | MHHHHHHYFIWWWRMFFFR |
2.839 |
pFSUIGEM25-9 | PS Unit I | 19 | MHHHHHHRRLWWRWVFRNR |
2.785 |
pFSUIGEM25-10 | PS Unit J | 19 | MHHHHHHERNMRRMLWKWR |
2.716 |
Plasmid | Variant | Length | Sequence | MW (kDa) |
---|---|---|---|---|
pFSUIGEM25-2 | PET‑B | 24 | MHHHHHHGGGGSWHRMQWAHIELW |
2.962 |