Our Dry Lab Experiments
To investigate the interactions between Perfluorooctanoic acid (PFOA) and Thymidylate Synthase (TYMS) at atomic-level resolution, we employed molecular dynamics (MD). MD is a computational technique that simulates the physical movements of atoms and molecules by iteratively solving Newton's equations of motion. This approach provides a dynamic, high-resolution view of molecular behavior, allowing us to analyze conformational changes, binding events, and the energetic principles governing protein-ligand interactions. The AMBER software suite was utilized for all simulations and subsequent analysis.
MD simulations are highly dependent on the quality of its starting model. This multi-step process involves generating the protein structure, placing the ligand, and creating a physically realistic environment.
Example LEaP script
# Force Fields
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc
# Load unique PFOA parameters
UNL = loadmol2 pfoa.mol2
check UNL
loadamberparams pfoa.frcmod
# Load complex
in complex = loadpdb h_gfp_hts.pdb
charge complex
charge UNL
# Neutralize Complex
addions complex Na+ 0
addions complex Cl- 0
# solvate the complex
solvateOct complex OPCBOX 12.0
# Save off files
saveAmberParm complex h_gfp_hts_solvated.prmtop h_gfp_hts_solvated.inpcrd
savePDB complex h_gfp_hts_solvated.pdb
quit
The initial model, having been assembled from multiple components, inevitably contains steric clashes and unfavorable geometries. A two-stage energy minimization process was employed to resolve these issues and relax the system into an energetically stable conformation.
This process was conducted over 8,000-10,000 cycles, beginning with the steepest descent algorithm for rapid removal of severe clashes, followed by the conjugate gradient algorithm for more efficient convergence to a local energy minimum.
Example Minimization Scripts
heating for 80 ps from 100 to 300 K with restraints
&cntrl
imin=0, ! Indicates a molecular dynamics run (not a minimization)
irest=0, ! Indicates that this is not a restart, but a new simulation
ntx=1, ! Initial coordinates are read from the input file, but not velocities
ntb=1, ! Specifies constant volume periodic boundary conditions
cut=10.0, ! Non-bonded cutoff in Angstroms
ntr=1, ! Restraints are applied to certain atoms or groups
ntc=2, ! SHAKE algorithm is used to constrain bonds with hydrogen
ntf=2, ! Force evaluation omits bonds to hydrogen (consistent with SHAKE)
tempi=100.0,! Initial temperature of the system in Kelvin
temp0=300.0,! Target temperature in Kelvin for the heating process
ntt=3, ! Specifies the use of Langevin dynamics for temperature coupling
gamma_ln=1.0,! Collision frequency for Langevin thermostat
nstlim=40000,! Number of MD steps to be performed (80 ps, given dt=0.002)
dt=0.002, ! Time step in picoseconds (2 fs)
ntpr=100, ! Frequency of writing information to the output file
ntwx=100, ! Frequency of writing coordinates to trajectory file
ntwr=1000 ! Frequency of writing the restart file
/
Hold the protein fixed 10.0 res 1 556
END
END
To simulate the system under physiological conditions, its temperature was gradually raised from 0 K to 300 K. This was performed in the NVT (isochoric-isothermal) ensemble, where the number of atoms, volume, and temperature are kept constant. A slow, controlled heating protocol is essential to avoid introducing thermal shock, which could artificially disrupt or denature the protein.
Example Heating Scripts
heating for 80 ps from 0 to 100 K with restraints
&cntrl
imin=0, ! Molecular dynamics run (not minimization)
irest=0, ! Starting a new simulation (not restarting)
ntx=1, ! Coordinates (but not velocities) are read from the input file
ntb=1, ! Constant volume periodic boundary conditions
cut=10.0, ! Non-bonded cutoff in Angstroms
ntr=1, ! Apply restraints to certain atoms
ntc=2, ! SHAKE algorithm applied to constrain bond lengths involving hydrogen
ntf=2, ! Force evaluation (bonds involving hydrogen are omitted)
tempi=0.0, ! Initial temperature (K)
temp0=100.0,! Target temperature for heating (K)
ntt=3, ! Langevin thermostat used for temperature coupling
gamma_ln=1.0,! Collision frequency for Langevin thermostat
nstlim=40000,! Number of MD steps to be performed (80 ps)
dt=0.002, ! Time step in picoseconds (2 fs)
ntpr=100, ! Print to output file every 100 steps
ntwx=100, ! Write coordinates to trajectory file every 100 steps
ntwr=1000 ! Write a restart file every 1000 steps
/
Hold the protein fixed 50.0 res 1 556
END
END
After reaching the target temperature, the system must be equilibrated to the correct pressure and density. This step ensures that the simulation begins from a state that is stable in all thermodynamic variables.
Example Equilibration Script
bring to equilibrium at 1atm and 300K - 400 ps with restraints
&cntrl
imin=0, ! Molecular dynamics run (not minimization)
irest=1, ! Indicates restarting a simulation (using velocities from a previous run)
ntx=7, ! Read both coordinates and velocities, and apply SHAKE constraints
ntb=2, ! Constant pressure periodic boundary conditions
cut=10.0, ! Non-bonded cutoff in Angstroms
pres0=1.0, ! Reference pressure for constant pressure simulation (1 atm)
ntp=1, ! Isothermal-isobaric ensemble (constant temperature and pressure)
taup=2.0, ! Pressure relaxation time (ps)
ntr=1, ! Apply positional restraints to certain atoms or groups
ntc=2, ! SHAKE algorithm to constrain bond lengths involving hydrogens
ntf=2, ! Force evaluation omits bonds to hydrogens (consistent with SHAKE)
tempi=300.0,! Initial temperature of the system (Kelvin)
temp0=300.0,! Target temperature (Kelvin)
ntt=3, ! Langevin thermostat for temperature control
gamma_ln=1.0,! Collision frequency for Langevin thermostat (ps^-1)
nstlim=200000,! Number of MD steps to be performed (400 ps)
dt=0.002, ! Time step in picoseconds (2 fs)
ntpr=100, ! Frequency of writing to the output file
ntwx=100, ! Frequency of writing coordinates to trajectory file
ntwr=1000 ! Frequency of writing the restart file
/
Hold the protein fixed 10.0 res 1 556
END
END
With the system fully thermalized and equilibrated, all restraints were removed, and the production simulation was launched. This is the main data-gathering phase, where the system is allowed to evolve freely under the laws of the force field, mimicking its natural dynamic behavior.
Example Production Script
Production run: 200 ns, 1 atm, 300K, unrestrained
&cntrl
imin=0, ! MD run
irest=1, ! Restart from previous run
ntx=5, ! Read coordinates + velocities
ntb=2, ! Constant pressure periodic boundary conditions (NPT)
cut=10.0, ! Cutoff for nonbonded interactions
pres0=1.0, ! Target pressure (1 atm)
ntp=1, ! Constant pressure dynamics
taup=2.0, ! Pressure relaxation time
ntr=0, ! No positional restraints
ntc=2, ! SHAKE on bonds involving H
ntf=2, ! No force evaluation for H bonds (used with SHAKE)
tempi=300.0, ! Initial temperature (300 K)
temp0=300.0, ! Target temperature (300 K)
ntt=3, ! Langevin thermostat
gamma_ln=1.0, ! Collision frequency
nstlim=100000000, ! Total steps = 200 ns / 0.002 ps = 100 million
dt=0.002, ! 2 fs timestep
ntpr=25000, ! Print to .out every 50 ps
ntwx=5000, ! Save to .mdcrd every 10 ps
ntwr=250000 ! Write restart file every 500 ps
/

PFOA will preferentially bind to the TYMS-GFP protein, inducing a conformational change in the GFP. This change, in turn, could cause a change in fluorescence.
The simulation was run following the standard protocol previously described. PFOA was placed in the active site of TYMS, as predicted by NeuralPlexer and extra PFOA was placed around the protein to simulate 20 mM PFOA conditions (solubility limit of PFOA).
The simulation of PFOA with the TYMS-GFP fusion protein revealed significant conformational changes, localized to the flexible linker region, while the core domains of each protein remained structurally stable. The results support the hypothesis, providing a detailed molecular mechanism for the observed changes.
To assess the overall structural changes of the fusion protein, we analyzed the Radius of Gyration (Rg) and the Center of Mass (COM) distance between the two domains.


Interpretation: The TYMS and GFP domains are behaving as two stable, rigid bodies connected by a flexible linker. The binding of PFOA induces a significant reorientation of these domains relative to each other, rather than causing them to unfold or deform.
To confirm that the observed motion was a rigid-body rotation and not a result of the GFP domain unfolding, we calculated the Root Mean Square Deviation (RMSD) of its backbone.

Interpretation: The stability of the GFP domain's RMSD confirms that it is not undergoing any significant internal structural change or unfolding. Therefore, the large-scale motions observed in the Rg and COM analyses are due to the movement of the entire, intact GFP domain. But this is a negative result in the sense that, GFP’s structure does not change which means there will not be a change in fluorescence.
To identify the mechanism responsible for the GFP domain's rotation, we analyzed the internal motions of the linker itself using a Dynamical Cross-Correlation Matrix (DCCM).

Interpretation: The strong anti-correlated motion between these two segments identifies a specific molecular hinge or pivot point within the linker. This hinge, located around residues 249-250, allows the two halves of the linker to move in opposite directions, facilitating the large-scale rotation of the GFP domain. The motion is not random but a structured mechanical movement enabled by the linker's specific architecture.

Finally, to link these conformational changes to the presence of PFOA, we analyzed the hydrogen bonds formed between the ligand and the protein.


Figure 7: Autocorrelation plot of the most stable hydrogen bonds observed between PFOA and the protein.

Figure 8: Autocorrelation plot of the longest hydrogen bonds observed between PFOA and the protein, showing that even the longest-lived bonds decay rapidly.
The longest interactions are summarized in the table below, with the top interaction partners being ARG-300, THR-501, LYS-558, and SER-466.
| #Acceptor | DonorH | Donor | Frames | Frac | AvgDist | AvgAng |
|---|---|---|---|---|---|---|
| UNL_564@O2 | ARG_300@HE | ARG_300@NE | 1623 | 0.0811 | 2.843 | 157.6929 |
| UNL_570@O1 | THR_501@HG1 | THR_501@OG1 | 1065 | 0.0532 | 2.7283 | 164.7128 |
| UNL_567@F8 | LYS_558@H | LYS_558@N | 1015 | 0.0508 | 2.8922 | 155.0876 |
| UNL_577@O2 | ARG_465@HH11 | ARG_465@NH1 | 976 | 0.0488 | 2.8279 | 156.0362 |
| UNL_577@O2 | SER_466@HG | SER_466@OG | 967 | 0.0483 | 2.7163 | 161.7635 |
| UNL_564@O1 | ARG_300@HE | ARG_300@NE | 914 | 0.0457 | 2.8453 | 159.5653 |
| UNL_577@O1 | SER_466@HG | SER_466@OG | 787 | 0.0394 | 2.7215 | 161.1722 |
| UNL_564@O2 | ARG_300@HH21 | ARG_300@NH2 | 778 | 0.0389 | 2.8287 | 153.5673 |
| UNL_564@O1 | TYR_508@HH | TYR_508@OH | 725 | 0.0362 | 2.7691 | 159.1798 |
| UNL_564@O2 | TYR_508@HH | TYR_508@OH | 709 | 0.0355 | 2.7844 | 158.696 |
| UNL_564@O2 | HIE_506@HE2 | HIE_506@NE2 | 649 | 0.0324 | 2.829 | 148.7741 |
| UNL_564@O1 | HIE_506@HE2 | HIE_506@NE2 | 503 | 0.0251 | 2.8337 | 152.11 |
Table 1: A summary of the most persistent hydrogen bonds, detailing the acceptor and donor residues, the fraction of time the bond existed, and its average geometry.
Interpretation: In this blind docking simulation, PFOA does not appear to find a single, high-affinity binding site where it resides for a long period. Instead, it engages in numerous, weak, and short-lived interactions across the protein surface. This binding behavior likely disrupts the native, stable conformation of the linker, introducing enough flexibility and instability for it to explore the large-scale rotational motion observed. The effect of PFOA is not to "pull" the protein into a new shape via a strong bond, but rather to "loosen" the linker by preventing it from settling into a single stable state.
5. Accessible Surface Area (SASA)
To assess the local impact of PFOA on the TYMS active site, we calculated the Solvent Accessible Surface Area (SASA) of the binding pocket residues over time.

Binding Pocket SASA Analysis: The SASA of the binding pocket shows a distinct conformational change. It begins in a relatively compact state (~400-500 Ų) and, upon interaction with PFOA, transitions to a more open and solvent-exposed state (~600-700 Ų). Critically, this expanded conformation persists for the remainder of the simulation, even during periods when a PFOA molecule is not present in the active site.
Interpretation: The persistence of the high-SASA state indicates that the transient binding of PFOA induces a lasting destabilization of the active site. The pocket does not immediately relax to its original, compact conformation after the ligand dissociates. This effect suggests that even small interactions with PFOA can lock the active site into a more open, and likely less catalytically competent, state.
The blind docking simulation does not support our initial hypothesis and provided a detailed, step-by-step molecular mechanism. The results demonstrate that PFOA, through a series of transient and non-specific interactions, induces flexibility in the TYMS-GFP linker. This flexibility is channeled into a specific rotational motion around a molecular hinge point within the linker, leading to a significant reorientation of the stable GFP domain. However, because GFP’s structure remains nearly identical, it can be concluded that PFOA will not produce a strong fluorescence change when in contact with TYMS-GFP. Furthermore, the analysis of the TYMS binding pocket SASA revealed a persistent destabilization. Even after the docked PFOA molecule dissociates, the active site remains in a more open and solvent-exposed conformation. This finding provides a direct mechanistic link between transient PFOA exposure and a lasting impairment of the enzyme's function, complementing the larger-scale conformational changes observed in the fusion protein.
Hypothesis: We hypothesized that PFOA molecules would interact transiently with TYMS, modulating active-site accessibility and forming hydrogen bonds without establishing a stable inhibitory complex. Given the presence of multiple PFOA molecules in the simulation, we expected competition among ligands and mostly short-lived contacts rather than long-lived, high-occupancy binding.
Changes to common methods:

Figure 9: The solvent accessible surface area of the TYMS active site.
Figure 10: Active-site SASA (raw trace) with cumulative average. Near-verticals of the blue graph indicate the start (~25 ns) and end (~125 ns) of the extended elevated-SASA window.
SASA Dynamics
The active-site SASA started around ~27,500 Ų and increased sharply to ~31,500 Ų around 25 ns, remaining elevated for an extended period (~25–125 ns).
Fluctuations following the initial increase suggest PFOA molecules dynamically sample binding poses rather than fully dissociating.

Figure 11: Total number of PFOA–TYMS hydrogen bonds over time (20000 frames; summed across PFOA molecules). Used to correlate bonding with SASA events.
Hydrogen-Bond Analysis
| Acceptor | DonorH | Donor | Frames | Frac | AvgDist | AvgAng |
|---|---|---|---|---|---|---|
| UNL_589@O2 | THR_570@HG1 | THR_570@OG1 | 3676 | 0.184 | 2.731 | 162.6 |
| UNL_589@O1 | THR_570@HG1 | THR_570@OG1 | 2444 | 0.122 | 2.732 | 162.6 |
| UNL_589@O1 | ARG_342@HH21 | ARG_342@NH2 | 1862 | 0.093 | 2.837 | 155.1 |
| UNL_589@O2 | ARG_342@HE | ARG_342@NE | 1771 | 0.089 | 2.837 | 155.8 |
| UNL_589@O1 | ARG_342@HE | ARG_342@NE | 1733 | 0.087 | 2.843 | 156 |
| UNL_589@O2 | ARG_342@HH21 | ARG_342@NH2 | 1679 | 0.084 | 2.833 | 154.8 |
| UNL_581@O1 | ARG_440@HH11 | ARG_440@NH1 | 1391 | 0.07 | 2.821 | 158.6 |
| UNL_589@O1 | ARG_342@HH11 | ARG_342@NH1 | 1325 | 0.066 | 2.813 | 157.5 |
| UNL_595@O2 | ARG_55@HH21 | ARG_55@NH2 | 1249 | 0.062 | 2.839 | 155.8 |
| UNL_582@O2 | ARG_153@HH21 | ARG_153@NH2 | 1118 | 0.056 | 2.836 | 153 |
| UNL_595@O1 | ARG_55@HH21 | ARG_55@NH2 | 1066 | 0.053 | 2.846 | 155.6 |
| UNL_582@O2 | ARG_153@HE | ARG_153@NE | 982 | 0.049 | 2.852 | 155.2 |
| UNL_582@O1 | ARG_153@HH21 | ARG_153@NH2 | 897 | 0.045 | 2.837 | 153.6 |
| UNL_581@O2 | ARG_440@HH11 | ARG_440@NH1 | 885 | 0.044 | 2.824 | 158 |
| UNL_594@O1 | ARG_342@HH21 | ARG_342@NH2 | 745 | 0.037 | 2.824 | 159.2 |
| UNL_595@O2 | ARG_55@HE | ARG_55@NE | 745 | 0.037 | 2.852 | 155.3 |
| UNL_595@O1 | ARG_55@HE | ARG_55@NE | 740 | 0.037 | 2.859 | 154.1 |
Caption: Fraction of frames each top H-bond was observed (aggregated across all PFOA molecules). Cutoffs: distance ≤ 3.0 Å, angle ≥ 135°.
The presence of PFOA would serve as an inhibitor to the enzyme TYMS when the cofactor, mTHF, was present alongside it. We thought this because 5 Fluorouracil is an inhibitor and PFOA itself is fluorinated.
This simulation was run following the standard protocols that were described previously, however an additional force field had to be created in order to account for mTHF. TYMS and mTHF were made into a separate complex to be used as a protein.
The MMPBSA ran with TYMS in the presence of the mTHF and PFOA revealed that PFOA was indeed acting as an inhibitor and that the presence of mTHF was helping to mitigate its effects. The results of the simulation and MMPBSA supported our initial hypothesis.

Figure 12: Without the presence of the cofactor in the complex after the equilibration process the PFOA leaves the binding site completely and becomes separate from the TYMS enzyme, however with the presence of mTHF the mobility of PFOA is significantly decreased, as well much more consistent.
Interpretation: In the absence of the cofactor, mTHF, PFOA fails to remain in the active site of the TYMS enzyme, meaning the interaction between TYMS and PFOA alone are too weak to maintain a proper binding. Alternatively while mTHF was present PFOA was able to hold a more stable association to TYMS and remain closer to its active site.
Results support the hypothesis that PFOA acts as an inhibitor for TYMS, and by itself PFOA and TYMS are unable to form a strong enough reaction to remain strongly associated or to keep PFOA bound to the active site. The presence of mTHF allows PFOA to be more stable with the enzyme, suggesting that mTHF aids to stabilize an interaction between PFOA and TYMS.
We built a reproducible reverse-screening workflow that starts with a deprotonated PFOA ligand and ranks human protein targets by integrating results from seven public resources. The workflow standardizes the ligand input, reconciles all targets to UniProt, merges per-source scores in a single sheet, and applies a transparent shortlisting rubric that considers (a) score, (b) cross-database support, (c) disease relevance, (d) novelty in PFAS literature, and (e) assayability/cost. Re-running the screens with the corrected ligand representation changed retrievals and priorities, increased agreement across sources, and surfaced thymidylate synthase (TYMS) as a strong candidate for follow-up. We maintain cleaned CSVs/Google Sheets for the merged hits, the v1 vs. v2 comparison, the rubric shortlisting, and sourcing.
Last season's screens mixed ligand forms and tool conventions, which made single-database hits noisy and cross-tool comparisons unreliable. This cycle we needed one consistent ligand representation, one naming system (UniProt), and one auditable rubric so rankings were traceable. The key correction was using deprotonated PFOA rather than a neutral form, which materially shifted which proteins looked plausible and improved cross-database agreement.
The workflow accepts a canonical SMILES for deprotonated PFOA, queries SuperPred, STITCH, PharmMapper, TargetNet, SEA-style, SwissTargetPrediction, and UniProt mappings, and produces a merged, human-only table keyed to UniProt with per-source scores, evidence counts, disease-theme notes, novelty flags, and assayability comments. It also outputs a rubric-based shortlist and a simple v1 (old) vs. v2 (corrected) comparison with overlaps and rank/score shifts.
The workflow keeps a single ligand string across all tools, logs raw scores and link-outs in consistent columns, normalizes every target to UniProt, and explicitly counts cross-database support. It applies a rubric with defined thresholds and tie-breaks, records why a target advanced (comments next to the row), and keeps a side-by-side v1 vs. v2 sheet so changes are easy to explain. It concludes with a sourcing sheet (vendor, catalog, quantity, price, notes) for the top candidates.
The process begins by fixing a canonical deprotonated PFOA SMILES, submitting that input to seven sources, and collecting raw hits and scores into identical tracking sheets. A merge step collapses duplicates by UniProt and removes non-human entries. A ranking step uses per-source scores plus cross-database support and then applies the rubric (score, support, disease relevance, novelty, assayability/cost). A comparison step joins the earlier run to the corrected run by UniProt and summarizes overlaps and deltas. A final step prepares the shortlist and sourcing.
For ligand standardization, we used one deprotonated PFOA SMILES everywhere and documented it in each sheet; we also reserved an "alt-ligand" field for sensitivity checks without mixing inputs.
For multi-database screening, we ran the ligand through seven sources, captured the native score or confidence from each, recorded link-outs, and kept raw columns unchanged so anyone can audit the entries.
For identifier hygiene, we mapped all hits to UniProt IDs, resolved gene symbols and protein names from those IDs, dropped non-human entries, and merged duplicates to a single target row that keeps the per-source evidence side-by-side.
For shortlisting, we applied the rubric you defined (score, cross-database support, disease relevance, novelty, assayability/cost) with clear thresholds and tie-breaks (prefer novelty, then broader agreement, then cost/availability).
For comparison, we built a v1 vs. v2 table (old neutral-form run vs. corrected deprotonated-form run), computed overlap, Δrank, Δscore, grouped changes by high-level disease themes (oncology, inflammation, lipid/bile), and tagged the driver as an input correction where applicable.
For hand-off, we prepared a sourcing sheet for the top 18 targets (vendor, catalog, quantity, price, notes) and coordinated with the protein team; this enabled prioritization and led to selecting TYMS for the first round.
Re-screening with deprotonated PFOA increased consensus across sources and reduced single-tool outliers. The shortlist shifted toward oncology and inflammation themes, and TYMS rose to the top based on consistent support, clear disease relevance, and practical assay paths. We added TYMS to the sourcing list, shared costs/availability with the wet-lab team, and lined it up as the first target for follow-up.