Software

Our Dry Lab Experiments

Overview

To investigate the interactions between Perfluorooctanoic acid (PFOA) and Thymidylate Synthase (TYMS) at atomic-level resolution, we employed molecular dynamics (MD). MD is a computational technique that simulates the physical movements of atoms and molecules by iteratively solving Newton's equations of motion. This approach provides a dynamic, high-resolution view of molecular behavior, allowing us to analyze conformational changes, binding events, and the energetic principles governing protein-ligand interactions. The AMBER software suite was utilized for all simulations and subsequent analysis.


1. System Preparation: Constructing the Initial Model

MD simulations are highly dependent on the quality of its starting model. This multi-step process involves generating the protein structure, placing the ligand, and creating a physically realistic environment.

  • Protein Structure Generation: The initial three-dimensional coordinates for the TYMS protein were predicted using AlphaFold3. By inputting the primary amino acid sequence, this deep learning model generates a highly accurate protein structure, which serves as the foundation for our simulations. AlphaFold3 was primarily used for generating the TYMS-GFP structure, whilst the TYMS dimer’s structure was retrieved from RCSB.
  • Ligand Placement and Docking:
    • Targeted Docking: To study interactions at a specific binding site, the initial protein-ligand complex was generated using NeuralPlexer. This deep learning-based tool predicts the most probable binding pose (position and orientation) of a single PFOA molecule within the protein's active site.
    • Saturation and Blind Docking: For simulations designed to explore non-specific binding or identify novel allosteric sites, Packmol was used. This tool places a specified number of PFOA molecules at random positions in the simulation box, creating a high-concentration environment that facilitates the observation of diverse binding events.
  • Force Fields and System Assembly: The system was constructed using the tleap module in AmberTools. A force field—a set of parameters that defines the potential energy of the system—was assigned to all molecules.
    • The ff19SB force field was chosen for the protein due to its accuracy in describing protein backbone and side-chain dynamics.
    • The GAFF2 (General Amber Force Field) was used to parameterize the PFOA ligand, as it is optimized for small organic molecules.
  • Solvation and Neutralization: Proteins exist and function in an aqueous environment. We therefore solvated the system in an explicit water model (OPC), which accurately reproduces the physical properties of water. The complex was placed in a truncated octahedral box, a computationally efficient shape that mimics a sphere, with a minimum buffer of 12.0 Å of water between the solute and the box boundary. This buffer is critical to prevent the protein from artificially interacting with its periodic image. Finally, counterions (Na⁺/Cl⁻) were added to neutralize the system's net charge, mimicking physiological ionic strength.

Example LEaP script

# Force Fields
source leaprc.protein.ff19SB
source leaprc.gaff2
source leaprc.water.opc

# Load unique PFOA parameters
UNL = loadmol2 pfoa.mol2
check UNL
loadamberparams pfoa.frcmod

# Load complex
in complex = loadpdb h_gfp_hts.pdb
charge complex
charge UNL

# Neutralize Complex
addions complex Na+ 0
addions complex Cl- 0

# solvate the complex
solvateOct complex OPCBOX 12.0

# Save off files
saveAmberParm complex h_gfp_hts_solvated.prmtop h_gfp_hts_solvated.inpcrd
savePDB complex h_gfp_hts_solvated.pdb
quit

2. Energy Minimization: Resolving Unfavorable Geometries

The initial model, having been assembled from multiple components, inevitably contains steric clashes and unfavorable geometries. A two-stage energy minimization process was employed to resolve these issues and relax the system into an energetically stable conformation.

  • Stage 1: Solvent and Ion Minimization: The protein-ligand complex was held rigid with a harmonic restraint, and only the positions of the water molecules and ions were minimized. This crucial first step allows the solvent to form an optimal, hydrogen-bonded network around the solute without the initial random placement of water molecules distorting the protein's structure.
  • Stage 2: Full System Minimization: All restraints were removed, and the entire system was minimized. This allows the protein and ligand atoms to adjust their positions slightly to relieve any remaining internal strain.

This process was conducted over 8,000-10,000 cycles, beginning with the steepest descent algorithm for rapid removal of severe clashes, followed by the conjugate gradient algorithm for more efficient convergence to a local energy minimum.

Example Minimization Scripts

heating for 80 ps from 100 to 300 K with restraints
&cntrl
  imin=0,     ! Indicates a molecular dynamics run (not a minimization)
  irest=0,    ! Indicates that this is not a restart, but a new simulation
  ntx=1,      ! Initial coordinates are read from the input file, but not velocities
  ntb=1,      ! Specifies constant volume periodic boundary conditions
  cut=10.0,   ! Non-bonded cutoff in Angstroms
  ntr=1,      ! Restraints are applied to certain atoms or groups
  ntc=2,      ! SHAKE algorithm is used to constrain bonds with hydrogen
  ntf=2,      ! Force evaluation omits bonds to hydrogen (consistent with SHAKE)
  tempi=100.0,! Initial temperature of the system in Kelvin
  temp0=300.0,! Target temperature in Kelvin for the heating process
  ntt=3,      ! Specifies the use of Langevin dynamics for temperature coupling
  gamma_ln=1.0,! Collision frequency for Langevin thermostat
  nstlim=40000,! Number of MD steps to be performed (80 ps, given dt=0.002)
  dt=0.002,   ! Time step in picoseconds (2 fs)
  ntpr=100,   ! Frequency of writing information to the output file
  ntwx=100,   ! Frequency of writing coordinates to trajectory file
  ntwr=1000   ! Frequency of writing the restart file
/
Hold the protein fixed 10.0 res 1 556
END
END

3. System Heating: Introducing Thermal Energy

To simulate the system under physiological conditions, its temperature was gradually raised from 0 K to 300 K. This was performed in the NVT (isochoric-isothermal) ensemble, where the number of atoms, volume, and temperature are kept constant. A slow, controlled heating protocol is essential to avoid introducing thermal shock, which could artificially disrupt or denature the protein.

  • Controlled Heating Protocol: The system was heated in two distinct phases: first from 0 K to 100 K over 80 ps, and then from 100 K to 300 K over another 80 ps.
  • Temperature Regulation: Temperature was controlled using a Langevin thermostat. This algorithm simulates the effect of a true heat bath by adding frictional and random forces to the atoms, ensuring robust and realistic temperature coupling. During heating, weak positional restraints were maintained on the protein to prevent large-scale unfolding motions as kinetic energy was introduced.

Example Heating Scripts

heating for 80 ps from 0 to 100 K with restraints
&cntrl
  imin=0,     ! Molecular dynamics run (not minimization)
  irest=0,    ! Starting a new simulation (not restarting)
  ntx=1,      ! Coordinates (but not velocities) are read from the input file
  ntb=1,      ! Constant volume periodic boundary conditions
  cut=10.0,   ! Non-bonded cutoff in Angstroms
  ntr=1,      ! Apply restraints to certain atoms
  ntc=2,      ! SHAKE algorithm applied to constrain bond lengths involving hydrogen
  ntf=2,      ! Force evaluation (bonds involving hydrogen are omitted)
  tempi=0.0,  ! Initial temperature (K)
  temp0=100.0,! Target temperature for heating (K)
  ntt=3,      ! Langevin thermostat used for temperature coupling
  gamma_ln=1.0,! Collision frequency for Langevin thermostat
  nstlim=40000,! Number of MD steps to be performed (80 ps)
  dt=0.002,   ! Time step in picoseconds (2 fs)
  ntpr=100,   ! Print to output file every 100 steps
  ntwx=100,   ! Write coordinates to trajectory file every 100 steps
  ntwr=1000   ! Write a restart file every 1000 steps
/
Hold the protein fixed 50.0 res 1 556
END
END

4. Equilibration: Achieving Thermal and Baric Stability

After reaching the target temperature, the system must be equilibrated to the correct pressure and density. This step ensures that the simulation begins from a state that is stable in all thermodynamic variables.

  • Constant Pressure Simulation: The system was simulated using NPT (isothermal-isobaric) ensemble, where the number of atoms, pressure (1 atm), and temperature (300 K) are held constant. In this ensemble, the volume of the simulation box is allowed to fluctuate.
  • Pressure Regulation: A barostat was used to maintain a constant pressure, effectively acting as a virtual piston that adjusts the box size until the system's density converges to a stable value. Positional restraints were kept on the protein during this phase to allow the solvent density to equilibrate properly around a stable solute conformation.

Example Equilibration Script

bring to equilibrium at 1atm and 300K - 400 ps with restraints
&cntrl
  imin=0,     ! Molecular dynamics run (not minimization)
  irest=1,    ! Indicates restarting a simulation (using velocities from a previous run)
  ntx=7,      ! Read both coordinates and velocities, and apply SHAKE constraints
  ntb=2,      ! Constant pressure periodic boundary conditions
  cut=10.0,   ! Non-bonded cutoff in Angstroms
  pres0=1.0,  ! Reference pressure for constant pressure simulation (1 atm)
  ntp=1,      ! Isothermal-isobaric ensemble (constant temperature and pressure)
  taup=2.0,   ! Pressure relaxation time (ps)
  ntr=1,      ! Apply positional restraints to certain atoms or groups
  ntc=2,      ! SHAKE algorithm to constrain bond lengths involving hydrogens
  ntf=2,      ! Force evaluation omits bonds to hydrogens (consistent with SHAKE)
  tempi=300.0,! Initial temperature of the system (Kelvin)
  temp0=300.0,! Target temperature (Kelvin)
  ntt=3,      ! Langevin thermostat for temperature control
  gamma_ln=1.0,! Collision frequency for Langevin thermostat (ps^-1)
  nstlim=200000,! Number of MD steps to be performed (400 ps)
  dt=0.002,   ! Time step in picoseconds (2 fs)
  ntpr=100,   ! Frequency of writing to the output file
  ntwx=100,   ! Frequency of writing coordinates to trajectory file
  ntwr=1000   ! Frequency of writing the restart file
/
Hold the protein fixed 10.0 res 1 556
END
END

5. Production Simulation: Data Collection

With the system fully thermalized and equilibrated, all restraints were removed, and the production simulation was launched. This is the main data-gathering phase, where the system is allowed to evolve freely under the laws of the force field, mimicking its natural dynamic behavior.

  • Simulation Conditions: The simulation was run in the NPT ensemble (300 K, 1 atm) to accurately represent physiological conditions.
  • Integration Time Step: The SHAKE algorithm was used to constrain all covalent bonds involving hydrogen atoms. Since these bonds vibrate at a very high frequency, constraining their length allows for the use of a larger integration time step (2 fs) without loss of numerical stability. This dramatically increases the computational efficiency of the simulation, allowing us to simulate longer timescales.
  • Data Output: The coordinates of all atoms were saved at regular intervals, generating a trajectory file. This trajectory serves as a molecular "movie" and is the primary source of data for all subsequent analyses of protein dynamics, conformational changes, and ligand interactions.

Example Production Script

Production run: 200 ns, 1 atm, 300K, unrestrained
&cntrl
  imin=0,         ! MD run
  irest=1,        ! Restart from previous run
  ntx=5,          ! Read coordinates + velocities
  ntb=2,          ! Constant pressure periodic boundary conditions (NPT)
  cut=10.0,       ! Cutoff for nonbonded interactions
  pres0=1.0,      ! Target pressure (1 atm)
  ntp=1,          ! Constant pressure dynamics
  taup=2.0,       ! Pressure relaxation time
  ntr=0,          ! No positional restraints
  ntc=2,          ! SHAKE on bonds involving H
  ntf=2,          ! No force evaluation for H bonds (used with SHAKE)
  tempi=300.0,    ! Initial temperature (300 K)
  temp0=300.0,    ! Target temperature (300 K)
  ntt=3,          ! Langevin thermostat
  gamma_ln=1.0,   ! Collision frequency
  nstlim=100000000, ! Total steps = 200 ns / 0.002 ps = 100 million
  dt=0.002,       ! 2 fs timestep
  ntpr=25000,     ! Print to .out every 50 ps
  ntwx=5000,     ! Save to .mdcrd every 10 ps
  ntwr=250000     ! Write restart file every 500 ps
/

Summarized Workflow

Image Placeholder

MD Experiments


Experiment: Blind Docking of PFOA on TYMS-GFP

Hypothesis:

PFOA will preferentially bind to the TYMS-GFP protein, inducing a conformational change in the GFP. This change, in turn, could cause a change in fluorescence.

Changes to Common Methods:

The simulation was run following the standard protocol previously described. PFOA was placed in the active site of TYMS, as predicted by NeuralPlexer and extra PFOA was placed around the protein to simulate 20 mM PFOA conditions (solubility limit of PFOA).

Results and Interpretation

The simulation of PFOA with the TYMS-GFP fusion protein revealed significant conformational changes, localized to the flexible linker region, while the core domains of each protein remained structurally stable. The results support the hypothesis, providing a detailed molecular mechanism for the observed changes.

1. Global Conformational Dynamics: A Large-Scale Reorientation

To assess the overall structural changes of the fusion protein, we analyzed the Radius of Gyration (Rg) and the Center of Mass (COM) distance between the two domains.

  • Radius of Gyration (Rg) Analysis: The Rg measures the compactness of a protein. Our analysis shows that while the individual Rg values for the TYMS domain (residues 1-250) and the GFP domain (residues 251-end) slightly decrease, the Rg of the entire protein shows a marked increase over the simulation. This indicates that while the individual domains are becoming slightly more compact, the overall protein is adopting a more extended conformation.

Image Placeholder

Figure 1: The Radius of Gyration (Rg) for the whole protein (blue), TYMS domain (orange), and GFP domain (green). The plot shows an increasing Rg for the entire complex, while the individual domains remain compact.

  • Inter-Domain Distance Analysis: While the increasing global Rg suggests the domains are moving apart, the Center of Mass (COM) distance between TYMS and GFP shows a slight decrease. This seemingly contradictory result is resolved by visual inspection of the trajectory, which reveals the conformational change is not a simple linear separation but a complex 90-degree rotation of the GFP domain relative to the TYMS domain. This rotation increases the longest axis of the protein, thus increasing the overall Rg, while simultaneously bringing the geometric centers of the two domains slightly closer.

Image Placeholder

Figure 2: The distance between the Center of Mass of the TYMS domain and the GFP domain over time. The decreasing trend indicates the domains are, on average, moving closer together.

Interpretation: The TYMS and GFP domains are behaving as two stable, rigid bodies connected by a flexible linker. The binding of PFOA induces a significant reorientation of these domains relative to each other, rather than causing them to unfold or deform.

2. Structural Stability of the GFP Domain

To confirm that the observed motion was a rigid-body rotation and not a result of the GFP domain unfolding, we calculated the Root Mean Square Deviation (RMSD) of its backbone.

  • GFP RMSD Analysis: The RMSD of the GFP domain (residues 18-234, representing its core structure) remains exceptionally stable throughout the simulation, fluctuating between 1.0 and 1.75 Å. An RMSD value below 2.0 Å is a strong indicator that the protein's core fold is being maintained.

Image Placeholder

Figure 3: The backbone RMSD of the core GFP domain (residues 18-234). The stable fluctuation below 2.0 Å confirms the domain does not undergo any significant internal structural change.

Interpretation: The stability of the GFP domain's RMSD confirms that it is not undergoing any significant internal structural change or unfolding. Therefore, the large-scale motions observed in the Rg and COM analyses are due to the movement of the entire, intact GFP domain. But this is a negative result in the sense that, GFP’s structure does not change which means there will not be a change in fluorescence.

3. The Molecular Hinge: Linker Dynamics

To identify the mechanism responsible for the GFP domain's rotation, we analyzed the internal motions of the linker itself using a Dynamical Cross-Correlation Matrix (DCCM).

  • DCCM Analysis: A DCCM reveals how pairs of residues move in relation to one another. Positively correlated (red) regions indicate residues moving in the same direction, while anti-correlated (blue) regions indicate movement in opposite directions. The DCCM of the linker (residues ~234-279) shows two distinct blocks of correlated motion corresponding to helical segments. Crucially, there is a strong anti-correlation between the first helical segment (residues ~234-248) and the second segment (~250-264).

Image Placeholder

Figure 4: The DCCM of the linker residues. The red diagonal blocks show correlated motion within two alpha-helical segments, while the off-diagonal blue regions show strong anti-correlated motion between them.

Interpretation: The strong anti-correlated motion between these two segments identifies a specific molecular hinge or pivot point within the linker. This hinge, located around residues 249-250, allows the two halves of the linker to move in opposite directions, facilitating the large-scale rotation of the GFP domain. The motion is not random but a structured mechanical movement enabled by the linker's specific architecture.

Image Placeholder

Figure 5: A structural representation of the linker, with the two anti-correlated helical segments colored differently (e.g., red and blue) to highlight the pivot point.

4. The Role of PFOA: Transient Hydrogen Bonding

Finally, to link these conformational changes to the presence of PFOA, we analyzed the hydrogen bonds formed between the ligand and the protein.

  • Hydrogen Bond Lifetime Analysis: The analysis of all potential hydrogen bonds reveals that the vast majority are highly transient, forming and breaking on a very short timescale. Even when filtering for the most stable interactions (those lasting over 500 frames ~ 100 fs), no single hydrogen bond persists for a significant fraction of the simulation (the maximum occupancy, was ~8%).

Image Placeholder

Figure 6: Autocorrelation plot of the hydrogen bonds observed between PFOA and the protein.

Image Placeholder

Figure 7: Autocorrelation plot of the most stable hydrogen bonds observed between PFOA and the protein.

Image Placeholder

Figure 8: Autocorrelation plot of the longest hydrogen bonds observed between PFOA and the protein, showing that even the longest-lived bonds decay rapidly.

The longest interactions are summarized in the table below, with the top interaction partners being ARG-300, THR-501, LYS-558, and SER-466.

#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
UNL_564@O2 ARG_300@HE ARG_300@NE 1623 0.0811 2.843 157.6929
UNL_570@O1 THR_501@HG1 THR_501@OG1 1065 0.0532 2.7283 164.7128
UNL_567@F8 LYS_558@H LYS_558@N 1015 0.0508 2.8922 155.0876
UNL_577@O2 ARG_465@HH11 ARG_465@NH1 976 0.0488 2.8279 156.0362
UNL_577@O2 SER_466@HG SER_466@OG 967 0.0483 2.7163 161.7635
UNL_564@O1 ARG_300@HE ARG_300@NE 914 0.0457 2.8453 159.5653
UNL_577@O1 SER_466@HG SER_466@OG 787 0.0394 2.7215 161.1722
UNL_564@O2 ARG_300@HH21 ARG_300@NH2 778 0.0389 2.8287 153.5673
UNL_564@O1 TYR_508@HH TYR_508@OH 725 0.0362 2.7691 159.1798
UNL_564@O2 TYR_508@HH TYR_508@OH 709 0.0355 2.7844 158.696
UNL_564@O2 HIE_506@HE2 HIE_506@NE2 649 0.0324 2.829 148.7741
UNL_564@O1 HIE_506@HE2 HIE_506@NE2 503 0.0251 2.8337 152.11

Table 1: A summary of the most persistent hydrogen bonds, detailing the acceptor and donor residues, the fraction of time the bond existed, and its average geometry.

Interpretation: In this blind docking simulation, PFOA does not appear to find a single, high-affinity binding site where it resides for a long period. Instead, it engages in numerous, weak, and short-lived interactions across the protein surface. This binding behavior likely disrupts the native, stable conformation of the linker, introducing enough flexibility and instability for it to explore the large-scale rotational motion observed. The effect of PFOA is not to "pull" the protein into a new shape via a strong bond, but rather to "loosen" the linker by preventing it from settling into a single stable state.

5. Accessible Surface Area (SASA)

To assess the local impact of PFOA on the TYMS active site, we calculated the Solvent Accessible Surface Area (SASA) of the binding pocket residues over time.

Image Placeholder

Binding Pocket SASA Analysis: The SASA of the binding pocket shows a distinct conformational change. It begins in a relatively compact state (~400-500 Ų) and, upon interaction with PFOA, transitions to a more open and solvent-exposed state (~600-700 Ų). Critically, this expanded conformation persists for the remainder of the simulation, even during periods when a PFOA molecule is not present in the active site.

Interpretation: The persistence of the high-SASA state indicates that the transient binding of PFOA induces a lasting destabilization of the active site. The pocket does not immediately relax to its original, compact conformation after the ligand dissociates. This effect suggests that even small interactions with PFOA can lock the active site into a more open, and likely less catalytically competent, state.

Conclusion

The blind docking simulation does not support our initial hypothesis and provided a detailed, step-by-step molecular mechanism. The results demonstrate that PFOA, through a series of transient and non-specific interactions, induces flexibility in the TYMS-GFP linker. This flexibility is channeled into a specific rotational motion around a molecular hinge point within the linker, leading to a significant reorientation of the stable GFP domain. However, because GFP’s structure remains nearly identical, it can be concluded that PFOA will not produce a strong fluorescence change when in contact with TYMS-GFP. Furthermore, the analysis of the TYMS binding pocket SASA revealed a persistent destabilization. Even after the docked PFOA molecule dissociates, the active site remains in a more open and solvent-exposed conformation. This finding provides a direct mechanistic link between transient PFOA exposure and a lasting impairment of the enzyme's function, complementing the larger-scale conformational changes observed in the fusion protein.


Experiment: Solvent Accessible Surface Area of PFOA on TYMS

Hypothesis: We hypothesized that PFOA molecules would interact transiently with TYMS, modulating active-site accessibility and forming hydrogen bonds without establishing a stable inhibitory complex. Given the presence of multiple PFOA molecules in the simulation, we expected competition among ligands and mostly short-lived contacts rather than long-lived, high-occupancy binding.

Changes to common methods:

  • System composition: The simulation included TYMS and 20 PFOA molecules. To meet the PDB 99K atom limit, the protein’s terminal tails were truncated. Truncated residues were excluded from SASA and hydrogen-bond analyses where appropriate.
  • Parameterization: PFOA molecules were parameterized for the force field used in production MD runs.
  • Trajectory analysis: CPPTRAJ was used to compute SASA (probe radius \= 1.4 Å) and hydrogen bonds (distance ≤ 3.0 Å, angle ≥ 135°). H-bonds were tracked for each PFOA–protein pair across all frames.
  • Data processing: SASA was analyzed as both a raw trace and a cumulative average. Hydrogen-bond occupancy was calculated as the fraction of frames in which each bond was present.

Results and Interpretation:

Image Placeholder

Figure 9: The solvent accessible surface area of the TYMS active site.

Figure 10: Active-site SASA (raw trace) with cumulative average. Near-verticals of the blue graph indicate the start (~25 ns) and end (~125 ns) of the extended elevated-SASA window.

SASA Dynamics

  • The active-site SASA started around ~27,500 Ų and increased sharply to ~31,500 Ų around 25 ns, remaining elevated for an extended period (~25–125 ns).

  • Fluctuations following the initial increase suggest PFOA molecules dynamically sample binding poses rather than fully dissociating.

Image Placeholder

Figure 11: Total number of PFOA–TYMS hydrogen bonds over time (20000 frames; summed across PFOA molecules). Used to correlate bonding with SASA events.

Hydrogen-Bond Analysis

  • The number of PFOA–TYMS hydrogen bonds fluctuated over the trajectory (totaling ~20–30 at any given frame), correlating with SASA dynamics.
  • Hydrogen-bond occupancy analysis shows that no single bond is highly persistent (maximum occupancy ≲0.4), consistent with transient interactions.
Acceptor DonorH Donor Frames Frac AvgDist AvgAng
UNL_589@O2 THR_570@HG1 THR_570@OG1 3676 0.184 2.731 162.6
UNL_589@O1 THR_570@HG1 THR_570@OG1 2444 0.122 2.732 162.6
UNL_589@O1 ARG_342@HH21 ARG_342@NH2 1862 0.093 2.837 155.1
UNL_589@O2 ARG_342@HE ARG_342@NE 1771 0.089 2.837 155.8
UNL_589@O1 ARG_342@HE ARG_342@NE 1733 0.087 2.843 156
UNL_589@O2 ARG_342@HH21 ARG_342@NH2 1679 0.084 2.833 154.8
UNL_581@O1 ARG_440@HH11 ARG_440@NH1 1391 0.07 2.821 158.6
UNL_589@O1 ARG_342@HH11 ARG_342@NH1 1325 0.066 2.813 157.5
UNL_595@O2 ARG_55@HH21 ARG_55@NH2 1249 0.062 2.839 155.8
UNL_582@O2 ARG_153@HH21 ARG_153@NH2 1118 0.056 2.836 153
UNL_595@O1 ARG_55@HH21 ARG_55@NH2 1066 0.053 2.846 155.6
UNL_582@O2 ARG_153@HE ARG_153@NE 982 0.049 2.852 155.2
UNL_582@O1 ARG_153@HH21 ARG_153@NH2 897 0.045 2.837 153.6
UNL_581@O2 ARG_440@HH11 ARG_440@NH1 885 0.044 2.824 158
UNL_594@O1 ARG_342@HH21 ARG_342@NH2 745 0.037 2.824 159.2
UNL_595@O2 ARG_55@HE ARG_55@NE 745 0.037 2.852 155.3
UNL_595@O1 ARG_55@HE ARG_55@NE 740 0.037 2.859 154.1

Caption: Fraction of frames each top H-bond was observed (aggregated across all PFOA molecules). Cutoffs: distance ≤ 3.0 Å, angle ≥ 135°.

Interpretation

  • The SASA jump at ~25 ns coincides with a decrease in total hydrogen bonds, suggesting partial unbinding of PFOA from initial binding sites.
  • Subsequent fluctuations in SASA and H-bonds indicate dynamic sampling of multiple binding poses, rather than complete dissociation.
  • Top hydrogen bonds, such as UNL_589@O2 – THR_570@OG1 and UNL_595@O2 – ARG_55@NH2, persist longer than others, indicating possible anchoring interactions.
  • Lower-occupancy hydrogen bonds occur at flexible or solvent-exposed regions, consistent with heterogeneous binding influenced by local protein dynamics.

Conclusion

  • PFOA forms intermittent, non-persistent contacts with TYMS, with occupancies ≤0.4
  • Extended periods of elevated SASA (~31,000 Ų) indicate that PFOA spends significant time in solvent-exposed states.
  • Overall, the results support a weak and transient association between PFOA and TYMS, rather than a tightly bound, inhibitory complex under the simulation conditions sampled.

Experiment: MMPBSA of TYMS with cofactor and PFOA

Hypothesis:

The presence of PFOA would serve as an inhibitor to the enzyme TYMS when the cofactor, mTHF, was present alongside it. We thought this because 5 Fluorouracil is an inhibitor and PFOA itself is fluorinated.

Changes to common methods:

This simulation was run following the standard protocols that were described previously, however an additional force field had to be created in order to account for mTHF. TYMS and mTHF were made into a separate complex to be used as a protein.

Results and Interpretation:

The MMPBSA ran with TYMS in the presence of the mTHF and PFOA revealed that PFOA was indeed acting as an inhibitor and that the presence of mTHF was helping to mitigate its effects. The results of the simulation and MMPBSA supported our initial hypothesis.

  • It should be noted that an additional MMPBSA was run just on TYMS and PFOA however despite countless troubleshooting it was unable to run properly and so we had to rely on the simulation and the trajectory file.

Image Placeholder

Figure 12: Without the presence of the cofactor in the complex after the equilibration process the PFOA leaves the binding site completely and becomes separate from the TYMS enzyme, however with the presence of mTHF the mobility of PFOA is significantly decreased, as well much more consistent.

Interpretation: In the absence of the cofactor, mTHF, PFOA fails to remain in the active site of the TYMS enzyme, meaning the interaction between TYMS and PFOA alone are too weak to maintain a proper binding. Alternatively while mTHF was present PFOA was able to hold a more stable association to TYMS and remain closer to its active site.

Conclusion:

Results support the hypothesis that PFOA acts as an inhibitor for TYMS, and by itself PFOA and TYMS are unable to form a strong enough reaction to remain strongly associated or to keep PFOA bound to the active site. The presence of mTHF allows PFOA to be more stable with the enzyme, suggesting that mTHF aids to stabilize an interaction between PFOA and TYMS.

Reverse Screening Toolkit

Abstract

We built a reproducible reverse-screening workflow that starts with a deprotonated PFOA ligand and ranks human protein targets by integrating results from seven public resources. The workflow standardizes the ligand input, reconciles all targets to UniProt, merges per-source scores in a single sheet, and applies a transparent shortlisting rubric that considers (a) score, (b) cross-database support, (c) disease relevance, (d) novelty in PFAS literature, and (e) assayability/cost. Re-running the screens with the corrected ligand representation changed retrievals and priorities, increased agreement across sources, and surfaced thymidylate synthase (TYMS) as a strong candidate for follow-up. We maintain cleaned CSVs/Google Sheets for the merged hits, the v1 vs. v2 comparison, the rubric shortlisting, and sourcing.

Motivation

Last season's screens mixed ligand forms and tool conventions, which made single-database hits noisy and cross-tool comparisons unreliable. This cycle we needed one consistent ligand representation, one naming system (UniProt), and one auditable rubric so rankings were traceable. The key correction was using deprotonated PFOA rather than a neutral form, which materially shifted which proteins looked plausible and improved cross-database agreement.

What the workflow does (at a glance)

The workflow accepts a canonical SMILES for deprotonated PFOA, queries SuperPred, STITCH, PharmMapper, TargetNet, SEA-style, SwissTargetPrediction, and UniProt mappings, and produces a merged, human-only table keyed to UniProt with per-source scores, evidence counts, disease-theme notes, novelty flags, and assayability comments. It also outputs a rubric-based shortlist and a simple v1 (old) vs. v2 (corrected) comparison with overlaps and rank/score shifts.

Key features

The workflow keeps a single ligand string across all tools, logs raw scores and link-outs in consistent columns, normalizes every target to UniProt, and explicitly counts cross-database support. It applies a rubric with defined thresholds and tie-breaks, records why a target advanced (comments next to the row), and keeps a side-by-side v1 vs. v2 sheet so changes are easy to explain. It concludes with a sourcing sheet (vendor, catalog, quantity, price, notes) for the top candidates.

System design

The process begins by fixing a canonical deprotonated PFOA SMILES, submitting that input to seven sources, and collecting raw hits and scores into identical tracking sheets. A merge step collapses duplicates by UniProt and removes non-human entries. A ranking step uses per-source scores plus cross-database support and then applies the rubric (score, support, disease relevance, novelty, assayability/cost). A comparison step joins the earlier run to the corrected run by UniProt and summarizes overlaps and deltas. A final step prepares the shortlist and sourcing.

Methods

For ligand standardization, we used one deprotonated PFOA SMILES everywhere and documented it in each sheet; we also reserved an "alt-ligand" field for sensitivity checks without mixing inputs.
For multi-database screening, we ran the ligand through seven sources, captured the native score or confidence from each, recorded link-outs, and kept raw columns unchanged so anyone can audit the entries.
For identifier hygiene, we mapped all hits to UniProt IDs, resolved gene symbols and protein names from those IDs, dropped non-human entries, and merged duplicates to a single target row that keeps the per-source evidence side-by-side.
For shortlisting, we applied the rubric you defined (score, cross-database support, disease relevance, novelty, assayability/cost) with clear thresholds and tie-breaks (prefer novelty, then broader agreement, then cost/availability).
For comparison, we built a v1 vs. v2 table (old neutral-form run vs. corrected deprotonated-form run), computed overlap, Δrank, Δscore, grouped changes by high-level disease themes (oncology, inflammation, lipid/bile), and tagged the driver as an input correction where applicable.
For hand-off, we prepared a sourcing sheet for the top 18 targets (vendor, catalog, quantity, price, notes) and coordinated with the protein team; this enabled prioritization and led to selecting TYMS for the first round.

Results

Re-screening with deprotonated PFOA increased consensus across sources and reduced single-tool outliers. The shortlist shifted toward oncology and inflammation themes, and TYMS rose to the top based on consistent support, clear disease relevance, and practical assay paths. We added TYMS to the sourcing list, shared costs/availability with the wet-lab team, and lined it up as the first target for follow-up.


× Enlarged image