We had ONE challenge:

Find the best enzyme mutant!


Introduction

How does docking work?

Molecular docking is a modeling technique used to predict how a protein (enzyme) interacts with small molecules (ligands). The method aims to identify correct poses of ligands in the binding pocket of a protein and to predict the affinity between the ligand and the protein.


Figure 1 — Docking Principle

Docking, however, comes with important limitations.

One of the main challenges is that proteins are not rigid: they fluctuate and change conformation over time. To properly capture these dynamics, molecular dynamics (MD) simulations are often used in combination with docking. MD allows for a more realistic description of flexibility and energetics, but it is extremely demanding in terms of computational time and expertise.

Following the advice of Dr. Aguero, we chose not to perform molecular dynamics simulations, as they would have exceeded the scope and resources of our project. As a result, our docking results should be interpreted with caution: they are qualitative rather than quantitative, and they cannot be considered rigorous evidence of binding. Instead, they serve as exploratory insights to guide future experimental validation.

Why did we use docking?

After discussions with several chemistry experts, we concluded that TFA was probably too stable to undergo classic defluorination as each C–F bond strengthens the adjacent ones. Instead of working directly on TFA, we wanted to first transform it into N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide by reacting it with 4-chlorobenzylamine in the presence of the lipase SpL. The chlorine atom and the aromatic ring may enable electron delocalization, and desymmetrize the C–F bonds, which could help lower the energy barrier for an F⁻ to leave.

However, the structure of N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide differs significantly from that of TFA and is likely too bulky to fit into the catalytic site of the dehalogenase. At this stage, the goal became to modify the access tunnel leading to the catalytic site, allowing N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide to enter and be processed by the enzyme.

Figure 2 — Positioning of Cycled TFA (N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide) within Undocked RPA1163

Concretely, we mutated amino acids of the tunnel leading to the catalytic site of RPA1163 to tailor it to the chemical and spatial properties of cycled TFA, which greatly differs from TFA. We chose RPA1163 for our mutation because it is the enzyme among our selection with the best structure resolution.

But we did not restrict our adaptation to cycled TFA. We improved theoretical ligand-catalytic site affinity between RPA1163 and:

  • FA
  • DFA
  • TFA
  • Cycled TFA
  • PFOA
  • 1,1,1-trifluoro-2-butanone

Material & Methods

HUMAN VS ARTIFICIAL INTELLIGENCE

We started by mutating residues manually and calculating the binding energy with YASARA. Later, we collaborated with Dr. Xavier Robert to automate the generation of in silico mutants using AI.

TOOL 1: YASARA

YASARA is a molecular docking tool that allowed us to quantify the affinity and the binding energy between the substrate and the enzyme. Our objective was to ensure that the ligand could reach the enzyme’s active site and bind to it strongly. Ultimately, the goal was to improve the enzyme's affinity for different substrates (FA, DFA, TFA, N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide, PFOA and 1,1,1-trifluoro-2-butanone). The 3D structures of all substrates were downloaded from PubChem in SDF format and converted to PDB format using Open Babel. These files were then saved as YASARA objects in the .yob format.

For the enzyme structure, we worked with the 3R3Z crystal structure of wild-type (WT) RPA1163, solved at 1.70 Å. Thanks to the extensive bibliographic work carried out on RPA1163, we were able to identify the amino acids most relevant to mutate in order to facilitate ligand access to the active site.


Figure 3 — RPA1163 (3R3U) Homodimer 3D Structure.

RPA1163 dehalogenase is a homodimer, meaning chains A and B are identical (see Figure 3). Since the catalytic pocket is not located at the interface of the two monomers, we chose to work with a single monomer. We confirmed that this simplification does not affect the accessibility of the active site by analyzing the ligand access pathway, both from the literature (Khusnutdinova et al., 2024) and using YASARA. This approach was validated by an expert in protein design, Mr. Bettler. Consequently, chain B was excluded from the computational analyses. In addition, all water molecules were discarded, and the protein structure was prepared by adding hydrogen atoms (Edit > Clean > All / Edit > Add > Hydrogens to > All).
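The monomer preparation described above (keep one chain, drop waters) can also be pictured outside YASARA. The following is a minimal Python sketch that filters a PDB file by its fixed columns; the function names and the three demo records are illustrative assumptions, not the team's actual workflow, and hydrogen addition is not covered.

```python
def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Build one fixed-column PDB coordinate record (demo helper only)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}"
            f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def keep_single_chain(pdb_text, chain_id="A"):
    """Keep ATOM/HETATM records of one chain and discard water (HOH)."""
    kept = []
    for line in pdb_text.splitlines():
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip headers, remarks, TER records, etc.
        res_name = line[17:20].strip()  # PDB columns 18-20
        chain = line[21:22]             # PDB column 22
        if chain == chain_id and res_name != "HOH":
            kept.append(line)
    return "\n".join(kept)


# Hypothetical three-record input: chain A, chain B, and one water.
demo = "\n".join([
    pdb_line("ATOM", 1, "CA", "ASP", "A", 110, 1.0, 2.0, 3.0),
    pdb_line("ATOM", 2, "CA", "ASP", "B", 110, 4.0, 5.0, 6.0),
    pdb_line("HETATM", 3, "O", "HOH", "A", 401, 7.0, 8.0, 9.0),
])
monomer = keep_single_chain(demo)
print(monomer)
```

Only the chain A protein record survives the filter; the chain B copy and the water molecule are dropped.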

The 4-Step Procedure

  • Step 1: Amino acid modifications (after analysis of the catalytic site).
  • Step 2: Generation of the mutant structure with AlphaFold 3.
  • Step 3: Docking (study of the theoretical binding energy between the mutated enzyme and ligand).
  • Step 4: Ranking of the mutants.

Here's how we proceeded for steps 1 and 2: Mutant generation

We moved the ligand with the mouse through the tunnel leading to the active site, mutating residues along the way.

1. Residue mutation: We used the tool “swap” → “residue to a residue”.

This could potentially create new bonds (e.g., hydrogen bonds) with different ligands: FA, DFA, TFA, N-[(4-chlorophenyl)methyl]-2,2,2-trifluoroacetamide, PFOA, and 1,1,1-trifluoro-2-butanone.

2. Interaction visualization: View → Show interaction → Other interaction between → Molecules (selection: enzyme and ligand) → Select: hydrophobic, Pi-Pi, cation-pi, ionic.

3. Mutated enzyme generation: Structures were generated with AlphaFold 3.

4. We performed RMSD calculations between mutated structures and the original structure to verify that the overall fold was preserved (RMSD threshold: 2 Å).
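The RMSD check in item 4 can be sketched in a few lines of Python. This is a simplified illustration: it assumes the two coordinate lists hold the same atoms (for example Cα atoms) in the same order and that the structures have already been superposed, which the modeling software normally does first. The function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def fold_preserved(wild_type, mutant, threshold=2.0):
    """Apply the 2 Å threshold quoted in the text."""
    return rmsd(wild_type, mutant) <= threshold


# Hypothetical coordinates: the mutant is shifted by 1 Å along z.
wt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mut = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(wt, mut), fold_preserved(wt, mut))
```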

In total, we generated ~250 mutated structures meeting our criteria, which were then evaluated through docking analyses.

Here's how we proceeded for step 3: Docking

1. Define a simulation cell around the active site atoms.

Tool: Simulation → Define Simulation Cell → "around selected atoms", with Cell = Auto, Extension = 5.0 and Shape = Cuboid.

2. Save scene: Save the mutated enzyme and the simulation cell as a YASARA scene (e.g., 3r3z_receptor.sce). This results in a YASARA scene file (the modified enzyme with the simulation cell) and a YASARA .yob file (the ligand).

3. Generate simulation

A docking simulation predicts how a small molecule (ligand) binds to a target protein. Using YASARA, the software places the ligand in the protein’s binding site, explores possible orientations, and estimates the binding energy. This identifies the most likely interaction pose and assesses the strength and stability of the protein–ligand complex.

Simulations were performed using the dock_run macro, with the .sce file (enzyme + simulation cell) and the .yob file (ligand) as inputs. We set the number of runs to 100 and kept the residues rigid. After running the macro, we obtained a report with the binding energy [kcal/mol] and a list of contacting receptor residues.
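To picture the cuboid simulation cell defined in the first docking step, here is a minimal Python sketch. It assumes that an extension of 5.0 means 5 Å of padding added on every side of the selected atoms; YASARA's exact definition may differ, and the function name and coordinates are illustrative.

```python
def cuboid_cell(atoms, extension=5.0):
    """Return (lower_corner, upper_corner) of a box around the atoms.

    `atoms` is a list of (x, y, z) tuples in Å; the box is the tight
    bounding box grown by `extension` Å in every direction (assumption).
    """
    xs, ys, zs = zip(*atoms)
    lower = (min(xs) - extension, min(ys) - extension, min(zs) - extension)
    upper = (max(xs) + extension, max(ys) + extension, max(zs) + extension)
    return lower, upper


# Hypothetical active-site atoms.
lower, upper = cuboid_cell([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
print(lower, upper)
```

The docking search is then restricted to ligand poses inside this box, which is why the cell is centered on the active site rather than on the whole protein.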

Here's how we proceeded for step 4: Mutant selection

We had 3 criteria to assess the quality of a mutant:

  • Key residues: The residues reported in the literature as essential for catalysis, and for travel through the tunnel leading to the catalytic site, must be conserved among the contacting receptor residues.
  • Distance: The distance between the carbon bearing the three fluorine atoms and the carboxylate oxygen of the catalytic aspartic acid must be less than 3.5 Å (a distance compatible with SN2 attack).
  • Binding energy: The binding energy between the mutant and the ligand, corresponding to the amount of (free) energy released when the ligand binds to the enzyme’s active site. The more negative (lower) the binding energy, the tighter the binding.

Based on those criteria, we ranked them and selected the best ones.
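The three criteria can be combined into a simple filter-then-sort routine. The sketch below is an illustration only: the mutant names, energies and contact sets are made up, the class and function names are hypothetical, and the ordering follows the convention stated above (more negative binding energy ranks first).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    mutant: str
    binding_energy: float  # kcal/mol, sign convention as in the text above
    distance: float        # Å, CF3 carbon to aspartate oxygen
    contacts: frozenset    # labels of contacting receptor residues


def rank_mutants(results, key_residues, max_distance=3.5):
    """Keep mutants meeting the residue and distance criteria, best energy first."""
    passing = [
        r for r in results
        if r.distance < max_distance and key_residues <= r.contacts
    ]
    return sorted(passing, key=lambda r: r.binding_energy)


# Hypothetical docking outputs.
results = [
    DockingResult("M1", -5.0, 3.2, frozenset({"ASP110", "X1"})),
    DockingResult("M2", -7.5, 3.4, frozenset({"ASP110"})),
    DockingResult("M3", -9.0, 4.1, frozenset({"ASP110"})),  # too far
    DockingResult("M4", -8.0, 3.0, frozenset({"X1"})),      # key contact lost
]
ranked = rank_mutants(results, key_residues=frozenset({"ASP110"}))
print([r.mutant for r in ranked])
```

Filtering before sorting means a mutant with an excellent energy but a non-reactive pose (such as M3 here) never reaches the ranking.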

TOOL 2: Seq-Mutator

Seq-Mutator is a tool developed by the iGEM Team of the University of Münster (2024). This software enables data-efficient protein engineering by predicting highly functional protein variants from only a small number of experimental measurements. We planned to use it as an iteration tool after testing our mutants in the lab. However, the tool also includes a zero-shot prediction module, which suggests potentially beneficial mutations without requiring prior experimental data.

TOOL 3: LigandMPNN

LigandMPNN is a tool hosted on GitLab that the iGEM Team of the University of Münster recommended because it had given them promising results. However, despite our attempts, we were again unable to generate mutants with improved affinity: the model suggested either mutants with a single amino acid substitution, which was not enough to alter affinity, or mutants with more than 20 substitutions, which was too many.

HOW WE TURNED A 50-YEAR CALCULATION INTO A 5-DAY PROCESS

Although manual mutagenesis can provide valuable insights, it remains fundamentally limited as it only explores a tiny fraction of the mutations that might matter.

Through bibliographic research, we identified the amino acids that are most critical for ligand access. Eight crucial residues were chosen as candidates for mutation. While each could be mutated to any of the 21 existing amino acids, two were eliminated from the outset: glycine and proline. The former introduces too much flexibility, while the latter introduces too much rigidity, both of which would risk destabilizing the overall structure of the enzyme. This left 19 amino acids, representing a combination of...

...19^6 = 47,045,881 possible mutants, representing over 50 years of computation!
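The size of such a combinatorial library is simply the product of the number of options at each mutated position. A minimal Python sketch, taking the base 19 and the exponent 6 directly from the formula above (the function name is illustrative):

```python
import math


def search_space_size(options_per_position):
    """Number of distinct mutants when positions vary independently."""
    return math.prod(options_per_position)


# 19 candidate amino acids at each of 6 positions, as in the formula above.
full_library = search_space_size([19] * 6)
print(f"{full_library:,}")
```

Because the count is a product, removing even a few candidate amino acids per position shrinks the library by orders of magnitude, which is the lever used in the next step.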

To reduce this crazy amount of time, we limited residue substitutions to only a selection of amino acids most likely to enhance ligand delivery to the active site:


Figure 4 — Amino Acid Substitution Selection for the 8 Residues of the catalytic site

However, this still represented 381,024 mutants to test — equivalent to two years of continuous 24/7 calculations on our local machine.

Obviously, brute force was not an option. We had to refine our strategy.

That’s when artificial intelligence changed everything.

The AI Breakthrough

We contacted Dr. Xavier Robert, who introduced us to AI-based bioinformatics simulations and opened the doors of the IN2P3 supercomputer, which notably processes data from CERN, home of Europe’s largest particle accelerator!

Although the center focuses primarily on nuclear physics and cosmology, it has recently opened up to biology experiments. Due to the novelty of our project, we were granted access to this facility to handle our extensive combinatorial search.


Dr. Xavier Robert

A Research Engineer at IBCP, he focuses his research on retrovirology and chemoinformatics for biotechnological and therapeutic applications. He is an expert in structural biology (X-ray crystallography) and in bioinformatics methods such as computer-aided simulations, molecular docking and modelling. He is also involved in software engineering (ESPript, ENDscript and FoldScript).


Following several meetings to clarify our requirements, Dr. X. Robert developed a cutting-edge molecular docking software pipeline based on AI.

The principle of the computational protocol was rather simple:

  • 1. Automate the generation of all 381,024 in silico mutants,
  • 2. Perform a molecular docking of the ligand onto all these protein models,
  • 3. Analyze these complexes one by one using AI.
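The first step of this protocol, enumerating every combination of allowed substitutions, can be sketched with a Cartesian product. The positions and amino acid choices below are placeholders, not the team's actual substitution table, and the function name is hypothetical.

```python
from itertools import product


def enumerate_mutants(allowed):
    """Yield one {position: amino_acid} dict per combination.

    `allowed` maps a residue position to the list of one-letter amino
    acid codes permitted at that position.
    """
    positions = sorted(allowed)
    for combo in product(*(allowed[p] for p in positions)):
        yield dict(zip(positions, combo))


# Placeholder table: two positions with 2 and 3 options give 6 mutants.
allowed = {10: ["A", "V"], 20: ["S", "T", "N"]}
mutants = list(enumerate_mutants(allowed))
print(len(mutants), mutants[0])
```

Using a generator keeps memory use flat: each mutant can be built, docked and scored before the next one is produced, which suits a parallel job queue.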

Following analysis, the protein-ligand complexes were scored using multifactorial criteria to evaluate binding affinity and structural compatibility. This approach enabled us to identify and sort the most promising candidates with unprecedented precision.

What would have taken half a century was compressed into just 5 days.

The 4-Step Protocol: State-of-the-art AI-based molecular docking pipeline, by Dr. X. Robert

  • 1. Digital design of 3D virtual mutants using SCWRL4 to generate flexible mutants from the initial experimental structure (PDB: 3R3U). This software allows point mutations to be introduced while maintaining realistic spatial constraints.
  • 2. AI-based docking of the ligand against virtual mutants with GWOvina. The ligand and key target residues of the mutants are considered flexible, and the program calculates the nine most probable poses. This is followed by a re-ranking step with GNINA, which uses a convolutional neural network (CNN) model to identify the most likely complex.
  • 3. Global ranking of all mutants based on a multifactorial AI-derived score.
  • 4. Geometric filtering by calculating the distance between the oxygen atom of Asp110 (the catalytic nucleophile responsible for SN2 attack) and the carbon atom bearing the three fluorines. Mutants were retained only if this distance was ≤ 3.5 Å.
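The geometric filter of step 4 comes down to a single Euclidean distance test. In the sketch below, taking the minimum over both carboxylate oxygens of the aspartate is an assumption made for illustration; the text only states that an oxygen atom of Asp110 is used. Coordinates and function names are hypothetical.

```python
import math


def nucleophile_distance(asp_oxygens, cf3_carbon):
    """Shortest distance (Å) from any listed Asp oxygen to the CF3 carbon."""
    return min(math.dist(oxygen, cf3_carbon) for oxygen in asp_oxygens)


def passes_geometric_filter(asp_oxygens, cf3_carbon, cutoff=3.5):
    """Retain a pose only if the distance is at most the 3.5 Å cutoff."""
    return nucleophile_distance(asp_oxygens, cf3_carbon) <= cutoff


# Hypothetical coordinates for the two carboxylate oxygens and the carbon.
oxygens = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
near = passes_geometric_filter(oxygens, (3.0, 0.0, 0.0))
far = passes_geometric_filter(oxygens, (0.0, 4.0, 5.0))
print(near, far)
```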

The protocol was initially tested on a prototype set of around 100 mutants to validate its functionality. It was then parallelized and automated to enable large-scale mutagenesis and docking simulations. The entire simulation took five days thanks to the massive parallelization capabilities of the IN2P3 computing center, instead of the several years it would have taken on our laboratory computers.
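Taking the figures quoted in this section at face value (381,024 mutants, roughly two years locally, five days on the cluster), the implied throughput and speedup can be worked out as follows. This is back-of-the-envelope arithmetic, not a benchmark; a year is approximated as 365 days.

```python
MUTANTS = 381_024
LOCAL_DAYS = 2 * 365   # "two years of continuous 24/7 calculations"
CLUSTER_DAYS = 5       # wall-clock time reported for the computing center

MINUTES_PER_DAY = 24 * 60

# Average time per mutant on the local machine.
minutes_per_mutant_local = LOCAL_DAYS * MINUTES_PER_DAY / MUTANTS

# Average number of mutants completed per minute on the cluster.
mutants_per_minute_cluster = MUTANTS / (CLUSTER_DAYS * MINUTES_PER_DAY)

# Overall wall-clock speedup from parallelization.
speedup = LOCAL_DAYS / CLUSTER_DAYS

print(f"{minutes_per_mutant_local:.2f} min per mutant locally")
print(f"{mutants_per_minute_cluster:.2f} mutants per minute on the cluster")
print(f"about {speedup:.0f}x faster")
```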

The final outcome of this process was a ranked list of the virtual mutants, each associated with a score reflecting its predicted ability to interact with the ligand. We selected the top nine mutants of the ranking (see Winners Carousel).

⚠️ Limitations of AI-Based Docking: All our docking simulations remain in silico predictions. While this approach is extremely useful to rapidly screen a large number of mutants and guide our design, it does not guarantee actual enzymatic activity or stability — only in vitro experiments can confirm these results. Moreover, our workflow could theoretically be refined with molecular dynamics simulations to better account for protein flexibility and improve prediction accuracy. Even with high-performance computing resources, such simulations would require several years, making them impractical for our project.

And the winners are...
