Software | ShanghaiTech-China

Abstract

Our project has developed a set of protein design software tools. The main achievements include: BetterMPNN can achieve efficient backbone-based protein design and complete the entire process from backbone generation to high-performance binding proteins within a day; BetterEvoDiff optimizes the functions of existing proteins through multi-site collaborative mutations; PPI-APP provides a reliable affinity prediction solution. These tools form a comprehensive computational solution covering from sequence generation to functional verification.

In addition, to overcome the bottlenecks of traditional virtual screening methods, we designed an integrated screening process specifically for the discovery of small-molecule inhibitors targeting flexible targets and successfully applied this integrated workflow to screen the Specs compound library.

Introduction

The core of our project involved developing two generations of reinforcement learning-based protein design tools:

BetterMPNN: A novel high hit-rate & one-shot design framework. BetterMPNN organically integrates the established backbone generation tool RFdiffusion, the sequence design tool ProteinMPNN, and the Group Relative Policy Optimization (GRPO) algorithm. Its core innovation lies in constructing an independent "exploration–evaluation–optimization" loop for each generated backbone. Within this loop, ProteinMPNN acts as the agent, generating sequences for a given backbone; the environment constructs a reward function based on AlphaFold prediction metrics (such as inter-chain PAE, ipTM, pTM); and the GRPO algorithm utilizes the reward signals to continuously optimize ProteinMPNN's parameters. This process enables the model not only to generate foldable sequences but also to learn to produce sequences with superior binding interfaces. This framework can complete the design process from backbone to high-performance binding proteins (e.g., inhibitors targeting the GZMK active pocket) within hours to 20 hours, demonstrating one-shot design potential in dry-lab experiments. Furthermore, the framework possesses the capability to rapidly assess the usability of RFdiffusion-generated backbones early in the training phase, allowing for the timely discarding of poor backbones and significantly enhancing the resource efficiency of the overall design pipeline.
BetterEvoDiff: An optimization tool for existing proteins. BetterEvoDiff is based on the discrete autoregressive model EvoDiff-OADM, combined with the GRPO algorithm, to achieve multi-site cooperative mutations. It enables systematic exploration and optimization of protein properties within the sequence space, making it suitable for enhancing the affinity and functional evolution of binding proteins. Although challenges remain in aligning dry and wet lab rewards during experimental validation, it has demonstrated good convergence properties and the ability to generate high-quality sequences in a single shot within the dry-lab environment.

In addition to the core design tools mentioned above, we have also constructed two key supporting computational workflows to address specific challenges upstream and downstream of protein design:

Physics-guided PPI-Interaction-Affinity-Prediction Pipeline (PPI-APP): A critical bottleneck in de novo protein design is the lack of computational methods capable of high-throughput, rapid, and reliable assessment of designed protein affinity. To address this, we developed PPI-APP. This pipeline combines AlphaFold for initial structure prediction, utilizes PyRosetta for targeted physical refinement based on prediction confidence, and finally employs PBEE to extract physical and interface features from the refined structures for machine learning-based prediction. This provides more reliable affinity estimates within an acceptable computational cost, serving the rapid screening of design results.
Virtual Screening Workflow: To extend our computational capabilities into the realm of small-molecule drug discovery and address the challenge of screening against flexible targets (like GZMK), we developed this integrated workflow. It employs a staged, progressively refined strategy: first, using the conformation-aware model GeminiMol for large-scale conformational similarity searches to rapidly enrich candidate molecules; subsequently performing Induced Fit Docking (IFD) on the enriched molecules to account for target flexibility; and then using MM/GBSA binding free energy calculations for more precise ranking. This workflow has been successfully applied to screen the SPECS compound library, demonstrating its efficiency and accuracy in discovering small-molecule inhibitors for flexible targets.

In summary, we has not only successfully developed the de novo protein design tool BetterMPNN with one-shot design potential and the protein optimization tool BetterEvoDiff, but has also constructed two key supporting workflows—PPI-APP for reliable affinity prediction and the Virtual Screening workflow for small-molecule flexible target screening—to address critical bottlenecks in the design pipeline.

Key Technological Design

BetterMPNN

Abstract

BetterMPNN is a reinforcement learning-based protein sequence design tool aimed at achieving efficient, high hit-rate de novo protein design. It integrates the classical inverse folding model ProteinMPNN with the Group Relative Policy Optimization (GRPO) algorithm, enabling the model to progressively learn to generate protein sequences with both structural rationality and high binding affinity through iterative "exploration–evaluation–optimization" loop.

Introduction

Current de novo protein design techniques can be broadly categorized into two paradigms. The first is the hallucination-based approach, which directly treats amino acid sequences as optimization targets, iteratively refining them through gradient-based updates guided by differentiable proxy metrics, thereby gradually approaching the desired structure. The second is the backbone-first generative design approach, in which candidate backbones are generated by diffusion models such as RFdiffusion, followed by sequence assignment using models such as ProteinMPNN, with final candidates obtained through high-throughput dry- and wet-lab screening. While both approaches have demonstrated success across a wide range of design tasks, the former often struggles to capture the intrinsic discontinuities of the protein sequence–structure–function landscape, whereas the latter tends to rely on extensive experimental screening due to limited targeting ability.

Against this backdrop, we propose a novel design strategy that differs fundamentally from existing approaches. The key idea is to enable generative protein models to progressively explore and learn to produce proteins with composite functional properties. To this end, we develop an “exploration–evaluation–optimization” loop based on the GRPO reinforcement learning algorithm and ProteinMPNN, allowing the model to attempt, compare, and select among different binding modes, and to converge toward one or a few near-optimal solutions of comparable value.

This strategy demonstrates several potential advantages in dry-lab experiments. First, protein generative models can be viewed as high-dimensional mappings over the space of possible proteins; training them with reinforcement learning toward convergence effectively compresses the search space into a limited solution set. In doing so, the method can, to some extent, overcome the limitations of continuous optimization in hallucination-based approaches and explore a broader range of possibilities. Second, it does not require prior specification of hot-spots, enabling the model to autonomously identify favorable binding sites and thereby reducing dependence on manually defined constraints. Third, it allows for early assessment of backbone viability, which helps conserve computational and experimental resources. Finally, the converged model typically favors producing a small number of candidate solutions, substantially reducing the scale of wet-lab screening and enabling efficient validation with minimal experimental throughput.

Using this framework, we successfully designed inhibitors directly targeting active pockets, as well as binders adopting multiple binding modes, both of which were validated through dry-lab experiments. These results indicate that our method not only offers a conceptual departure from existing strategies but also demonstrates practical potential to improve design efficiency and specificity, with the possibility of approaching near one-shot protein design.

Design

Our de novo design framework is built upon an “exploration–evaluation–optimization” loop and consists of three stages with four core components:

Backbone design stage
- RFdiffusion is employed to generate backbone structures, which serve as starting points for sequence design.
Exploration–evaluation–optimization stage
- ProteinMPNN acts as the agent to perform sequence design.
- The environment provides rewards based on AlphaFold predictions and interface scoring.
- The GRPO reinforcement learning algorithm is used to update the agent’s parameters, gradually biasing the model toward generating high-quality results.
Data collection and small-batch generation stage

Through this pipeline, we achieve protein designs approaching a one-shot level of performance.

Initial Backbone Generation

Our design cycle begins with the backbone rather than directly performing de novo sequence generation. Although “discovering proteins with the desired affinity directly from the sequence space” remains a long-term goal, under current technical conditions, searching within the vast and discrete sequence–structure landscape is often inefficient and unstable. Prior studies and practical experience indicate that first constructing a reasonable backbone can significantly compress the target state space, reduce the search radius, and provide a clear geometric foundation for subsequent constraints and evaluations. Moreover, sequence design models built on backbones are generally smaller in scale and exhibit more stable training behavior, making them easier to fine-tune and iterate rapidly.

For backbone generation, we employ RFdiffusion, which has been extensively validated in the community, demonstrates strong backbone design capabilities, and is already a component of our classical pipeline—facilitating seamless integration with downstream sequence design using ProteinMPNN. In practice, before initiating each design task, we use RFdiffusion to generate a batch of candidate backbones specific to the target, forming a backbone pool of several dozen to several hundred structures, depending on task difficulty and resource constraints. Backbones from this pool are then selected to enter the “exploration–evaluation–optimization” loop, ensuring broad search coverage while keeping computational and experimental costs manageable.

Optimization Framework

We adopt a reinforcement learning framework that enables the model to explore the state space through repeated “generation–evaluation–optimization” cycles, thereby acquiring the ability to generate proteins that meet specified objectives and ultimately converging toward optimal solutions. In the following, we describe our optimization framework in terms of the standard components of reinforcement learning.

Agent

Our agent performs sequence generation under a fixed-backbone constraint: given a backbone populated with Gly placeholders (with coordinates and main-chain geometry specified), it outputs amino acid sequences that fold correctly and satisfy the target properties. We choose ProteinMPNN as the sequence-generation backbone. On the one hand, ProteinMPNN natively supports inverse folding under fixed-backbone conditions and effectively leverages main-chain geometry and local environment information to produce foldable sequences; on the other hand, its relatively small parameterization, stable training behavior, and fast inference make it well-suited for high-frequency iteration and rapid fine-tuning within the loop.

Importantly, the default training and inference objectives of ProteinMPNN focus on foldability/backbone compatibility, and do not directly optimize for specific functions or composite properties such as binding affinity. Accordingly, we place ProteinMPNN within a reinforcement learning framework: in each “generation–evaluation–optimization” cycle, the agent first generates a batch of candidate sequences for the given backbone; the environment (based on AlphaFold-derived structural and interface metrics) scores these candidates and produces rewards; finally, GRPO is used to update the parameters of ProteinMPNN, progressively steering its generative distribution toward regions associated with both high affinity and structural reliability. Over multiple iterations, the agent retains compatibility with backbone geometry while acquiring target-oriented generative capabilities, enabling comparative assessment among different binding modes and convergence toward a small set of high-quality solutions.

Environment

Our environment is responsible for evaluating the structural and interfacial properties of sequences generated by the agent, with the evaluation results aggregated into a reward signal fed back into the optimization loop. Since our focus is on binder design, the ideal evaluation should measure “affinity/interface quality.” However, no broadly applicable and reliable computational affinity estimator currently exists. By contrast, AlphaFold (AF) outputs are widely regarded as reliable, and many hallucination-based studies have adopted AF-derived metrics as reward signals. Based on this, we select AlphaFold3 interface and monomer reliability metrics to construct the reward function, as follows:

Inter-chain PAE: PAE estimates the alignment error between residue pairs in predicted structures, with inter-chain PAE specifically measuring pairs across chains. We apply a monotonic transformation to map it into the 0, 1 range, denoted as PAE-Reward. The transformation is defined as:

$$ \text{PAE-Reward} = 1 - \frac{\text{mean\_pae}}{\text{PAE\_MAX}} $$

where PAE-MAX represents the theoretical upper bound of PAE values, and mean_pae is calculated as the arithmetic mean of bidirectional inter-chain PAE values:

$$ \text{mean\_pae} = \frac{\text{PAE}_{A\to B} + \text{PAE}_{B\to A}}{2} $$

This formulation ensures that lower PAE values (indicating better structural alignment) yield higher rewards, with perfect alignment (mean_pae = 0) achieving the maximum PAE-Reward of 1.0.

ipTM: Measures the overall alignment quality of the interface between chains; inherently in the range 0, 1.
pTM: Measures the overall folding reliability of the designed chain; inherently in the range 0, 1.

The combined reward is defined as a linear weighted sum:

$$ \text{Reward} \;=\; a \cdot \text{PAE-Reward} \;+\; b \cdot \text{ipTM} \;+\; c \cdot \text{pTM} \; $$

From an implementation perspective, the AlphaFold inference pipeline consists of two stages: MSA construction and structure generation. The former typically requires tens of minutes to several hours, while the latter takes only ~90 s on an RTX 5090 GPU. Running the full AlphaFold pipeline at each optimization step would render the loop computationally impractical. In de novo small binder tasks (on the order of a few dozen residues), we observed that constructing informative binder MSAs is nearly impossible, while the target MSA (e.g., GZMK) remains consistent across the entire task. Based on this observation, we construct a fixed AlphaFold JSON input containing only the target’s MSA and necessary metadata, bypassing the binder MSA step and proceeding directly to structure generation. Comparisons with the full MSA-based pipeline show nearly identical results across ipTM, PAE, and pTM metrics, indicating that prediction accuracy is not compromised. This strategy eliminates the most time-consuming stage, reducing per-candidate evaluation time to ~90 s, and makes the “generation–evaluation–optimization” loop computationally feasible.

It is worth mentioning that during the design of the reward function, we incorporated an additional clash penalty component. This penalty imposes a -1 reward deduction when atomic clashes are detected in the predicted structure, while maintaining a neutral 0 value for clash-free structures. However, throughout our actual training process, AlphaFold consistently returned a clash-free status (has_clash = 0.0) for all predictions, rendering this penalty term inactive in practice. Consequently, we have omitted this component from our formal reward function description. For those interested in the complete reward function architecture and customizable parameters, our GitLab repository provides detailed code implementations and guidance for modifying or redesigning the reward module.

GRPO

In each optimization cycle, the Agent generates a batch of sequences, and the Environment provides corresponding Reward signals. The model is then updated based on these rewards using reinforcement learning methods.

ProteinMPNN employs an autoregressive generation approach, which allows it to leverage the rich reinforcement learning frameworks originally developed for autoregressive language models.After systematic comparison, we selected the Group Relative Policy Optimization (GRPO) algorithm. This choice was motivated by the unique challenges of protein design tasks and the theoretical advantages of the GRPO algorithm.

The core challenge in protein sequence generation lies in the sparsity of the search space—within the vast sequence space, only a tiny fraction of sequences can simultaneously fulfill the requirements of structural stability and functional specificity. Traditional policy gradient methods often struggle to converge in this setting due to the high variance of reward signals. GRPO effectively mitigates this issue through its group-based relative comparison mechanism. In each round of sequence generation, instead of evaluating the absolute quality of individual sequences in isolation, it performs a relative comparison within the currently generated group of sequences. Specifically, the rewards are first normalized within the group to compute relative advantages. A nonlinear transformation can then be applied to reshape the rewards, enhancing the distinction between high and low-quality sequences.

This relative evaluation mechanism shifts the optimization objective from pursuing absolute high rewards to identifying relatively optimal solutions within the current exploration scope. In the early stages of protein design, even when the absolute reward values of all generated sequences remain low, this relative comparison can still provide meaningful guidance for optimization.

Reward reshaping function
```
  def reshape_rewards(rewards, alpha=REWARD_SHAPING_ALPHA):
      return torch.sign(rewards) * torch.pow(torch.abs(rewards), alpha)
```
Mathematically expressed as:

$$ \text{R}_{\text{reshaped}} = \text{sign}(R) \times |R|^{\alpha} $$

This transformation scales the rewards to increase the differentiation between high and low rewards, making the subsequent advantage calculation more sensitive to relative quality differences.
Loss function design

The loss function design of GRPO reflects a careful balance between exploration and exploitation. It contains two key terms: a policy gradient term that encourages updates towards higher rewards, and a KL divergence penalty term that prevents the model from deviating excessively from its pre-trained knowledge distribution. This design aims to ensure stable training and a monotonic improvement trend.

The KL-related penalty is calculated using a numerically stable formulation:

$$ \text{KL} \approx \exp(\log \text{p} _{\text{ref}} - \log \text{p}) - (\log \text{p} _{\text{ref}} - \log \text{p}) - 1 $$

This approximation provides a stable and efficient method to estimate the divergence between the current policy and the reference policy.
Compatibility with protein design

Protein sequence generation is essentially an optimization problem in a discrete action space, and the intra-group relative comparison mechanism of GRPO is precisely suitable for such discrete decision-making scenarios. In multiple design tasks, we observed that the optimization process guided by GRPO exhibits stable convergence characteristics. The model extensively explores different regions of the sequence space in the initial stage and then gradually focuses on a few high-performance sequence patterns. This "exploration-convergence" behavior pattern exactly matches the natural process of protein design from diversity exploration to specific optimization.

Through the GRPO framework, we successfully transformed ProteinMPNN from a general sequence generation model into a professional design tool capable of targeted optimization for specific functional requirements. This transformation is not only theoretically innovative but also provides a promising technical path for achieving near one-shot protein design in practice.

The establishment of this algorithm framework marks the transformation of protein design methods from the "brute-force" approach relying on large-scale screening to the "precise" approach based on intelligent optimization, opening up new possibilities for the future development of protein design.

Framework

Our overall optimization process follows a standard reinforcement learning loop and can be summarized as follows:

Backbone selection: Choose a candidate backbone from the pool as the starting point for sequence design.
Iterative optimization until convergence:
- Use the updated agent to generate a batch of candidate sequences
- Evaluate each sequence with the environment and compute the corresponding reward
- Update the agent’s parameters using the GRPO algorithm, gradually biasing the generative distribution toward higher-quality regions
- Collect the sequences, rewards, and other relevant metrics generated in the current round
- Repeat the above steps until the convergence criterion is met
- Backbone viability is assessed during optimization based on variations in training signals (loss, reward, KL divergence, …). If viability is determined to be low, the loop is terminated and a new backbone is initiated.
  
  Figure (a) to (d) present the training trajectories of four distinct tasks, demonstrating progressively increasing levels of intelligent performance (i.e., backbone optimization potential).
  
  Figure (a) and (b) show negligible learning progress during early training stages. Such tasks are typically terminated early in practical training to conserve computational resources. Here, the first 90 training steps are displayed to explicitly illustrate the rationale for evaluating backbone potential.
  
  Figure (c) and (d) represent other two tasks. Notably, Figure (d) demonstrates the most promising characteristics: it not only generates more high-reward outcomes in early phases but also exhibits clear convergence toward elevated reward levels in later training stages, indicating superior backbone optimization potential.
  
  Note: Only training curves are shown herein. Comprehensive data analysis can be found in the Results section.

Through the “generation–evaluation–optimization” cycles, the agent continuously adjusts its strategy during exploration and eventually converges to a small set of stable candidates with the desired properties.

Data Collection and Screening

Although our optimization framework can converge to high-quality candidate results and shows a relatively high hit rate in dry-lab experiments, further screening is still necessary to select the most promising solutions. Given the complexity and environment-dependent nature of protein design, we strongly recommend performing as many wet-lab validations as possible whenever resources allow, in order to ensure the reliability and generalizability of the results.

Data Collection

During the design process for each backbone, we systematically collect two types of information:

All sequences generated during training: Since training involves extensive exploration, valuable variants may emerge. These sequences are not only potential candidates but also provide insights into the model’s search trajectory and convergence behavior. Moreover, the current loss function design tends to favor generating average optimal solutions; thus, before full convergence, the model often enters a phase where a large number of desirable protein candidates are produced. Collecting sequences from this stage helps prevent missing these promising results.
Small-batch sequences generated after convergence: Once training is complete, we perform small-batch generation (e.g., 4, 8, or 16 sequences) using the converged model. At this stage, the model often concentrates on producing high-affinity solutions, with core structural domains becoming conserved and properties largely consistent. However, the model may remain uncertain between several equally good binding modes, so batch generation provides a more comprehensive view of the converged distribution rather than relying on a single sequence.

Screening

After data collection, we apply our in-house Analysis Pipeline for final screening. Details of this process are described in the Analysis Pipeline section. Proteins that pass this stage are then subjected to wet-lab validation to further confirm their binding affinity and functional performance.

Practical Application

Our design approach demonstrates three core capabilities, highlighting its capability for achieving near one-shot protein design:

Near-one-shot binder design: The ability to generate high-affinity binders targeting specific sites, validated in dry-lab experiments.
Automatic hot-spot selection: The model autonomously identifies and converges on optimal binding regions without the need for predefined sites.
Rapid backbone viability assessment: The framework can evaluate the feasibility of backbones at early stages of iteration, thereby conserving computational and experimental resources and improving overall design efficiency.

BetterEvoDiff

Abstract

BetterEvoDiff is a protein directed evolution tool based on multi-site synergistic mutations, designed for efficient functional optimization of existing proteins. It combines the discrete autoregressive model EvoDiff-OADM with the Group Relative Policy Optimization (GRPO) algorithm, enabling coordinated multi-site mutations through a random masking and denoising generation mechanism, thereby overcoming the limitations of traditional single-point mutations in exploring protein sequence space.

Introduction

In current de novo protein design research, methods based on RFdiffusion have become one of the most widely adopted frameworks. The classical workflow typically involves the following steps: first, generating candidate backbones with RFdiffusion; then performing sequence design using ProteinMPNN; subsequently applying PyRosetta for structural relaxation; and finally conducting high-throughput screening of candidates with AlphaFold and PyRosetta. This strategy has been shown in practice to yield binders with a certain level of affinity and to achieve relatively high efficiency across multiple tasks. However, this classical workflow lacks explicit optimization steps directed toward functional objectives during the design process. As a result, the full potential of each backbone is often not fully explored, with some high-quality candidates being prematurely discarded, and the efficiency of obtaining high-affinity proteins remains limited.

To address this limitation, we propose a protein directed evolution module based on multi-site synergistic mutations. For any protein—in our case, binders—this module systematically explores the candidate sequence space through coordinated multi-site mutations to identify sequences with improved affinity. Unlike classical approaches that primarily rely on single-point mutations, our design emphasizes the synergistic relationships among amino acid residues. By applying multi-site mutations, the module can more effectively explore both local and global conformational possibilities. This enables the discovery of binders with stronger affinity and more favorable binding modes, while maintaining structural plausibility.

Design

Most existing protein optimization strategies rely on single-point mutations; however, such approaches often overlook epistatic interactions among residues, making it difficult to escape local optima in the fitness landscape and thereby limiting overall optimization potential. In contrast, multi-site synergistic mutations more faithfully capture the coupling relationships between amino acid residues, enabling systematic improvements in binding affinity while maintaining structural plausibility. Based on this insight, we propose a protein directed evolution strategy driven by multi-site synergistic mutations, applied in this project to optimize binders but readily extendable to other protein properties. The method employs a multi-site mutation module that (1) establishes rational relationships among mutation sites, and (2) collects and learns from mutational feedback to guide subsequent, more effective designs. In implementation, we integrate discrete diffusion models with reinforcement learning algorithms to dynamically optimize the mutation process, allowing the model to continuously explore and converge toward candidate sequences with higher affinity.

Multi-site Synergistic Mutations with EvoDiff-OADM

For the implementation of multi-site synergistic mutations, we employ the discrete autoregressive model EvoDiff-OADM as the generative agent. Unlike conventional diffusion models that reconstruct sequences in a fixed order, EvoDiff-OADM restores masked residues in a random sequence during the denoising process, thereby eliminating dependence on the linear arrangement of amino acids. This property makes it particularly suitable for mutation tasks, as functional couplings among residues are often not strictly determined by their sequential positions. The procedure proceeds as follows: first, random masks are applied to the target sequence to introduce exploratory opportunities for multi-site mutations; subsequently, EvoDiff performs the unmasking step to generate a batch of mutated candidate sequences. This mechanism enables the simultaneous introduction of coordinated mutations across multiple sites while preserving their intrinsic interdependencies, thereby providing rational and diverse starting points for subsequent directed evolution.

Mutation Effect Evaluation

For candidate sequences generated through multi-site synergistic mutations, we adopt the same evaluation method as in BetterMPNN. This method constructs an Environment to assess the structural and interfacial quality of candidate sequences, with the results transformed into a reward signal that feeds back into the optimization loop.

Specifically, the evaluation focuses on binder “affinity and interface quality,” and relies primarily on output metrics from AlphaFold:

Inter-chain PAE: alignment error between residues across chains, used to construct the PAE-Reward;
ipTM: overall quality of inter-chain interface alignment;
pTM: reliability of monomer folding.

The reward function is defined as a weighted combination of these metrics and serves as the direct optimization signal. To avoid the computational overhead of running a full MSA pipeline, we adopt a fixed AlphaFold input strategy, which significantly reduces the inference time per candidate and makes closed-loop optimization feasible.

Further implementation details and mathematical formulations can be found in the Environment section of BetterMPNN.

Collecting and Learning from Mutational Information

EvoDiff, as a neural network, provides a natural framework for capturing and learning from the information introduced by mutations. To integrate this feedback into the optimization process, we employ a reinforcement learning paradigm. We utilize the GRPO algorithm, and detailed explanations of its advantages can be found in the GRPO section of BetterMPNN.

The workflow proceeds as follows:

Apply random masks to the amino acid sequence.
Use EvoDiff to perform unmasking and generate a batch of mutated candidate sequences.
Evaluate the generated sequences within the Environment.
Update EvoDiff’s parameters through GRPO optimization based on the evaluation feedback.
Repeat the process iteratively.

Through this closed-loop reinforcement learning framework, EvoDiff progressively learns to associate random masking patterns with beneficial mutations. At convergence, the model consistently restores sequences that achieve the highest reward under random masking, thereby enabling effective exploration and exploitation of the mutational landscape.

Data Collection and Screening

The overall strategy for data collection and screening in the mutation-based framework follows the same principles as in BetterMPNN, and full details can be found in the Data Collection and Screening section of BetterMPNN. In brief, we collect both exploratory sequences generated during training and converged sequences produced after optimization, then apply our in-house Analysis Pipeline for final screening before wet-lab validation.

The main distinction here lies in the scale of mutational generation: instead of producing only a small number of sequences after convergence, we typically recommend generating dozens to hundreds of variants representing different combinations of mutation sites. This larger batch size enables a more comprehensive exploration of the synergistic mutation space and increases the likelihood of identifying the most promising candidates.

PPI-APP

Introduction

One of the core challenges in de novo design of binding proteins lies in the lack of efficient and rapid means for affinity assessment. Since many designed binding proteins have no natural homologous structures, traditional affinity prediction methods based on mutation or homologous information are often inapplicable. Moreover, experimental determination and high - precision molecular docking suffer from high costs and long cycles. In addition, when the characteristics of the test samples differ significantly from those of the training data, the prediction performance of pure data - driven models usually declines significantly.

To address the above issues, we have developed the Physics - guided Protein - Protein Interaction Affinity Prediction Pipeline (PPI - APP). This computational pipeline organically integrates three major tools: AlphaFold3, PyRosetta, and PBEE, and can provide reliable affinity assessment with controllable time and resource consumption. Specifically: AlphaFold3 is responsible for predicting the three - dimensional structure from the protein sequence and providing residue-level confidence assessment; on this basis, PyRosetta conducts confidence-guided structure optimization, making the conformation more in line with physical laws and approaching the real binding state through operations such as targeted relaxation; finally, PBEE extracts interface features based on physical principles from the optimized complex structure and uses a machine - learning model to accurately predict the affinity.

We found that directly using PyRosetta energy scores for affinity estimation is easily interfered with by flexible regions and often underestimates the actual binding force. Combining physical features with data-driven models can significantly improve the robustness and accuracy of prediction. Since most modules of the pipeline are based on physical principles, the obtained features have good interpretability, providing strong support for subsequent model iteration and experimental decision-making.

This pipeline is positioned as a practical screening tool. By balancing and optimizing accuracy and efficiency, it aims to help researchers identify the most promising candidate molecules at a lower cost and higher efficiency in de novo design projects and promote downstream experimental verification.

Development

PyRosetta-Based Structural Refinement

Structural relaxation was conducted using PyRosetta 4 (2024 release) under a confidence-guided refinement strategy, designed to maximize physical realism while preserving accurate regions predicted by AlphaFold3 (AF3). This approach allows selective flexibility during relaxation based on residue-level confidence and interfacial characteristics, thereby minimizing over-relaxation artifacts and maintaining biologically relevant conformations.

1.Residue Classification

Each residue is classified by the classify_residues() function according to both AF3-derived pLDDT scores and interface topology. This hierarchical categorization enables fine-grained control over backbone and sidechain movement during refinement.

High-confidence residues (pLDDT ≥95): well-defined regions with low uncertainty, typically forming the structural core.
Intermediate-confidence residues (80–95, 50–70%): moderately flexible areas, often near loops or interfaces.
Low-confidence or flexible regions (<25%): disordered or uncertain regions requiring extensive sampling.
Interface residues: residues with inter-chain contact ratios above the 75th percentile, indicating key interaction sites.
Hydrophobic and hydrophilic interface residues: further categorized to capture physicochemical balance across the binding surface.
Chain termini: regions prone to instability or unfolding at sequence ends.

This multi-level classification provides the foundation for adaptive relaxation, ensuring that rigid and flexible parts of the model are treated appropriately.

2.MoveMap Configuration

The setup_movemap() function defines the allowed conformational degrees of freedom per residue class:

High-confidence residues: backbone and sidechains are fixed to prevent distortion of accurate regions.
Moderate-confidence residues: sidechain movements are permitted to refine local packing.
Low-confidence residues: both backbone and sidechain movements are allowed to enable large-scale correction.
Interface residues: sidechain flexibility is enforced to optimize inter-chain interactions.

This step tailors relaxation freedom to model uncertainty, ensuring that refinement targets the least reliable regions without degrading confident predictions.

3.Constraint Generation

To maintain overall topology while guiding subtle corrections, the generate_constraints() function applies adaptive restraints:

Strong harmonic coordinate constraints for residues with pLDDT ≥95.
Scaled coordinate restraints (linearly dependent on pLDDT) for 80–95 range.
Flat-bottom restraints for highly flexible or loop regions, allowing broader sampling.
Dihedral angle constraints on α-helices and β-strands to maintain secondary structure integrity.
Terminal restraints to stabilize N- and C-termini against unfolding.

These constraints balance stability and flexibility, ensuring refinement does not deviate from AF3's accurate backbone geometry.

4.Repacking and Scoring Function

Before relaxation, all sidechains are repacked using repack_all_residues() to remove potential rotameric clashes and biases inherited from AF3 models. Relaxation is then performed using the ref2015_cart scoring function, which enables Cartesian optimization and incorporates modern solvation and hydrogen-bonding terms. In certain low-confidence cases, β_nov16_soft is alternatively applied to facilitate smoother convergence in flexible regions.

Overall, this confidence- and interface-guided relaxation strategy preserves structural precision in confident regions while enabling adaptive corrections at flexible or poorly defined sites, producing refined models with improved stereochemical and energetic realism.

Affinity Prediction via PBEE

Following relaxation, the PBEE (Physics-Based Energy Estimator) framework is used to estimate binding free energy. PBEE extracts key interface features—such as van der Waals attraction (fa_atr), repulsion (fa_rep), solvation energy, hydrogen-bond contributions (hbond_sc, hbond_bb_sc), and solvent-accessible surface area (SASA)—from the refined structures. These physical descriptors are fed into a pretrained machine learning model that outputs an estimated binding affinity (ΔG).

This stage integrates physical modeling with data-driven prediction, improving both interpretability and quantitative reliability over direct PyRosetta scoring.

Structural Validation and Evaluation

To assess the reliability of the PPI-Affinity Prediction Pipeline (PPI-APP), we validated both its structural refinement accuracy and binding affinity prediction performance using experimental protein–protein complex data.

1.Structural Assessment

We selected eight high-resolution two-chain complexes from PDBBind+ 2024, chosen for rigid interfaces and available affinity measurements. Each complex was modeled by AlphaFold3 (AF3) in no-MSA mode to match the de novo binder setting, producing five models per target. These were relaxed using PyRosetta FastRelax, guided by per-residue pLDDT and interface classification.

Compared to global relaxation, the confidence-guided protocol preserved stable core regions while refining flexible and interface residues. This reduced the mean inter-model RMSD from ~1.2 Å to 0.8 Å and improved interface RMSD (iRMSD) by up to 35%, showing that relaxation guided by AF3 confidence and interface type promotes convergence toward more physical conformations.

2.Affinity Evaluation

Post-relaxation, binding affinities were predicted with PBEE (Protein Binding Energy Estimator) and benchmarked against experimental ΔG values. Physical interaction features (van der Waals, solvation, hydrogen bonding) were extracted via PyRosetta. Relaxed structures produced consistently better correlations than unrelaxed ones, improving overall prediction accuracy by ~20%.

Compared with PRODIGY, PBEE showed lower variance and fewer outliers, particularly for rigid interfaces. A significant negative correlation (r = –0.68, p < 0.01) was found between affinity deviation (ΔΔG) and interface similarity (iRMSD/fnat), confirming that structural convergence directly improves affinity prediction accuracy.

3.Cross-Validation and Summary

We further tested the workflow on de novo complex structures generated in-house. Predictions maintained RMSE < 1.0 kcal/mol for rigid interfaces and < 1.5 kcal/mol for flexible ones, demonstrating robustness and generalizability.

Overall, PPI-APP effectively integrates AF3 structure prediction, PyRosetta-guided refinement, and PBEE-based affinity evaluation. It provides a fast, interpretable, and physically consistent framework for high-throughput de novo binder screening.

Results

After multiple iteration steps, the results are quite surprising.

Later in this session we evaluate protein protein affinity via delta_G_binding, the free energy change of multiple chains forming a complex, in the unit kcal/mol. This energy unit can be transferred into Kd according to $\Delta G_{binding} = -R*T*\ln(\frac{1}{K_d})=R*T*\ln(K_d)$, and in that way, the more negative delta_G_binding , the higher affinity between proteins.

We first calculated the affinity of the five raw AF3 models for each test protein using prodigy, the most frequently used PPI affinity prediction model. We then repeated the calculation on the adaptively relaxed structures in our own pipeline. The results demonstrate that the iterative optimization of the computational pipeline systematically improves binding affinity prediction accuracy.

This Figure demonstrates the stepwise improvement of the binding affinity prediction workflow across three key stages, evaluated on a benchmark set of seven protein complexes. The top series of plots are scatter plots of predicted binding free energy (ΔG) versus experimental values, while the bottom series display the distribution of predictions for each protein as box plots.

In all scatter plots, each point represents a prediction from one of five structural models, and the dashed line indicates a perfect correlation (y=x). In all box plots, the central line indicates the median, and red diamonds mark the corresponding experimental affinity.

The baseline workflow, combining raw AlphaFold3 structures with the PRODIGY prediction model, exhibits a very weak correlation with experimental data (Pearson r = 0.091). The box plots show significant and inconsistent deviations from the true affinity values (red diamonds).

The second stage incorporated a global structural relaxation protocol (fastrelax). This step substantially improved the predictive power, increasing the correlation to r = 0.337. The prediction distributions are visibly more aligned with the experimental values compared to the baseline.

The final, fully-engineered pipeline utilized our novel confidence-guided adaptive relaxation strategy coupled with the PBEE prediction model. This workflow achieved the highest accuracy, with the Pearson correlation coefficient reaching r = 0.415. The box plots show that predictions are more tightly clustered and centered more closely on the experimental affinities, demonstrating the success of our iterative design-build-test-learn cycles.

Through analyzing the correlation between structural convergence of relaxed models (i.e., inter-model similarity) and prediction accuracy, we observed a noteworthy phenomenon: while a moderate positive correlation exists between structural convergence and prediction accuracy, highly consistent structures may paradoxically lead to decreased predictive precision. This contradictory observation suggests potential systematic errors in the prediction pathway from validated structures to affinity estimation. Consequently, future research will focus on developing more reliable prediction models based on high-quality protein structures.

To validate the generalizability of our pipeline, we tested it using experimentally determined structures independently obtained in our laboratory, which were not involved in any prior parameter optimization. The testing results demonstrated significant correlation: sample 1-6 showed a predicted mean of -8.465 kcal/mol (experimental K_d = 6.863 μM), while sample 1-24 yielded a prediction of -10.01 kcal/mol (experimental ΔG ≈ -8.71 kcal/mol), indicating good agreement between predictions and experimental data.

Notably, the pipeline exhibits systematic biases toward certain specific protein types. Structural analysis reveals that these proteins typically feature larger molecular dimensions and interface regions rich in flexible loops. The current protocol particularly shows room for improvement when handling structures with extensive interfacial loops. We observed that while core loop residues generally exhibit low pLDDT confidence scores, the "anchor" residues connecting these loops to the main protein body often display high confidence. The current strategy of conformational locking for anchor residues may restrict natural loop relaxation, prompting consideration of implementing comprehensive residue-specific optimization strategies for defined loop regions in future developments.

In summary, the pipeline established in this study not only provides a more accurate and efficient solution for protein-protein interaction affinity assessment beyond the PDB database scope, but also demonstrates excellent applicability to protein complexes generated by other de novo design platforms.

Virtual Screening

Abstract and Design Rationale

In our quest to discover small-molecule inhibitors for GZMK, we encountered a primary challenge: the enzyme's active site possesses significant flexibility. This characteristic renders traditional rigid docking methods ineffective for accurately predicting true binding modes, leading to low screening efficiency. Through systematic investigation, we confirmed that GZMK binds to its inhibitors via an "induced-fit" mechanism, necessitating the consideration of dynamic conformational changes in the receptor during binding.

To resolve this conflict between accuracy and efficiency, we designed a staged, progressively refined integrated screening workflow. This pipeline aims to rapidly enrich high-potential molecules using low-cost computational methods, followed by validation with high-accuracy physics-based models, thereby enabling the efficient and precise discovery of inhibitors from large chemical libraries. The workflow consists of four core steps:

Large-Scale Conformational Similarity Search
Induced Fit Docking (IFD)
MM/GBSA Binding Free Energy Calculation
Molecular Dynamics (MD) Simulation

Technical Implementation and Application

Stage 1: Rapid Enrichment via Conformational Similarity Search

The core objective of this stage is to rapidly and cost-effectively screen a large-scale chemical library to produce an "enriched subset" of high-potential molecules. We employed GeminiMol model, which excels at identifying molecules likely to interact with flexible pockets due to its unique Conformational-Space Awareness. Using known serine protease inhibitors like Nafamostat as "seed" molecules, we conducted a similarity search on the Specs library of ~30,000 compounds and selected the top 30 candidates for the next stage.

Stage 2 & 3: Rigorous Validation and Ranking via Induced Fit

For the 30 candidates enriched in Stage 1, we employed more accurate physics-based models for validation. First, Induced Fit Docking (IFD) was used to simulate the binding process with the flexible pocket, yielding more realistic binding poses. Subsequently, the MM/GBSA method was used to calculate the binding free energy, providing a more reliable physics-based score for final ranking.

Stage 4: Dynamic Stability Validation (Planned)

The final step of the pipeline involves using Molecular Dynamics (MD) simulation on the top candidates from the MM/GBSA ranking. This all-atom simulation is designed to definitively confirm the stability of the binding mode in a dynamic environment.

Validation and Results

We successfully applied this integrated workflow to screen the Specs chemical library. The results not only validated the efficacy of our pipeline but also highlighted its significant advantages over traditional methods.

Through the GeminiMol → IFD → MM/GBSA pipeline, we obtained a set of high-potential candidate molecules. Compared to the results from our initial screen using traditional rigid docking, the distribution of MM/GBSA binding free energies for this new set of candidates was markedly superior. More importantly, their predicted binding modes were more rational, forming stable networks of hydrogen bonds, salt bridges, and hydrophobic interactions with the active site. This indicates that our workflow can effectively identify and prioritize molecules that can truly adapt to and stabilize within a flexible pocket, rather than merely finding ligands that fit a static conformation, thereby greatly enhancing the quality of hit compounds.

The significance of this achievement extends beyond GZMK itself. The flexible active site of GZMK is a common feature among many important drug targets, such as other proteases, kinases, and GPCRs. Our successful application serves as a proof-of-concept, demonstrating that this integrated pipeline is a highly promising screening paradigm. It provides a robust solution for other targets with similar flexible characteristics, systematically overcoming the bottlenecks of conventional virtual screening approaches and offering broad generalizability.

Analysis of the results further revealed that several top-ranked candidates consistently showed MM/GBSA binding free energies below –60 kcal/mol, accompanied by favorable IFD scores. These molecules also shared common pharmacophoric features, such as anchoring salt bridges and persistent hydrogen bond networks, suggesting that the pipeline not only enriches high-affinity ligands but also converges on chemically coherent binding modes.

Although the final Molecular Dynamics simulation for the Specs library candidates was not completed due to resource constraints, the existing IFD and MM/GBSA results have already provided us with a high-quality, high-confidence list of candidate molecules. This not only lays a solid foundation for subsequent wet-lab validation for GZMK but, more importantly, establishes and validates a generalizable methodology that paves the way for future drug discovery efforts against a broader range of flexible targets.

L1000_SimilaritySearch.csv

spec_SimilaritySearch.csv

workflow_results.csv

Results

Overview

Our project has developed two reinforcement learning-based protein design tools – BetterMPNN and BetterEvoDiff – aimed at achieving efficient, high hit-rate one-shot protein design. BetterMPNN optimizes sequence generation on fixed backbones by integrating reinforcement learning with ProteinMPNN, enabling it to generate binders with computationally-predicted high affinity in a single inference. BetterEvoDiff employs a multi-site cooperative mutation strategy for the directed optimization of existing proteins.

Our core objectives include:

Establishing a reinforcement learning-driven sequence generation pipeline to achieve an “exploration–evaluation–optimization” loop.
Enabling rapid screening of high-potential backbones to significantly enhance design efficiency.
Allowing the model to autonomously select optimal binding regions without pre-defined sites.
Validating the model's one-shot generation capability for the rapid production of high-performance proteins.

Experimental results demonstrate that BetterMPNN can complete the entire design process from backbone to inhibitors exhibiting high-affinity characteristics in dry-lab experiments within 20 hours, achieving high scores of ipTM > 0.9 as well as pTM > 0.9 in computational evaluations and demonstrating genuine one-shot design capability. Furthermore, the early-stage backbone potential assessment criteria we established further improve the resource efficiency of the design pipeline. BetterEvoDiff also exhibits excellent sequence optimization capability and one-shot potential in dry-lab experiments, providing a new approach for protein directed evolution.

Note: The project code and detailed instructions are publicly available in the project GitLab and GitHub. Subsequent wet-lab validation results will be updated on GitHub.

BetterMPNN

Model Training Dynamics and Convergence Behavior

We first systematically analyzed the training process of the BetterMPNN model across multiple tasks. The training curves (shown in Figure 1a) indicate that the model stably improves the reward value of the generated sequences during iteration, while maintaining the KL divergence between the policy model and the reference model within a reasonable range, without severe fluctuations or collapse. This shows that our GRPO optimization framework can effectively guide ProteinMPNN to gradually learn to generate sequences with higher affinity and structural rationality, while avoiding excessive deviation from its original folding knowledge.

During the training process, we found that the potential of different backbones can be estimated based on the early training results of the model. For instance, in training targeting high-potential backbones (Figures 2a-1, 2b-1), the model could produce sequences with relatively high rewards in the early iteration stages (e.g., the first 40 steps) and continued to converge to a better state in subsequent training. In contrast, for low-potential backbones (Figures 2a-3, 2b-3), the model showed slow reward improvement and slow decrease in sequence diversity in the early stages, suggesting limited optimization space. This phenomenon provides a basis for rapidly judging backbone usability early in training, allowing prioritization of deep optimization for high-potential backbones when resources are limited.

Mutation Patterns and Sequence Convergence Trends

We conducted an in-depth analysis of the mutation patterns of sequences generated during training (Figures 1b-1d). The results show that as training progresses, the average mutation rate of the sequences gradually decreases, while sequence diversity (the degree of difference between sequences generated at the same training step) also shows a convergent trend (Figure 1b), indicating that the model gradually focuses on a few high-performance sequence patterns.

Furthermore, further position-specific mutation frequency analysis (Figure 1c) and sequence logo (Figure 1d) clearly show the mutation frequency and conservation of amino acids at different positions. We can observe distinct distributions of variable and conserved regions: at key binding interface residues, the model tends to maintain specific amino acids, while allowing higher diversity in non-interface regions. This pattern reflects the "function-structure" trade-off spontaneously formed by the model during the optimization process.

Validation of the Model's One-shot Generation Capability

We tested the model’s one-shot capability: using the trained model (after 135 steps) to perform only a single generation, producing a small number of sequences (e.g., 8) and evaluating their quality. The results (Figure 1e) show that these sequences performed excellently across multiple metrics (Total Reward, ipTM, pTM). All sequences achieved reward values exceeding 0.9 and ipTM scores above 0.88, indicating they possess high-confidence binding capability. This result proves that the reinforcement learning-optimized ProteinMPNN holds the potential for "design once, verify multiple times," which can significantly reduce the high-throughput screening scale required by classical methods.

Figure 1a illustrates the changes in KL divergence, reward values, and related metrics during the training process, demonstrating that the model exhibits the expected training trends and learning outcomes.

Figure 1b displays the mutation convergence trend throughout the training process: the upper graph shows the variation in the average mutation rate with training steps, while the lower graph illustrates the trend in sequence diversity, i.e., the degree of difference among sequences generated at the same training step. The decline in both curves indicates normal convergence of the model as training progresses.

Figure 1c presents the mutation rate at each position.

Figure 1d presents the amino acid distribution at each position in the model-generated sequences through a sequence logo. The stacking height of the letters reflects the frequency of occurrence of each amino acid at that position.

Figure 1e presents the 8 sequences obtained from a single round of generation using the trained model, along with their corresponding rewards and various metrics, illustrating the model's acquired one-shot generation capability to produce high-scoring sequences in a single attempt.

Assessment of Backbone Optimization Potential

We systematically tested a large number of backbones generated by RFdiffusion using the BetterMPNN framework, aiming to evaluate the performance upper limit achievable by different backbones after the optimization cycle and establish standards for early judgment of backbone potential.

Typical Training Trajectories of High, Medium, and Low Potential Backbones

We selected three representative sets of backbones (Figures 2a-2b), labeled as having high, medium, and low optimization potential respectively:

High-potential backbones (e.g., Figures 2a-1, 2b-1): High-reward sequences (reward > 0.7) appeared early in training (first 30 steps), the reward overall showed an upward trend, and sequence diversity converged reasonably in the later stages. Correspondingly, their ipTM and pTM distributions also significantly skewed towards high-performance regions.
Medium-potential backbones (e.g., Figures 2a-2, 2b-2): Reward improvement was slow in the early training phase; although higher rewards occasionally appeared, the overall distribution was relatively scattered, and the model failed to stably converge to a single high-performance pattern.
Low-potential backbones (e.g., Figures 2a-3, 2b-3): Rewards remained low throughout training, sequence diversity decreased slowly, and the model failed to effectively learn any high-performance binding pattern.

Feasibility Analysis of Early-Stage Backbone Potential Assessment

Multi-dimensional analysis of the early-stage outcome data from the backbones shown in Figure 2a (Figures 2c, 2d) indicates that high-potential backbones demonstrate a more efficient exploratory capability early in training, enabling them to rapidly identify and lock onto high-reward regions. Comparative analysis of statistical distributions (Figure 2c) and performance thresholds (Figure 2d) reveals that high-potential backbones significantly outperform their medium and low-potential counterparts across key metrics—including total reward, ipTM, and pTM—from the initial training stages. Furthermore, they generate a greater number of high-quality solutions exceeding the defined threshold. This phenomenon is highly reproducible, providing a reliable quantitative basis for the rapid screening of backbones during the early training phase (typically the first 30-40 steps). For medium and low-potential backbones, even extending the training duration to 90 steps (e.g., Figures 2b-2, 2b-3) does not yield significant performance improvement. This further corroborates that early-stage performance can serve as a reliable indicator of their optimization potential ceiling.

Figure 2a proposes an early-stage screening strategy by comparing the initial training performance of backbone networks, categorizing them into high, medium, and low optimization potential. Figure 2b then validates this strategy in other independent, extended training tasks, presenting the full trajectories of the three backbone types to verify the accuracy of the early assessment.

Figures 2a-1 and 2b-1 exemplify high-potential backbones, characterized by the emergence of high-reward sequences early in training, a clear learning trend toward generating higher total rewards, and no signs of premature convergence—indicating a high optimization ceiling. In practice, such backbones are typically retained for further in-depth training.

Figures 2a-2 and 2b-2 represent medium-potential backbones. Although some high-reward outcomes appear in the early stages, they are scarce and lack a consistent learning trend. Figures 2a-3 and 2b-3 correspond to low-potential backbones, which scarcely produce any high-reward sequences or exhibit learning trends during early training, often showing large reward fluctuations or even temporary declines.

Notably, Figures 2b-2 and 2b-3 display extended training steps yet show no significant improvement in later phases. This confirms that backbones performing poorly early on are unlikely to recover, indicating that their initial architectures can be considered inferior and safely discarded. Such early screening enhances training efficiency and conserves computational resources.

Figures 2c and 2d present a multi-dimensional analysis of the early-stage outcome data from the respective tasks of the high, medium, and low-potential backbones initially shown in Figure 2a, thereby validating the feasibility of screening backbone potential based on early training performance. These charts systematically evaluate the optimization characteristics of backbones with different potential levels from two perspectives: statistical distribution and performance thresholds, providing a quantitative basis for the early elimination of low-quality backbones.

Figure 2c provides statistical validation for early assessment through bilateral half-violin plots. The left panel compares the distribution densities of high-potential vs. medium-potential backbones across three key metrics: total reward, ipTM, and pTM; the right panel compares high-potential vs. low-potential backbones. The three dashed lines within each violin represent the 25th, 50th (median), and 75th percentiles, comprehensively displaying the statistical distribution characteristics of the data. Median values are also annotated, clearly showing the significant advantage of high-potential backbones across all metrics.

Figure 2d presents threshold-based analysis of backbone optimization potential, including three subplots for total reward, ipTM, and pTM. Each subplot amplifies performance differences through exponential transformation and highlights exceptional performance points exceeding the threshold (0.7). High-potential backbones demonstrate more high-quality solutions across all three metrics, confirming their superior optimization potential.

Protein Design

Design of Active Pocket Inhibitors

Within 20 hours, we used BetterMPNN to design a class of protein inhibitors that directly insert into the active pocket of GZMK (Figure 3a). These inhibitors form a short peptide-like structure at the C-terminus, whose key residues (arginine) insert into the catalytic pocket of GZMK in a manner similar to natural substrates. And because the terminus is valine (Val), it theoretically avoids cleavage, thereby achieving stable binding. Dry-lab evaluation showed that this class of inhibitors generally had ipTM values above 0.8 and PAE reward close to 1.0, indicating highly reliable binding interfaces.

Notably, although the model did not receive any prior knowledge about GZMK's natural substrates or active sites during training, it spontaneously converged to a structural pattern similar to natural protease-inhibitor interactions. This computationally validates the strong potential of our method for function-oriented design.

Figure 3a presents the structure of a protein inhibitor designed de novo within 20 hours, which directly targets the active site pocket of GZMK. The binding interface mimics natural inhibitor proteins by inserting a key arginine (Arg) residue into the catalytic pocket. The valine (Val) at the terminus confers theoretical resistance to cleavage.

Note: ipTM is a scalar in the range 0-1 indicating predicted interface TM-score (confidence in the predicted interfaces) for all interfaces in the structure. In actual experiments and research, it is generally considered that a high ipTM implies the possibility of having affinity.

Figure 3a-1 and 3a-2 were rendered using PyMOL. Figure 3a-3 is a screenshot of prediction results from https://alphafoldserver.com/. Figures b, c, d and e were generated using the same method.

Design of Multi-mode Binding Proteins

In addition to inhibitors, we successfully designed GZMK binding proteins with various binding modes (Figures 3b-3e):

Terminal insertion mode: Besides the aforementioned inhibitors, another class of binding proteins also employs a terminal insertion mechanism (Figure 3b).
Mid-segment insertion mode: Some binding proteins insert their mid-segment loops into the active pocket or other surface crevices (Figures 3c-3d).
Surface-binding mode: Another class of binding proteins binds to the GZMK surface through larger, flat interfaces (Figure 3e).

In dry-lab evaluations, the majority of binding proteins designed by BetterMPNN outperformed control sequences generated by the classical pipeline (RFdiffusion + ProteinMPNN) in key metrics like ipTM and PAE, indicating a clear advantage of our method in interface optimization.

Figure 3b displays another variant of the terminal-inserted binding mode, highlighting the diversity of designs achievable with our method.

Figure 3c and 3d present two variants of the mid-segment insertion binding mode, showcasing different loop conformations interacting with the active pocket.

Figure 3e illustrates a surface-binding mode, characterized by a broad, flat interface.

In Silico Indicators

The dry experiment evaluation results of the proteins we designed are as follows:

ipTM: A scalar in the range 0-1 indicating predicted interface TM-score (confidence in the predicted interfaces) for all interfaces in the structure. Values higher than 0.8 represent confident high-quality predictions, while values below 0.6 suggest likely a failed prediction.
pTM: A scalar in the range 0-1 indicating the predicted TM-score for the full structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure.
has_clash: A boolean indicating if the structure has a significant number of clashing atoms (more than 50% of a chain, or a chain with more than 100 clashing atoms).
DockQ: A comprehensive indicator used to measure the similarity between the docking model and the native structure. The value range of this score is from 0 to 1. The higher the score, the better the fit between the model and the native structure, and the better the quality of the model.
fnat: Fraction of retrieved native contacts (same as Recall or TPR). The higher the fnat value is, the more natural interactions the model can capture.
∆G_binding: The free energy change of multiple chains forming a complex, in the unit kcal/mol. The more negative ∆G_binding , the higher affinity between proteins.

These results indicate that we have obtained high-affinity designed proteins verified by computaional exiperiment.

Subsequent Wet-lab Validation

We are performing wet-lab validation and continue optimizing our project. The experimental results and subsequent optimizations will be updated on the project's GitHub.

BetterEvoDiff

Training Effectiveness and Sequence Optimization

BetterEvoDiff, as an optimization tool based on multi-site synergistic mutations, demonstrated good optimization capability in the dry-lab environment (Figures 4a). During training, the probability of the model generating high-reward sequences (reward score > 0.8) steadily increased with the number of training steps, indicating its ability to gradually learn strategies for improving sequence affinity through synergistic mutations.

Mutation analysis (Figures 4b-4d) showed that BetterEvoDiff could simultaneously alter multiple sites during optimization and form conserved mutation patterns in key functional regions. The decrease in sequence diversity and the convergence of mutation rates (Figure 4b) indicated stabilization of the optimization; the mutation heatmap (Figure 4c) and position-specific mutation rates (Figure 4d) collectively revealed variable and conserved regions within the sequences, together demonstrating its ability to capture epistatic effects between residues.

Validation of the Model's One-shot Generation Capability

We directly called the trained model for one round of generation, and the reward values of the three obtained sequences were 0.9038, 0.8828, and 0.8645. This indicates that our model has acquired the one-shot capability after training, meaning it can generate sequences with excellent scores in a single attempt.

Challenges in Wet-lab Validation

Although BetterEvoDiff successfully optimized a batch of binding proteins with a starting affinity of 7 µM into variants with higher rewards in dry-lab experiments, the wet-lab validation results were disappointing: the highest affinity of the optimized proteins was only 15 µM, lower than the starting protein. We analyze that this may be caused by the following factors:

Misalignment between dry and wet-lab rewards: Even using AlphaFold as the reward function, a gap remains between its predicted interface quality and real affinity.
Deterministic role of the backbone: The backbone of the starting protein may have been near its affinity ceiling, making breakthrough improvement through sequence mutations alone difficult.

This result again highlights the importance of the "backbone-first" principle in protein design and suggests that future work should focus on further optimizing the reward function or incorporating more experimental data for calibration.

Figure 4a shows the model's training progress, with both the reward values and the probability of generating high-scoring sequences increasing over time. The top panel displays the reward score trends during the first 550 training steps, including the average reward curve (blue solid line) and the standard deviation range (light blue shading); the lower figure shows the probability of generating high-reward sequences (reward score > 0.8) across training steps, calculated using a sliding window method (window size: ±20 steps).

Figure 4b depicts the model's convergence trends, where both the mutation rate and sequence diversity decrease as training progresses, indicating stable model optimization.

Figure 4c presents a mutation frequency heatmap, revealing conserved and variable regions in the generated sequences. The blue boxes indicate the initial amino acids of the reference sequence, and darker shades represent a higher mutation frequency to the corresponding amino acid at that position.

Figure 4d shows the mutation rate at each position.

Summary and Outlook

BetterMPNN and BetterEvoDiff, as the core tools developed in our project, demonstrate significant advantages and clear application prospects in protein design tasks:

BetterMPNN exhibits near one-shot efficient design capability in dry-lab environments, widely producing inhibitors with high-affinity characteristics and proteins with diverse binding modes. Through dynamic assessment early in training, we can rapidly evaluate the optimization potential of RFdiffusion-generated backbones, significantly enhancing the resource efficiency of the overall design pipeline. Furthermore, BetterMPNN is not only suitable for de novo design but can also be directly applied to targeted optimization tasks for existing proteins, demonstrating strong versatility. However, its current main limitation lies in the misalignment between dry-lab reward signals (based on AlphaFold evaluation metrics) and wet-lab results. Future work needs to further optimize the reward function to enhance its correlation with actual affinity.

BetterEvoDiff demonstrates excellent sequence optimization capability and one-shot potential in dry-lab environments, enabling systematic exploration of the sequence space through multi-site cooperative mutations. However, its optimization efficacy faces challenges in wet-lab validation. For example, when we performed directed evolution based on multi-site mutations on a binding protein with a starting affinity of 7 µM, dry-lab experiments generated a batch of variants with higher rewards than the starting point, yet wet-lab validation showed that the highest affinity among these proteins was only 15 µM, even lower than the starting point. Potential reasons include: first, the starting protein may already possess relatively high affinity, and while AlphaFold evaluation (e.g., ipTM > 0.8) can indicate binding likelihood, it cannot accurately quantify binding strength; second, in current protein design workflows, the backbone plays a more decisive role in affinity than the sequence itself, making it difficult to break through the inherent performance ceiling of the backbone through sequence mutations alone.

Looking ahead, we will focus on the following directions:

Optimizing the reward function design by incorporating more physical features or experimental feedback data to enhance the consistency between dry and wet-lab results.
Exploring mechanisms for jointly optimizing both side chains and backbones by broadcasting reward signals to both, enabling more comprehensive protein design.
Continuously accumulating wet-lab validation data for model calibration and iteration.

We believe that with continuous algorithmic evolution and the enrichment of validation data, this framework holds promise for achieving broader and more precise applications in the field of protein design.

Usage

Our project provides two protein design tools, BetterMPNN and BetterEvoDiff, suitable for different design scenarios and requirements. We strongly recommend that users carefully read the detailed usage instructions for each tool before use. In the project's GitLab repository, each tool comes with a complete README.md file containing detailed environment configuration, parameter descriptions, troubleshooting, and best practices.

BetterMPNN performs sequence design based on structure, requiring input of a protein backbone (in PDB format) and optimizing its amino acid sequence; BetterEvoDiff directly optimizes in sequence space, enhancing protein properties through multi-site synergistic mutations. Users can choose the appropriate tool based on their experimental conditions and design objectives.

BetterMPNN is particularly suitable for de novo design tasks. When you have obtained a protein backbone through tools like RFdiffusion and wish to design sequences that can fold into this backbone with high affinity, this tool is the optimal choice. It optimizes ProteinMPNN through reinforcement learning, enabling it to learn to generate sequences with excellent binding interfaces while maintaining backbone compatibility.
BetterEvoDiff is more suitable for optimizing existing proteins. If you already have a protein sequence (e.g., a binding protein preliminarily validated through experiments) and wish to further optimize its affinity or other properties, this tool uses a discrete autoregressive model for multi-site synergistic mutations, considering amino acid interactions to achieve directed evolution in sequence space.

Environment Configuration and Installation

BetterMPNN Environment Configuration

First, obtain the project code from the GitLab repository:

cd BetterMPNN
conda env create -f environment.yml
conda activate protein-mpnn

Pre-trained ProteinMPNN weight files need to be downloaded, including multiple versions such as v_48_002.pt, v_48_010.pt, placed in the vanilla_model_weights and soluble_model_weights directories, respectively. Specific methods for obtaining the weights and file structure descriptions can be found in the detailed documentation for the BetterMPNN section on GitLab.

BetterEvoDiff Environment Configuration

Enter the corresponding directory and set up the EvoDiff environment:

cd BetterEvoDiff
conda env create -f environment.yml
conda activate evodiff

EvoDiff code needs to be cloned from the official Microsoft repository and installed from source. Detailed installation commands and verification steps can be found in the README documentation for the BetterEvoDiff section on GitLab.

AlphaFold Environment Configuration

Both tools rely on AlphaFold3 for structural prediction evaluation. It is necessary to follow the official procedure to obtain AlphaFold3 model parameter authorization, download genetic databases, and configure the Singularity container environment. Special attention is required: AlphaFold3 model parameters must be obtained through Google DeepMind's official application process. Specific configuration steps and precautions are detailed in the respective documentation for each tool on GitLab.

Tool Usage Details

BetterMPNN Usage Guide

BetterMPNN employs a structure-based sequence design strategy. The complete operational process is as follows:

Input Preparation: Users need to prepare a PDB format protein backbone file as the design starting point. Place the PDB file in the input directory and configure the correct file path in scripts/run_bettermpnn.py.

Configuration Modification: By modifying the configuration parameters in scripts/run_bettermpnn.py, key parameters such as the protein chain to design, number of training steps, and number of variants generated per step can be specified. The main configurable parameters include:

TRAINING_STEPS: Total number of training steps, default 3000 steps (In practice, far fewer steps are needed. For our project, only 140 training steps yielded good results.)
NUM_GENERATIONS: Number of variants generated per step, default 8
CHAIN_ID_TO_DESIGN: Identifier for the protein chain to be designed
Reward function weight parameters, etc.

Run Command:

cd BetterMPNN
python scripts/run_bettermpnn.py

The tool automatically executes the complete optimization cycle: using ProteinMPNN to generate sequence variants, predicting structures via AlphaFold3 and calculating rewards, and finally updating the model parameters using the GRPO algorithm. The entire process iterates continuously until convergence.

Result Analysis: After training, model checkpoints are saved in the rl_checkpoint directory, rl_rewards_log.csv records detailed reward metrics, and the rl_prediction directory contains structural prediction results for all variants. Users can test the trained model using the scripts/test-model-1.py script.

BetterEvoDiff Usage Guide

BetterEvoDiff optimizes directly in sequence space and does not require structural input. The usage process is as follows:

Input Preparation: Set the initial protein sequence to be optimized in scripts/run_betterevodiff.py. The sequence should be provided as a string to the BASE_SEQUENCE variable.

Parameter Configuration: Key parameters include:

TRAINING_STEPS: Total number of training steps
NUM_GENERATIONS: Number of variants generated per step
NUM_MUTATIONS: Number of positions to mutate each time
NUM_PATHS_PER_VARIANT: Number of paths per variant (multi-path strategy)

Run Command:

cd BetterEvoDiff
python scripts/run_betterevodiff.py

This tool employs a multi-site synergistic mutation strategy, generating sequences by randomly masking sequence positions and using EvoDiff for denoising, similarly optimized through reinforcement learning based on AlphaFold3 evaluation results.

Result Output: During training, model checkpoints are periodically saved, and reward scores and generated sequences are recorded. Users can use the scripts/test-model-2.py script to perform test generation with the trained model.

Advanced Configuration and Optimization

Reward Function Customization

Both tools support user-defined reward functions. In scripts/reward_utils.py, the weight coefficients of structural metrics such as pTM, ipTM, PAE can be adjusted, or new evaluation dimensions can be added. Detailed parameter descriptions and adjustment suggestions can be found in the README documents for each tool on GitLab.

MSA Processing Optimization

For small binder design tasks, we found that skipping the MSA computation for the binder protein while retaining the MSA information for the target protein can significantly reduce computational costs without affecting prediction accuracy. Users can configure this in two ways:

Skip MSA Computation Mode (Recommended): Configure precomputed MSA data in the config/test.json file and set the AlphaFold3 run parameter to --run_data_pipeline=false.

Standard MSA Computation Mode: Let AlphaFold3 compute the MSA automatically. This significantly increases computation time but is suitable when precomputed MSA is unavailable.

Specific configuration methods and performance comparison data can be found in the detailed documentation on GitLab.

Training Parameter Tuning

Users can adjust multiple key parameters based on computational resources and task requirements. For users with limited computational resources, it is recommended to reduce the NUM_GENERATIONS parameter to lower the computational burden per step. Other important parameters include:

BETA: KL divergence weight, controlling the magnitude of model updates
LEARNING_RATE: Learning rate, affecting convergence speed
SAMPLING_TEMPERATURE: Sampling temperature, controlling generation diversity

Practical Application Scenarios

This tool is particularly suitable for the following protein design tasks:

One-shot Protein Design: Combine RFdiffusion to generate protein backbones, use our tool to quickly evaluate the optimization potential of each backbone, concentrate resources on in-depth optimization of high-potential backbones, and achieve efficient high-quality protein design.

Protein Binding Optimization: De novo design of binding proteins targeting specific sites, optimization of binding affinity for existing proteins, exploration of multi-mode binding proteins.

Multi-site Mutation Optimization: Coordinated optimization of multiple mutation sites, considering amino acid interactions, to achieve directed evolution of protein function.

Precautions and Best Practices

Before using this tool, ensure that you have legally obtained the AlphaFold3 model parameters and comply with the corresponding usage licenses. Due to the complexity of protein design tasks, we recommend that users iterate over multiple design cycles and combine experimental verification to gradually optimize design results.

For users with limited computational resources, we recommend starting with small-scale tasks and gradually increasing complexity. Making full use of the early backbone evaluation function we provide can significantly improve resource utilization efficiency. During training, closely monitor the trends of metrics such as KL divergence and reward value, and promptly adjust parameters or terminate non-converging training tasks.

For specific troubleshooting, performance optimization techniques, and best practice cases, please be sure to refer to the detailed README.md documents for each tool in the GitLab repository. If you encounter technical problems during use, first check the detailed documentation on GitLab, as most common issues have corresponding solutions. If the problem remains unresolved, please submit an issue via GitLab, and we will provide technical support as soon as possible.

Contribution

Our software project not only addresses specific challenges in protein design but also provides a set of open, reproducible, and generalizable computational tools for the broader iGEM and synthetic biology communities. By integrating advanced machine learning methods with practical biological design workflows, we aim to lower the barrier to high-quality protein design and enable more teams to pursue functional protein engineering with greater efficiency and precision.

We have developed four core tools and workflows, each designed to be modular, well-documented, and readily applicable beyond our own project:

BetterMPNN: A reinforcement learning-enhanced protein sequence design tool that enables near one-shot generation of high-affinity binders from structural backbones.
BetterEvoDiff: A multi-site mutation-based optimization tool for directed evolution of existing protein sequences.
Analysis Pipeline (PPI-APP): A physics-guided workflow for reliable affinity prediction of de novo designed protein complexes.
Virtual Screening Workflow: An integrated computational pipeline for discovering small-molecule inhibitors against flexible targets.

These tools are not limited to our target protein GZMK. They can be adapted by other iGEM teams and researchers for a wide range of applications, including:

Designing protein inhibitors or biosensors
Optimizing enzyme activity or binding affinity
Predicting protein-protein interaction strength
Screening small-molecule libraries against flexible binding sites
Many other tasks……

Notably, BetterMPNN and BetterEvoDiff have demonstrated strong performance in dry-lab settings: both models exhibit expected training convergence, clear one-shot generation capability, and high scores in dry-lab validation. These results validate our core hypothesis—that integrating GRPO reinforcement learning with protein generative models can enable efficient exploration and learning of complex functional properties. This approach represents a novel paradigm beyond classical hallucination-based or sampling-and-screening methods, offering a more targeted and resource-efficient path to protein design.

We believe our tools and the underlying methodology can inspire future iGEM projects and contribute to the broader field of protein engineering. By sharing fully documented code, detailed usage guides, and modular workflows, we hope to empower others to build upon our work and accelerate the development of functional proteins for diverse biological applications.

References

1.ProteinMPNN: J. Dauparas, et al. "Robust deep learning-based protein sequence design using ProteinMPNN." Science (2022).

2.AlphaFold 3: Josh Abramson, et al. "Accurate structure prediction of biomolecular interactions with AlphaFold 3." Nature (2024).

3.DeepSeek-R1: DeepSeek-AI. "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." arXiv (2025).

4.DeepSeek-Math: Zhihong Shao, et al. "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." arXiv (2024).

5.EvoDiff: Sarah Alamdari, et al. "Protein generation with evolutionary diffusion: sequence is all you need." bioRxiv (2024).

6.Glide SP: Friesner RA, et al. "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy." J. Med. Chem. (2004).

7.IFD (Induced Fit Docking): Sherman W, et al. "Novel Procedure for Modeling Ligand/Receptor Induced Fit Effects." J. Med. Chem. (2006).

8.Desmond (MD Simulation): Bowers KJ, et al. "Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters." Proceedings of the ACM/IEEE Conference on Supercomputing (SC06) (2006).

9.SiteMap: Halgren TA. "Identifying and Characterizing Binding Sites and Assessing Druggability." J. Chem. Inf. Model. (2009).

10.Prime MM-GBSA: Rapp C, Carlson HA. "Relative Binding Affinities in Congeneric Series." J. Chem. Inf. Model. (2011).

11.GeminiMol: Wang L, et al. "Conformational Space Profiling Enhances Generic Molecular Representation for AI-Powered Ligand-Based Drug Discovery." Advanced Science (2024).

12.RFdiffusion: Watson JL, et al. "De novo design of protein structure and function with RFdiffusion." Nature (2023).

13.AlphaFold2: Jumper J, et al. "Highly accurate protein structure prediction with AlphaFold." Nature (2021).

14.AlphaFold Server / AlphaFold DB: Varadi M, et al. "AlphaFold Protein Structure Database in 2024." Nucleic Acids Res. (2024).

15.Pyrosetta / Rosetta Scoring (Binder Design Scoring): Alford RF, et al. "The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design." J. Chem. Theory Comput. (2017).

Repository & Support

All the latest documentation updates, version releases, and technical support will be conducted in the main GitLab repository: https://gitlab.igem.org/2025/software-tools/shanghaitech-china
Our project will also be open-sourced on GitHub synchronously. Since GitLab will be frozen from the deadline until the end of the competition, the updates of verification results and project optimizations during this period will be preferentially synchronized on GitHub: https://github.com/Terry-Wang-Lynx/RL-is-All-You-Need

Show H-bonds Show binder surface Show GZMK surface Show interface Hide binder Hide target