Results

Diagnosis of Rhinosinusitis

Workflow of Rhinosinusitis Diagnostic Test Strip

Preface

In the following sections, we summarize the key experimental findings from the diagnostic module. This part focuses on the development of a rapid GZMK-based detection system using de novo designed binders and colloidal gold test strips. Detailed experimental procedures are documented in Notebook, while optimization cycles are presented in Engineering. This structure highlights the essential results of binder design, expression, and detection, while allowing readers to trace the complete workflow when needed.

Principle and Summary

Chronic rhinosinusitis (CRS) is a prevalent inflammatory disease in which patients often suffer from persistent nasal obstruction, purulent discharge, and reduced olfaction, with approximately one-quarter developing comorbid nasal polyps. Because of its complex etiology and high recurrence rate, conventional diagnostic approaches, such as imaging or symptom scoring, usually provide information only at advanced stages, and sensitive, convenient methods for early diagnosis are lacking. Recent studies have identified GZMK (Granzyme K), a serine protease secreted by CD8+ T cells, as a key factor in nasal inflammation. Receiver operating characteristic (ROC) curve analysis demonstrates that the AUC value for GZMK detection (0.667) is significantly higher than that of eosinophil counts (0.556) and IL-5 levels (0.504), highlighting its potential as a predictive biomarker for CRS and nasal polyps.

The aim of this project was to develop a rapid GZMK-based diagnostic tool to enable early detection and disease monitoring of CRS. To achieve this, we employed the principle of lateral flow immunochromatography and, innovatively, replaced traditional antibodies with de novo designed binding proteins (binders), establishing a specific “Binder1–GZMK–Binder2” recognition model. This design allows for a simple, rapid molecular diagnostic workflow without the need for sophisticated instrumentation.

The experimental workflow comprised three major stages. First, in the binder design stage, approximately 130,000 candidate sequences targeting GZMK hotspot regions were generated, and several dozen promising designs were selected. Second, during expression and affinity validation, 47 binders were expressed in Escherichia coli, with 37 purified to high quality. Among these, 8 exhibited measurable binding affinity, and the best-performing binder achieved a Kd of 0.403 μM as determined by SPR. Finally, in the colloidal gold test strip construction stage, high-affinity binders were conjugated to gold nanoparticles, while Aprotinin served as the capture reagent on the test line and SUMO1/Ubc9 as the control system, enabling optimized strip assembly.

The results demonstrated that in the presence of GZMK, the test strip produced clear red bands on both the test line and control line within approximately 5 minutes, in sharp contrast to the negative control. These findings confirm not only the wet-lab effectiveness of de novo designed binders but also their feasibility as antibody substitutes in colloidal gold lateral flow assays, providing a novel solution for the early diagnosis of rhinosinusitis.

Experimental Workflow for Rhinosinusitis Diagnosis

Binder Design

Introduction

Our ultimate goal is to develop a colloidal gold-based test strip for the rapid detection of Granzyme K (GZMK). Traditional methods rely on specific antibodies, the development of which is time-consuming, costly, and resource-intensive, placing it beyond the scope of our project. Therefore, we turned to the cutting-edge field of computational protein design to create a de novo protein binder that can specifically bind to GZMK, serving as a functional substitute for antibodies in the assay.

Aim

To meet the requirements for a future sandwich assay, the objective of this phase is to computationally design non-competing, high-affinity protein binders that target multiple distinct hotspots on the surface of GZMK. Given that the success rate of de novo computational design is generally low, our strategy is to generate a vast library containing tens of thousands of protein sequences and structures. From this pool, we will apply a rigorous filtering process to select a few dozen of the most promising designs for subsequent wet-lab expression and validation.

Result

We generated approximately 130,000 initial binder designs targeting various hotspots on the GZMK surface. From this extensive pool, we selected a few dozen candidates that simultaneously satisfied a stringent set of computational metrics: a predicted aligned error across the binder–target interface below 10 Å (pAE_interaction < 10), at most one buried unsatisfied polar heavy atom at the interface (Buns_heavy_ball_1.1D ≤ 1), a strong predicted binding free energy (ΔΔG < -30.0 kcal/mol), and a high binding efficiency per unit of surface area (ΔΔG per SASA < -0.09 kcal/mol/Å²).
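The four thresholds above can be combined into a single pass/fail filter. The following is a minimal sketch, not our production script: the class name, field names, and example scores are hypothetical, and it assumes that "SASA" refers to the interface surface area in Å² reported alongside ΔΔG.

```python
from dataclasses import dataclass


@dataclass
class DesignScores:
    """Scores for one candidate binder (field names are hypothetical)."""
    name: str
    pae_interaction: float   # predicted aligned error at the interface, in Å
    buns_heavy_ball: float   # Buns_heavy_ball_1.1D value
    ddg: float               # predicted binding free energy (ΔΔG)
    sasa: float              # surface area used to normalise ΔΔG, in Å² (assumed)


def passes_filters(d: DesignScores) -> bool:
    """Return True only if the design meets all four thresholds from the text."""
    if d.sasa <= 0:
        return False  # ΔΔG per SASA is undefined without a positive area
    return (
        d.pae_interaction < 10.0
        and d.buns_heavy_ball <= 1
        and d.ddg < -30.0
        and d.ddg / d.sasa < -0.09
    )


# Made-up scores for illustration only; these are not real designs.
designs = [
    DesignScores("design_a", 6.2, 0, -42.0, 400.0),   # passes every threshold
    DesignScores("design_b", 12.5, 0, -45.0, 400.0),  # fails pAE_interaction
    DesignScores("design_c", 7.0, 1, -31.0, 500.0),   # fails ΔΔG per SASA
]
selected = [d.name for d in designs if passes_filters(d)]
print(selected)
```

Requiring all criteria at once is what makes the filter stringent: a design with an excellent ΔΔG is still rejected if the interface prediction is uncertain.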

Nine-grid distribution plot of the evaluation metrics

Predicted complex structure of a selected candidate binder (blue) bound to the target protein GZMK (red), showing a tightly packed interaction interface between the two.

Discussion

In this process, we developed a semi-automated workflow by writing batch-processing scripts for repetitive tasks such as sequence design (ProteinMPNN), structure relaxation, and multi-metric scoring. This significantly enhanced our computational throughput. However, the process remains computationally intensive. We also observed that the initial choice of hotspots critically influences the success rate: some selected target sites failed to yield any viable binders, indicating a key area for methodological improvement.

Outlook

Looking ahead, we plan to incorporate AI-driven methods for more accurate hotspot prediction, which could substantially increase the design success rate. Furthermore, pending time and funding, we aim to perform a second round of in silico affinity maturation on the most promising binders. This could potentially enhance their binding affinity by one to two orders of magnitude, making them competitive with or even superior to natural binders for our diagnostic application.

Binder Expression and Affinity Assay

Introduction

According to the literature, most binders generated by de novo protein design fail to exhibit detectable affinity in wet-lab experiments. Thus, it is crucial to express and purify the designed sequences, followed by affinity assays, to identify those with practical binding capacity. This step validates computational predictions and supplies essential reagents for subsequent test strip development.

Aim

To express and purify approximately 50 binders and perform affinity assays, determining precise Kd values for binders exhibiting affinity.

Result

We expressed the binders in Escherichia coli. Because of the large number of proteins, we adapted the purification method to maximize throughput. In total, we expressed 47 proteins, each purified through a two-step process involving affinity chromatography and size-exclusion chromatography. Ten proteins failed to yield high-purity solutions due to inclusion body formation or degradation.

SPR assays of the 37 purified binders revealed 8 with measurable binding affinity. Among them, binders 1-6 and 1-24 demonstrated the most promising performance, with fitted Kd values of 6.863 μM and 0.4033 μM, respectively. These results highlight their potential as strong candidates for downstream applications.
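To give a feel for what these Kd values mean in practice, the sketch below evaluates the equilibrium occupancy of a simple 1:1 binding model, θ = C / (Kd + C). This is an illustrative assumption: the text does not state which model was used to fit the SPR data, and the function name and example concentration are hypothetical.

```python
def fraction_bound(conc_um: float, kd_um: float) -> float:
    """Equilibrium fraction of GZMK occupied at a free binder concentration
    conc_um (μM), assuming simple 1:1 binding with dissociation constant kd_um (μM)."""
    if conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_um / (kd_um + conc_um)


# Fitted Kd values reported in the text (μM).
KD_BINDER_1_6 = 6.863
KD_BINDER_1_24 = 0.4033

# Hypothetical binder concentration of 1 μM, chosen for illustration only.
for label, kd in (("1-6", KD_BINDER_1_6), ("1-24", KD_BINDER_1_24)):
    print(f"Binder {label}: {fraction_bound(1.0, kd):.1%} occupancy at 1 μM")
```

By construction, occupancy is exactly 50% when the concentration equals Kd, so the lower Kd of binder 1-24 translates directly into higher occupancy at any given concentration.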

Binder expression and affinity assay results

Binder_Results_CSV.csv

Number	Serial Number	Molecular Weight/Da (Before/After Enzyme Digestion)	Absorbance	Cryopreservation Concentration/μM	Affinity with GZMK	Kd Value/μM
1-1	1-1070	3008.72	0.34	650.11	No	—
1-2	1-131	8185.8	0.36	—	—	—
1-3	1-39	8779.79	0.51	21.64	No	—
1-4	1-1947	8440.19	1.24	26.07	No	—
1-5	1-559	8307.29	0	—	—	—
1-6	1-1883	8759.8	0.17	244.3	Yes	6.863
1-7	1-1824	8784.42	1.19	68.3	No	—
1-8	1-1875	8101.9	0.18	—	—	—
1-9	1-1855	8699.66	0.97	77	No	—
1-10	1-1296	10294.26	1.01	84.6	No	—
1-11	1-1094	8838.51	1.35	84.7	No	—
1-12	1-1004	8667.49	0.52	103.61	No	—
1-13	1-1728B	8040.77	0.37	46.76	No	—
1-14	1-1983	10130.96	1.18	76	No	—
1-15	1-1728A	7934.69	0.19	54.44	No	—
1-16	1-461	9018.93	1.44	68	No	—
1-17	1-1697	8811.4	1.69	81.49	No	—
1-18	1-293	8703.56	0.51	85.6	No	—
1-19	1-1632	8706.61	0.68	70.6	No	—
1-20	1-507	9993.09	0.6	32.72	No	—
1-21	1-723	8916.79	1.12	—	—	—
1-22	1-1050	9051.75	1.48	39.4	No	—
1-23	1-1277	8864.81	1.18	159.06	No	—
1-24	1-1320	9176.01	1.46	156.6	Yes	0.4033
1-25	1-345	8678.52	0.69	42.52	No	—
1-26	1-199	10038.9	0.74	14.84	No	—
1-27	1-1031	9146.05	2.29	30.94	No	—
1-28	1-1962	9498.51	0.78	12.1	No	—
1-29	1-1180	8725.56	0.68	12.61	No	—
1-30	1-612	8726.55	2.23	6.3	No	—
1-31	1-183	8207.24	1.58	13.77	No	—
1-32	1-738	9137.13	0.65	—	—	—
2-1	4-7	8698.8/6816.74	0.17/0.22	66.01	Yes	—
2-2	132-0	8721.88/6839.82	0.17/0.22	194.01	No	—
2-3	98-6	8768.99/6886.93	0.17/0.22	232.98	No	—
2-4	124-4	8731.92/6849.86	0.17/0.22	—	—	—
2-5	46-7	8714.8/6832.74	0.17/0.22	75.71	Yes	—
2-6	179-2	8657.86/6775.8	0.17/0.22	20.07	Yes	14.76
2-7	173-7	8634.45/6752.39	0.17/0.22	—	—	—
2-8	58-6	8569.68/6687.63	0.17/0.22	76.71	Yes	—
2-9	76-2	8699.74/6817.69	0.17/0.22	—	—	—
3-1	aprotinin-1337	11995.06/10113	1.24/1.47	5.043	No	—
3-2	bikunin-1170	10442.37/8560.32	1.77/2.16	112.15	Yes	99.61
3-3	bikunin-2010	9388.32/7506.26	2.44/3.05	4.93	No	—
3-4	bikunin-3264	9030.83/7148.78	2.32/2.93	—	—	—
3-5	bikunin-369	11220.33/9338.27	2.53/3.04	—	—	—
3-6	hs11-1088	8865.76/6783.70	0.17/0.22	125.3	Yes	—

SPR binding results of Binder 1-6 and 1-24

Discussion

By continuously optimizing expression and purification protocols, we achieved our primary objective of obtaining sufficient binders and identifying those with strong binding affinity. The best-performing binder reached submicromolar affinity (0.403 μM), demonstrating the feasibility of computational design in experimental settings. However, limitations remain, including insolubility in ~20% of constructs and low expression yields, which constrained replicates in affinity measurements.

Outlook

In the future, we aim to further optimize the workflow by adopting smaller-scale systems to increase speed and parallelization. Additionally, we anticipate integrating commercially available high-throughput protein expression and detection platforms to fully realize the potential of de novo designed binders.

Colloidal Gold Test Strips

Introduction

To establish a rapid and user-friendly diagnostic method for rhinosinusitis, we developed colloidal gold test strips based on the principle of lateral flow immunochromatography. Gold nanoparticles conjugated to specific binders form complexes with GZMK in the sample. These complexes migrate via capillary action along the strip and are captured at the test line (T line) by another immobilized binder, producing a visible red band that indicates a positive result. A control line (C line) confirms the validity of the strip using a positive control system. This method enables rapid diagnosis within minutes, without the need for additional equipment.
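The read-out logic described above can be written down explicitly. This is a minimal sketch of how a two-line strip is conventionally interpreted; the enum and function names are hypothetical, and line visibility is assumed to be judged by eye.

```python
from enum import Enum


class StripResult(Enum):
    POSITIVE = "positive"   # GZMK detected
    NEGATIVE = "negative"   # no GZMK detected
    INVALID = "invalid"     # strip did not run correctly


def interpret_strip(t_line_visible: bool, c_line_visible: bool) -> StripResult:
    """Interpret a lateral flow strip from its test (T) and control (C) lines."""
    # Without a visible C line the strip cannot be trusted, whatever the T line shows.
    if not c_line_visible:
        return StripResult.INVALID
    return StripResult.POSITIVE if t_line_visible else StripResult.NEGATIVE


print(interpret_strip(t_line_visible=True, c_line_visible=True).value)
```

The C line is checked first because it is the only evidence that the conjugate migrated along the membrane; a T line without a C line is therefore reported as invalid rather than positive.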

Aim

To construct colloidal gold test strips employing two validated binders capable of binding GZMK, enabling reliable detection within 5 minutes and demonstrating feasibility for early rhinosinusitis diagnosis.

Result

We departed from the conventional "antibody–antigen–antibody" sandwich format by adopting a "Binder1–GZMK–Binder2" strategy, consistent with our de novo protein design workflow. Binder 1-6 was conjugated to gold nanoparticles, while Aprotinin was immobilized on the T line. The C line used SUMO1-gold conjugates and immobilized Ubc9 as the positive control.

Optimization experiments established the best conjugation conditions (pH and protein dosage), after which conjugated gold nanoparticles were purified and assembled onto the conjugate pad. Proteins were coated onto the NC membrane, and the strips were manually cut and assembled. Upon applying GZMK-containing samples, results were visible within 5 minutes, with clear double bands (T and C lines), in sharp contrast to the negative control, thereby confirming the validity and sensitivity of the test strip.

Design and Validation of the GZMK Diagnostic Test Strip

Discussion

The final results demonstrated proper functionality of the test strip and reliable indication of GZMK presence, achieving the intended goal. Nonetheless, limitations were observed: the bands produced were not fully uniform, likely due to manual pipetting instead of automated spraying, and manual assembly lacked precision, which impacted the aesthetic quality and reproducibility of the strips.

Outlook

With professional fabrication equipment such as automated sprayers and precision cutters, the aesthetic quality and reproducibility of the strips could be significantly improved. In the future, higher-affinity and more specific binders could replace current materials to enhance sensitivity and specificity. This platform also holds potential for multiplex detection, broadening its application in both clinical diagnostics and home-based early screening.

Treatment of Rhinosinusitis

Small-Molecule Inhibition of GZMK Activity

Preface

In the following sections, we summarize the main experimental results from the therapeutic module. This part focuses on the expression of active GZMK, establishment of the enzyme activity assay, and high-throughput screening of inhibitors. Detailed procedures are provided in Notebook, and iterative optimizations are described in Engineering. This arrangement allows us to present key validated outcomes clearly while preserving traceability of all experimental details.

Principle and Summary

Inhibiting GZMK activity has been identified as a potential therapeutic strategy for rhinosinusitis. GZMK, a serine protease secreted by memory CD8+ T cells, can cleave complement components C2, C3, C4, and C5, thereby activating the complement cascade at multiple levels. This process leads to the release of potent inflammatory mediators such as C3a, which drive eosinophil infiltration, inflammation, and tissue damage. In mouse models, either genetic ablation or pharmacological inhibition of GZMK markedly reduced inflammatory cell infiltration, improved lung function, and ameliorated histopathological features, providing direct experimental evidence for its therapeutic potential. Collectively, these findings highlight GZMK as a key molecular target for controlling chronic rhinosinusitis recurrence.

Building on this rationale, our project aimed to identify and validate small-molecule inhibitors of GZMK activity. The workflow consisted of three main stages: first, constructing a recombinant human GZMK plasmid, followed by expression and purification in HEK293F cells to obtain high-purity, zymogen-form GZMK that could be activated in vitro; second, establishing a FRET-based assay using the fluorescent substrate DABCYL-GDGRSIMTE-EDANS, which successfully confirmed enzymatic activity and yielded kinetic parameters (Vmax = 437.5 RFU/min, Km = 50.20 μM); and third, conducting high-throughput screening with the L1000 compound library, from which three approved drugs were identified with inhibition rates exceeding 90%.

Among these, Nafamostat mesylate showed the strongest inhibitory effect, with an IC50 value of 0.1951 μM. This result not only validates the robustness of our detection and screening system but also provides a promising lead compound for the development of novel GZMK-targeted therapies for rhinosinusitis.

Experimental Workflow for Rhinosinusitis Treatment

Expression of GZMK

Introduction

Subsequent experiments, including inhibitor screening and binder–GZMK affinity assays, require substantial amounts of high-purity GZMK protein. As a human-derived serine protease, GZMK must be correctly folded to retain enzymatic activity, and its intracellular and extracellular toxicity poses significant challenges. Therefore, multiple expression strategies were explored to achieve functional recombinant GZMK.

Aim

To express and purify human GZMK in a manner that preserves its native folding and activity, ultimately yielding a stable, high-purity protein solution for downstream experiments.

Result

We initially attempted expression in E. coli, but difficulties in proper folding and significant differences from native human GZMK led us to switch to a mammalian expression system. A recombinant human GZMK construct was designed for secretion in HEK293F cells as an inactive zymogen, avoiding cytotoxicity, and subsequently activated in vitro via enterokinase cleavage.

The workflow included plasmid cloning in E. coli DH5α, transfection into HEK293F cells, 72-hour culture, collection of supernatant, Flag affinity purification, enterokinase cleavage, and final purification by size-exclusion chromatography (SEC). Both SEC profiles and SDS-PAGE confirmed successful isolation of high-purity GZMK suitable for downstream applications.

GZMK Purification Results

Discussion

GZMK expression serves as the cornerstone of the entire project. The process involved multiple rounds of trial and error, as detailed in the Engineering Cycle. We are pleased to report the successful expression, purification, and processing of GZMK, with verification of its expected enzymatic activity (described in the subsequent section), laying a robust foundation for downstream experiments.

Outlook

In the future, we plan to optimize the expression system further by testing alternative signal peptides, improved plasmid vectors, and expression timing. These improvements are expected to yield higher quantities and greater consistency of GZMK for large-scale applications.

GZMK Activity Assay

Introduction

To confirm that the expressed and purified GZMK retains enzymatic activity, and to establish a methodological foundation for inhibitor screening, we developed an in vitro assay system. A robust assay not only validates protein activity but also allows determination of kinetic parameters, providing technical support for high-throughput applications.

Aim

To validate and optimize existing in vitro enzyme activity assay methods from the literature and to obtain GZMK enzyme kinetics data.

Result

We initially employed a spectrophotometric thioester substrate (Z-Lys-SBZL), but low sensitivity and interference from buffer components and enterokinase prevented accurate results. We therefore switched to a FRET-based assay using the fluorescent substrate DABCYL-GDGRSIMTE-EDANS.

This optimized system successfully measured GZMK activity, confirming functional protein. Kinetic analysis using varying substrate concentrations yielded Vmax = 437.5 RFU/min and Km = 50.20 μM, demonstrating that we established a quantifiable and reproducible activity assay.
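The fitted parameters can be plugged into the Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]), to predict the initial rate at any substrate concentration. The sketch below assumes the kinetics follow the standard Michaelis-Menten form; the function name and the example concentrations are hypothetical.

```python
VMAX_RFU_PER_MIN = 437.5   # fitted Vmax from the text
KM_UM = 50.20              # fitted Km from the text, in μM


def initial_rate(substrate_um: float,
                 vmax: float = VMAX_RFU_PER_MIN,
                 km: float = KM_UM) -> float:
    """Michaelis-Menten initial rate (RFU/min) at a substrate concentration in μM."""
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_um / (km + substrate_um)


# Illustrative substrate concentrations (μM); not the series used in the experiment.
for s in (10.0, 50.20, 200.0):
    print(f"[S] = {s:6.2f} μM -> v = {initial_rate(s):6.1f} RFU/min")
```

At [S] = Km the predicted rate is exactly half of Vmax, which is a quick sanity check on any fitted parameter pair.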

GZMK Activity Assay Results

Discussion

In developing the assay, spectrophotometric methods were limited by interference and low sensitivity. The switch to a FRET-based assay overcame these issues and provided robust results. However, constraints remained: the use of a 460 nm filter instead of the optimal 490 nm likely reduced sensitivity, and the lack of calibration between RFU and substrate concentration meant Vmax could only be expressed in RFU/min rather than conventional enzymatic units.

Outlook

In the future, we can further optimize the enzyme activity assay system by acquiring suitable filters to enhance experimental sensitivity and by establishing the correlation between RFU and substrate concentration for more precise enzyme kinetics characterization. These improvements will make our assay system more sensitive, accurate, and suitable for inhibitor screening applications.

Inhibitor Screening

Introduction

GZMK is a serine protease, and previous studies have demonstrated that inhibiting its activity can significantly alleviate airway inflammation such as rhinosinusitis. To identify potential inhibitors efficiently, we employed a high-throughput compound library screening approach, aiming to rapidly discover strong inhibitors of GZMK and assess their potential as lead compounds for drug development.

Aim

To screen for small molecules with strong inhibitory effects on GZMK activity and determine their IC50 values through concentration–response experiments, thereby providing candidate compounds for drug development.

Result

Before screening, we optimized substrate and enzyme concentrations to balance sensitivity and protein consumption. Using the L1000 compound library, which contains 1813 approved drugs, we conducted a high-throughput screen. Three compounds showed >90% inhibition, with Nafamostat mesylate standing out.
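Hit calling in a screen of this kind typically reduces to normalising each well against an uninhibited control. The sketch below shows one common normalisation; it is an assumption rather than a record of our exact analysis, and the function names, compound labels, and rates are all hypothetical.

```python
def percent_inhibition(rate_compound: float,
                       rate_uninhibited: float,
                       rate_background: float = 0.0) -> float:
    """Percent inhibition relative to an uninhibited control well.

    All rates are reaction slopes (e.g. RFU/min); rate_background is the
    slope of a no-enzyme well.
    """
    window = rate_uninhibited - rate_background
    if window <= 0:
        raise ValueError("uninhibited rate must exceed background")
    return 100.0 * (1.0 - (rate_compound - rate_background) / window)


def select_hits(rates: dict, rate_uninhibited: float,
                threshold: float = 90.0) -> list:
    """Return the compounds whose inhibition exceeds the threshold (in %)."""
    return [name for name, rate in rates.items()
            if percent_inhibition(rate, rate_uninhibited) > threshold]


# Made-up example rates (RFU/min) for three hypothetical compounds.
example_rates = {"cmpd_a": 5.0, "cmpd_b": 60.0, "cmpd_c": 2.0}
hits = select_hits(example_rates, rate_uninhibited=100.0)
print(hits)
```

The default threshold of 90% mirrors the cut-off quoted in the text.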

Results of Large-Scale Preliminary Screening

Subsequent concentration–response experiments with 18 gradient concentrations yielded an IC50 of 0.1951 μM for Nafamostat mesylate, confirming it as a potent inhibitor of GZMK with strong potential as a drug lead.
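A concentration–response curve of this kind is commonly described by a Hill (logistic) model. The sketch below evaluates such a model at the reported IC50; the Hill slope of 1, the 0 to 100% plateaus, and the dilution series (top concentration and dilution factor) are illustrative assumptions and were not taken from our data.

```python
IC50_UM = 0.1951  # reported IC50 of Nafamostat mesylate, in μM


def hill_inhibition(conc_um: float, ic50_um: float = IC50_UM,
                    hill: float = 1.0, bottom: float = 0.0,
                    top: float = 100.0) -> float:
    """Percent inhibition predicted by a Hill model at a concentration in μM."""
    if conc_um <= 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ic50_um / conc_um) ** hill)


def serial_dilution(top_um: float, factor: float, points: int) -> list:
    """Concentrations of a serial dilution, from highest to lowest."""
    return [top_um / factor ** i for i in range(points)]


# Hypothetical 18-point, 2-fold series starting at 100 μM.
series = serial_dilution(top_um=100.0, factor=2.0, points=18)
for c in series[::6]:
    print(f"{c:10.4f} μM -> {hill_inhibition(c):5.1f}% inhibition")
```

A series that spans several orders of magnitude around the IC50 is what allows both plateaus and the transition region to be resolved.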

Nafamostat Mesylate Concentration–GZMK Activity Inhibition Rate Curve

Discussion

Through this high-throughput screen, we not only successfully identified Nafamostat mesylate as a potent GZMK inhibitor but also precisely determined its IC50 value. The significance of this achievement is twofold: it provides a highly promising lead compound for subsequent drug development, and importantly, the experimental validation of an active molecule offers a critical reference for exploring the binding mode of the GZMK-ligand complex. This insight will enable us to refine and calibrate our virtual screening models, paving the way for more efficient and accurate drug discovery in the future.

Outlook

Based on the successful experience of this screening, we have outlined a clear roadmap for the subsequent development of GZMK inhibitors, aiming to systematically advance the current findings toward the stage of preclinical candidates.

Expansion of Lead Compound Library & Mechanistic Studies

Guided by the latest virtual screening results, we will conduct a new round of screening against a larger and more diverse compound library, with the goal of identifying inhibitors with novel chemical scaffolds. This will enrich our candidate pool and reduce the risk of relying on a single structure type. In parallel, we will perform in-depth enzymological studies on Nafamostat mesylate to determine its precise mode of inhibition (e.g., competitive or non-competitive).

Lead Optimization & Preclinical Profiling

Once multiple scaffolds of lead compounds are obtained, we will initiate systematic lead optimization. Leveraging quantitative structure–activity relationship (QSAR) models and AI-assisted drug design, we will carry out rational modifications of existing molecules to generate derivatives with improved potency and selectivity. At this stage, essential preclinical profiling will be conducted in parallel, including:

Selectivity analysis: Evaluating inhibitory activity against other homologous serine proteases to ensure high specificity for GZMK.
ADMET prediction and assessment: Combining computational models with in vitro assays to systematically evaluate absorption, distribution, metabolism, excretion, and toxicity, thereby filtering out molecules with poor drug-like properties at an early stage.

Validation in Cell and Animal Models

Optimized candidates with the best overall profiles will advance into biological validation. We will test their ability to inhibit GZMK downstream signaling pathways and alleviate inflammation in relevant cellular models of rhinosinusitis. Ultimately, we aim to demonstrate efficacy and safety in animal models, thereby providing robust experimental evidence to support formal preclinical studies and completing the transition from an initial “hit compound” to a “drug candidate.”

Software

Preface

In the following sections, we present the core results obtained with our computational protein design tools, BetterMPNN and BetterEvoDiff. We concentrate on the key iterations in method development and on the performance of the models in dry-lab experiments, aiming to show how a strategy combining reinforcement learning with generative models enables efficient and precise de novo protein design. Detailed model architectures, training parameters, and workflows are documented in the project GitLab, while the optimization ideas and development process are summarized in the Software section. With this arrangement, we hope to present the main findings in a structured manner while ensuring readers can trace back to the complete methodological details and supporting data when needed.

Principle and Summary

De novo protein design represents a cutting-edge frontier in both synthetic biology and biomedicine. During our attempts to design GZMK-binding proteins, we found that classical approaches — such as backbone generation using RFdiffusion combined with sequence design via ProteinMPNN — achieved some success but still relied heavily on large-scale experimental screening, resulting in low efficiency and high costs. In recent years, reinforcement learning (RL) has demonstrated great potential in complex decision-making tasks, making it particularly suitable for modeling the iterative “generate–evaluate–optimize” loop in protein sequence design.

The goal of this project is to develop an intelligent design framework integrating reinforcement learning with protein generative models to achieve efficient, high-hit-rate de novo protein design. We constructed two tools: BetterMPNN and BetterEvoDiff. The former iteratively optimizes ProteinMPNN using the GRPO algorithm, enabling it to learn to generate high-affinity sequences on fixed backbones; the latter implements multi-site synergistic mutations based on EvoDiff for the directed evolution of existing proteins.

The experimental workflow includes three key phases: First, in the model training phase, we established a reward function based on AlphaFold metrics (ipTM, pTM, inter-chain PAE) and rapidly evaluated and screened a large number of candidate backbones through a parallelized training framework. Second, in the protein design phase, we obtained various functional proteins including active pocket inhibitors, multi-mode binding proteins, and cleavage sites. Finally, in the validation phase, we systematically evaluated the structural rationality and binding reliability of the designed results through dry-lab experiments and performed wet-lab validation on some sequences.
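As an illustration of how the three AlphaFold metrics might be folded into a single scalar reward, consider the sketch below. It is not the reward function used in BetterMPNN: the weights, the PAE cap of 30 Å, and the function name are hypothetical choices made only to show the idea of rewarding high ipTM and pTM while penalising high inter-chain PAE.

```python
def composite_reward(iptm: float, ptm: float, interchain_pae: float,
                     pae_cap: float = 30.0,
                     weights: tuple = (0.5, 0.2, 0.3)) -> float:
    """Combine ipTM, pTM, and inter-chain PAE into a reward in [0, 1].

    ipTM and pTM are already in [0, 1] (higher is better). PAE is in Å
    (lower is better), so it is clipped to [0, pae_cap] and inverted.
    """
    if not (0.0 <= iptm <= 1.0 and 0.0 <= ptm <= 1.0):
        raise ValueError("ipTM and pTM must lie in [0, 1]")
    clipped_pae = min(max(interchain_pae, 0.0), pae_cap)
    pae_term = 1.0 - clipped_pae / pae_cap
    w_iptm, w_ptm, w_pae = weights
    return w_iptm * iptm + w_ptm * ptm + w_pae * pae_term


# Made-up metric values for illustration only.
print(round(composite_reward(iptm=0.92, ptm=0.90, interchain_pae=4.5), 3))
```

Because the hypothetical weights sum to 1 and each term is bounded, the reward stays in [0, 1], which keeps rewards comparable across backbones.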

The results show that BetterMPNN can complete the entire design process from backbone to high-affinity inhibitors within 20 hours, achieving ipTM scores above 0.9 in dry-lab experiments. Simultaneously, we established criteria for rapidly assessing backbone potential early in training, significantly improving the resource efficiency of the design pipeline. Although BetterEvoDiff faces challenges in aligning dry/wet rewards during wet-lab validation, its optimization capability in dry-lab experiments still demonstrates the potential of the multi-site synergistic mutation strategy. These achievements not only validate the effectiveness of reinforcement learning in protein design but also provide a feasible technical path towards achieving one-shot design.

Introduction

Proteins are the executors of life functions, and enhancing our ability to design them has profound implications for biomedicine, enzyme engineering, and materials science. Traditional protein design methods primarily rely on physical force fields, structural simulations, and expert knowledge, resulting in cumbersome processes and limited success rates. With the development of deep learning technologies, particularly the breakthroughs of the AlphaFold series in structure prediction, new opportunities have emerged for protein design. Generative models like RFdiffusion can design protein backbones de novo, while ProteinMPNN can generate foldable amino acid sequences based on these backbones, together forming the current mainstream "backbone-first" design pipeline.

However, this pipeline has clear limitations: Firstly, the generation process lacks explicit functional guidance and heavily depends on subsequent high-throughput screening. Secondly, sequence design primarily considers foldability, not optimizing for functional indicators like binding affinity. Thirdly, the screening process is resource-intensive and time-consuming. These issues severely constrain the efficiency and scalability of protein design.

To overcome these challenges, we explored introducing reinforcement learning into the protein generation process. Reinforcement learning, through the "agent-environment" interaction mechanism, allows generative models to learn how to produce sequences that meet target properties iteratively. This project aims to develop protein design tools capable of autonomously optimizing binding interfaces and systematically evaluate their performance in tasks such as inhibitor design and binding protein optimization.

Aim

The core objective of this phase is to establish and validate two protein design frameworks—BetterMPNN and BetterEvoDiff—and, based on this, achieve the following specific tasks:

Develop a reinforcement learning-driven sequence generation pipeline: Use ProteinMPNN as the agent and GRPO as the optimization algorithm, with the environment providing rewards based on AlphaFold predictions and interface scoring, to construct an “exploration–evaluation–optimization” loop.
Achieve rapid screening of high-potential backbones: Establish evaluation criteria for backbone potential using metrics like reward dynamics and sequence diversity early in training to enhance overall design efficiency.
Automatically select the optimal hotspot: Without pre-defined binding sites, enable the model to autonomously identify and converge on the optimal binding region during exploration.
Validate the one-shot generation capability of the model: Test the potential of the converged model to generate high-performance sequences in a single attempt, reducing the scale of experimental screening.

By achieving the above goals, we hope to demonstrate the effectiveness of the reinforcement learning framework in protein design and provide a solid foundation for subsequent wet-lab validation and tool iteration.

Two Novel Protein Design Tools: BetterMPNN & BetterEvoDiff

BetterMPNN

We proposed a novel design approach: treating the protein generation model as an intelligent agent within an "exploration-evaluation-optimization" loop, where reinforcement learning enabled it to progressively learn how to generate proteins with target properties, ultimately converging to stable, high-quality candidate solutions.

Based on this concept, we designed and developed BetterMPNN—a groundbreaking one-shot protein design tool. By integrating RFdiffusion, ProteinMPNN, and GRPO through an optimized iterative cycle, we continuously enhanced ProteinMPNN's generation capability, achieving rapid convergence from protein backbone structures to high-quality binders.
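The core idea of GRPO (Group Relative Policy Optimization) is to score each sampled sequence relative to the other sequences sampled for the same backbone, rather than against a learned value function. The sketch below shows only this group-relative advantage step, under the assumption that rewards are normalised by the group mean and standard deviation; it omits the policy-gradient update itself, and the function name and reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalise the rewards of one sampled group to zero mean and unit scale.

    Sequences scoring above the group average receive a positive advantage
    (their likelihood is pushed up); those below receive a negative one.
    """
    if not rewards:
        raise ValueError("at least one reward is required")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sequence scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sequences sampled on the same backbone.
group = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(group)
print([round(a, 2) for a in advantages])
```

Because advantages are relative within the group, the model keeps receiving a useful learning signal even when all absolute rewards are low early in training.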

The Workflow of BetterMPNN

To validate the model's one-shot potential, we directly invoked the trained model for a single round of generation and obtained the following results. These show that the trained model has one-shot potential: the generated sequences achieved high dry-lab scores from a single inference pass.

Results Generated by BetterMPNN in a Single Pass after Training

BetterEvoDiff

We also developed BetterEvoDiff as an optimization tool for existing proteins. Based on the discrete autoregressive model EvoDiff-OADM and integrated with the GRPO algorithm, this tool enables multi-site cooperative mutations that systematically explore and optimize protein properties within the sequence space, making it suitable for binding affinity enhancement and functional evolution of proteins.

The Workflow of BetterEvoDiff

We directly called the trained model for one round of generation, and the reward values of the three obtained sequences were 0.9038, 0.8828, and 0.8645. This indicates that our model has acquired the one-shot capability after training, meaning it can generate sequences with excellent scores in a single attempt.

Rapid Screening of High-Potential Backbones

Through our designed workflow, the early-stage training curves of the model can indicate whether the initial backbone generated for a given task possesses potential for further optimization. The figure below displays the total reward trajectories from three representative tasks during training, revealing distinct differentiation as early as 30-40 steps.

Total Reward Trend for Different Potential Backbones

Through extensive testing, observation, and analysis, we have identified that "high-potential" backbones typically exhibit these characteristics:

Early emergence of high-scoring results during initial training phases, when the agent hasn't undergone substantial learning, indicating the backbone's inherent capacity to generate sequences with favorable structural predictions
Clear learning trends in early stages, where total reward progression follows the expected upward trajectory, demonstrating the agent's effective optimization toward generating higher-quality outcomes
Absence of premature convergence at low performance plateaus, suggesting the backbone possesses sufficient optimization headroom for continued improvement

By analyzing early-stage training curves, we can rapidly assess a backbone's potential for further optimization. This enables early exclusion of numerous low-potential backbones during initial training phases, thereby significantly enhancing training efficiency while conserving computational resources and costs.
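The three characteristics above can be turned into a simple screening rule. The sketch below is one possible heuristic, not the procedure we used in practice: the window size and all thresholds (`high_score`, `min_gain`, `low_plateau`, `flat_tol`) are hypothetical values chosen for illustration.

```python
from statistics import mean

def assess_backbone(rewards, window=10, high_score=0.7,
                    min_gain=0.05, low_plateau=0.5, flat_tol=0.02):
    """Heuristic check of an early total-reward curve (e.g. the first 30-40 steps)."""
    early = rewards[:window]
    late = rewards[-window:]
    # 1. A high-scoring result appears early, before substantial learning.
    early_high = max(early) >= high_score
    # 2. The reward trends upward between the first and last window.
    learning_trend = mean(late) - mean(early) >= min_gain
    # 3. The curve has not flattened out at a low reward level.
    stuck_low = (max(late) - min(late) < flat_tol) and mean(late) < low_plateau
    return {
        "early_high_score": early_high,
        "learning_trend": learning_trend,
        "no_low_plateau": not stuck_low,
        "high_potential": early_high and learning_trend and not stuck_low,
    }

# Hypothetical reward curves over 40 training steps
promising = [0.45 + 0.01 * i for i in range(40)]
promising[5] = 0.75          # an early high-scoring sample
stalled = [0.30] * 40        # flat curve at a low reward level
```

Calling `assess_backbone(promising)` flags the first curve as high-potential, while `assess_backbone(stalled)` fails all three checks, so training on that backbone could be stopped early.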

Proteins Designed

Using BetterMPNN, we designed, within 20 hours, a protein inhibitor predicted to insert directly into the active-site pocket of granzyme K (GZMK). This inhibitor forms a short peptide-like structure at its C-terminus, with its key residue (arginine) inserting into GZMK's catalytic pocket in a manner resembling natural substrates. Since its terminal residue is valine (Val), it theoretically avoids cleavage, thereby achieving stable binding. Dry-lab evaluations gave ipTM and pTM scores both exceeding 0.9, indicating its potential for high binding affinity.

Notably, although the model received no prior knowledge about GZMK's natural substrates or active site during training, it spontaneously converged to a structural pattern analogous to natural protease-inhibitor interactions. At the computational level, this supports the potential of our function-oriented design approach.

Below we present the predicted structure and computational scores of a representative "inhibitor-type" protein obtained through our methodology.

Predicted Structure and Computational Scores of the "Inhibitor-type" Protein

ipTM: A scalar in the range 0-1 giving the predicted interface TM-score (confidence in the predicted interfaces) across all interfaces in the structure. Values above 0.8 represent confident, high-quality predictions, while values below 0.6 suggest a likely failed prediction.
pTM: A scalar in the range 0-1 indicating the predicted TM-score for the full structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure.
∆G_binding: The free energy change when multiple chains form a complex, in kcal/mol. The more negative ∆G_binding, the higher the affinity between the proteins.
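To make the link between ∆G_binding and affinity concrete, the sketch below uses the standard thermodynamic relation ∆G = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The function names and the default temperature of 298.15 K are assumptions for illustration. Computed ∆G scores from design tools are not necessarily calibrated thermodynamic values, so the resulting Kd should be read as a rough guide only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (in M) implied by a binding free energy (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) implied by a dissociation constant (in M)."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# Example: a binding free energy of -10 kcal/mol at 25 degrees C
kd_example = kd_from_delta_g(-10.0)
```

Under these assumptions, each additional -1.36 kcal/mol or so corresponds to roughly a tenfold lower Kd at room temperature.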

During model training and one-shot potential validation, we also obtained numerous designed proteins with diverse binding modes.

Predicted Structure and Computational Scores of Proteins with Other Binding Modes

Note: For more specific data analysis and result presentation, please read the Software page.

Summary and Outlook

In summary, our developed tools, BetterMPNN and BetterEvoDiff, demonstrate the following advantages in de novo protein design tasks:

BetterMPNN enables efficient, near one-shot design and produced a range of inhibitors and binding proteins with high predicted affinity.
Through dynamic assessment early in training, we can rapidly judge the optimization potential of backbones, significantly enhancing the resource efficiency of the overall design pipeline.
BetterEvoDiff demonstrates strong sequence optimization capability in dry-lab experiments, although its performance in wet-lab settings still requires improvement.

Looking ahead, we will continue to optimize the reward function to better align dry and wet-lab results and explore mechanisms for jointly optimizing both side chains and backbones by broadcasting reward signals to both. We believe that with continuous algorithmic iteration and the accumulation of validation data, this framework holds promise for achieving broader and more precise applications in the field of protein design.

Hardware

Overview

During the progression of our project, we observed that the success rate of currently available de novo designed proteins remains relatively low, requiring extensive expression and screening to obtain Binders with the desired functions. However, traditional prokaryotic or eukaryotic expression systems are time-consuming and labor-intensive. Furthermore, most candidate Binders exhibit low affinity, and large-scale reliance on Surface Plasmon Resonance (SPR) measurements is not only inefficient but also results in resource wastage. Based on this, we have developed the Nexus System — a high-throughput affinity screening platform — which aims to overcome the limitations of traditional protein affinity measurement methods in terms of cost, throughput, and physiological environment simulation. It enables large-scale preliminary affinity screening with moderate precision, followed by high-precision instrumental measurements for Binders with high affinity. This platform can not only be applied to our own project but also serve as a universal tool to provide a novel approach for binding protein affinity measurement.

This platform utilizes Fluorescence Cross-Correlation Spectroscopy (FCCS) to monitor intermolecular interactions in real time within the nanoliter-volume reaction units of a microfluidic chip. The technology directly measures complexes formed between two proteins, each labeled with a distinct fluorophore. A significant cross-correlation signal is generated only when these fluorescence-labeled proteins bind and simultaneously pass through the overlapping confocal volume of two lasers (488 nm and 561 nm). By analyzing the amplitude of this cross-correlation curve, we can directly quantify the concentration of the bound complexes and, from it, calculate the dissociation constant (Kd).
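The step from correlation amplitudes to Kd can be sketched as follows. This is a minimal example under idealized textbook assumptions: 1:1 binding, equal molecular brightness of free and bound species, perfectly overlapping detection volumes, and no crosstalk or background. The function name and the 1 fL effective volume used in the example are hypothetical and do not describe our instrument.

```python
AVOGADRO = 6.02214076e23  # 1/mol

def fccs_kd(g_green, g_red, g_cross, v_eff_liters):
    """Estimate Kd (M) from zero-lag correlation amplitudes.

    g_green, g_red: autocorrelation amplitudes G(0) of the two channels
    g_cross:        cross-correlation amplitude G_x(0)
    v_eff_liters:   effective detection volume in liters
    """
    scale = AVOGADRO * v_eff_liters            # molecules in the volume per mol/L
    c_green_total = 1.0 / (scale * g_green)    # all green-labeled protein
    c_red_total = 1.0 / (scale * g_red)        # all red-labeled protein
    c_complex = g_cross / (scale * g_green * g_red)
    free_green = c_green_total - c_complex
    free_red = c_red_total - c_complex
    if c_complex <= 0 or free_green <= 0 or free_red <= 0:
        raise ValueError("amplitudes are inconsistent with 1:1 binding")
    kd = free_green * free_red / c_complex
    return kd, c_complex

# Example with made-up amplitudes and an assumed 1 fL detection volume
kd_example, complex_example = fccs_kd(0.0166, 0.0166, 0.0083, 1e-15)
```

In practice, imperfect volume overlap and crosstalk lower or inflate the cross-correlation amplitude, so calibration with standards is needed before such a calculation is meaningful.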

To achieve the aforementioned functions, the entire platform primarily consists of four core components:

Microfluidic Chip Design and Control
FCCS Detection
3D-printed Chip Cartridge
Integrated Architecture

A comprehensive description of the platform can be found in the “Hardware” page.

Overall Design Entity of Protein Affinity Detection Platform

Microfluidic Chip Driving and Control Module

Microfluidic Chip Design and Control

Introduction

In traditional high-throughput protein affinity measurement processes, 96-well plates are commonly used for sample preparation and detection. However, this method fails to meet the project’s requirements in terms of sample consumption and automation level. Additionally, the open environment of 96-well plates makes samples prone to volatilization and contamination. To address these bottlenecks, our team ultimately selected microfluidic chips as the carrier for reactions and detection. After multiple iterations and optimizations, the microfluidic chips now possess core advantages including high throughput, low sample consumption, efficient mixing, high integration, and strong portability.

We optimized the primary chip in terms of optical performance and mixing efficiency under high-Reynolds-number differential flow conditions, enabling highly efficient sample mixing. In conjunction with the secondary chip, the system achieves precise and efficient sample injection. During the experiment, only a single sample loading is required for the samples to become fully mixed within the chip, after which FCCS (Fluorescence Cross-Correlation Spectroscopy) detection can be directly performed without any additional operation steps.

The control of the microfluidic system is crucial for realizing accurate sample manipulation and high-throughput analysis: it uses peristaltic pumps and syringe pumps to precisely control the flow, mixing, and reaction processes of liquids within the chip. Meanwhile, our team has designed standardized chip interfaces and fixing devices, which enable rapid assembly, disassembly and precise positioning of the chip, ensuring that the lasers can accurately focus on the detection window on the chip. Through multiple iterations and optimizations, the adaptability of the microfluidic system to measurement samples and its detection performance have been further improved.

Schematic diagram of the small core of our integrated architecture

Aim

The goal of this module is to ensure, through accurate manipulation and high-throughput analysis of protein samples, that the microfluidic chip can efficiently perform functions such as sample mixing, efficient sample injection, and automatic concentration gradient generation. Ultimately, it provides stable fluorescent protein samples for subsequent FCCS detection.

Result

We have completed the construction and verification of the microfluidic chip design and control module. At flow rates on the order of milliliters per minute, the flow field at the detection site is stable and the focus can be aligned, meeting the stability requirements for FCCS/FCS detection. We designed Y-shaped mixers (45 degrees on both sides), T-shaped mixers, and cross-shaped mixers: the first two are used to test the double-layer co-flow method, while the last is used to test droplet-based and jetting-based mixing strategies. Two sets of serpentine micromixers ensure sufficient mixing, and a flow stabilizer of appropriate size is installed downstream of the serpentine micromixers to balance stable liquid flow and adequate diffusion.
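To give a sense of the flow regime at milliliter-per-minute flow rates, the sketch below estimates the Reynolds number in a rectangular channel. It assumes water-like density and viscosity and uses the 50 × 100 μm channel cross-section quoted for the chips in the cartridge section; whether that cross-section applies to every part of the primary chip is an assumption.

```python
def reynolds_number(flow_ml_per_min, width_um, depth_um,
                    density=998.0, viscosity=1.0e-3):
    """Reynolds number for flow in a rectangular microchannel.

    density in kg/m^3 and viscosity in Pa*s default to values for water
    at about 20 degrees C (an assumption for protein buffers).
    """
    q = flow_ml_per_min * 1e-6 / 60.0          # volumetric flow, m^3/s
    w = width_um * 1e-6                        # channel width, m
    h = depth_um * 1e-6                        # channel depth, m
    velocity = q / (w * h)                     # mean velocity, m/s
    d_h = 2.0 * w * h / (w + h)                # hydraulic diameter, m
    return density * velocity * d_h / viscosity

re_example = reynolds_number(1.0, 100.0, 50.0)   # roughly 220 at 1 mL/min
```

A value of a few hundred is high for a microfluidic device but still well below the usual threshold for turbulence, so mixing continues to depend on channel geometry and diffusion.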

Physical image of the primary chip

Experimental verification shows that the microfluidic chip can handle the perfusion and flow of various sample solutions, including ink and protein samples, without clogging. Simultaneously, the auxiliary chip generates concentration gradients through sample injection, while the main chip is used for mixing buffer and protein solutions—realizing the organic integration of sample injection and mixing functions.

The detection window on the main chip is used for fluorescence detection; the integration of FCCS has significantly optimized the detection platform’s ability to measure affinity in micro-volume systems.

This platform not only simplifies the experimental workflow and eliminates manual protein purification, but also, owing to its high sensitivity and temporal resolution, enables the acquisition of affinity data within an extremely short time.

Physical image of the secondary chip

Cell-free protein solution with yellow ink

Protein solution with blue ink

Cell-free protein solution with yellow ink and protein solution with blue ink mixing to green solution

Discussion

Regarding the problems encountered in the experiment, we hypothesize that the weak excitation signal, insufficient photon flux density, and poor signal-to-noise ratio (SNR) stem from the soda-lime glass used. First, soda-lime glass absorbs a moderate amount of excitation photons. Second, the relatively thick glass slide lengthens the optical path, so the spherical aberration of high-angle rays exceeds what the objective lens design can compensate, which severely reduces photon flux density. In addition, organic substances adhering to the surface reduce light transmission. We therefore revised the glass substrate in the optical path, replacing it with a thinner material that has lower absorption at the relevant wavelengths. After testing, the detected light intensity was significantly improved.

During the testing of the second-generation chip, we found that the jetting mode had clear limitations in its applicable flow-rate range, thereby limiting the concentration range that could be reliably generated and detected. To overcome this constraint, subsequent designs employed liquid pre-mixing, which reduced dependence on precise flow-rate control and expanded the measurable range.

Outlook

Looking ahead, we plan to implement automated gradient dilution on the secondary chip and connect it to the primary chip, thereby integrating sample loading, dilution, mixing, detection, and output within a single platform. Building upon this architecture, we will further characterize proteins, ultimately developing a scalable high-throughput protein interaction detection platform.

FCCS Detection

Introduction

There are various detection methods for measuring the affinity of binding proteins, such as Surface Plasmon Resonance (SPR), Enzyme-Linked Immunosorbent Assay (ELISA), and Bio-Layer Interferometry (BLI). However, these methods suffer from excessively high costs, insufficient accuracy at low concentrations, or excessive demands on the amount of protein sample, making them unsuitable for our project. After comparison, we ultimately selected Fluorescence Cross-Correlation Spectroscopy (FCCS) as the method for affinity determination.

Aim

The aim of FCCS Detection is to achieve efficient fluorescence signal detection and data acquisition with the existing fluorescence correlation spectroscopy analyzer, thereby obtaining a reliable dissociation constant (Kd).

Result

We first measured the 488 nm standard sample for routine calibration of the instrument parameters, followed by the measurement of SUMO-mGFP, which was expressed using a cell-free expression system and excited with 488 nm laser light. The recorded data confirmed that the fluorescence spectroscopy analyzer is capable of performing autocorrelation analysis on our fluorescent protein samples.
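For readers unfamiliar with correlation analysis, the sketch below shows the normalized fluctuation correlation that underlies both FCS (one channel correlated with itself) and FCCS (two channels correlated with each other). It is a plain illustration on a list of intensity values; the analyzer's own software uses its own, more efficient algorithm, and the function name and example trace here are assumptions.

```python
def correlation(trace_a, trace_b, lag):
    """Normalized correlation G(lag) = <dA(t) * dB(t + lag)> / (<A> * <B>).

    Passing the same trace twice gives the autocorrelation (FCS);
    passing two detector channels gives the cross-correlation (FCCS).
    """
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same length")
    mean_a = sum(trace_a) / len(trace_a)
    mean_b = sum(trace_b) / len(trace_b)
    n = len(trace_a) - lag                     # number of overlapping samples
    acc = sum((trace_a[i] - mean_a) * (trace_b[i + lag] - mean_b)
              for i in range(n))
    return acc / (n * mean_a * mean_b)

# Made-up intensity trace (photon counts per time bin)
trace = [1, 3, 1, 3, 1, 3, 1, 3]
g0 = correlation(trace, trace, 0)   # autocorrelation amplitude at zero lag
```

A trace with no fluctuations gives G = 0 at every lag, which is why the amplitude carries information about the number of molecules in the detection volume.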

FCS results of 488 standard sample

FCS results of Sumo-1 Protein

To test the analysis of cross-correlation data using the fluorescence correlation spectroscopy analyzer, we mixed the 488 nm standard sample and the 561 nm standard sample, placed a small volume of the mixture on a cover slip, and positioned it on the water-immersion objective of the FCCS detection module. The detection window was then aligned with the dual-color confocal focus, and both detector channels simultaneously recorded the temporal fluorescence fluctuations. The detection duration and switching interval were set for the spectrometer at each concentration point. By analyzing the cross-correlation curves obtained at different target protein concentrations, we determined the relative fraction of dual-color fluorescent complexes, which is directly reflected by the amplitude of the cross-correlation curve. As the target protein concentration increased, more dual-labeled complexes formed, resulting in higher cross-correlation amplitudes. Fitting the cross-correlation amplitudes at different concentrations with a standard binding equilibrium model yielded a binding curve, from which the dissociation constant (Kd), a key indicator of binding affinity, was calculated. The results demonstrated that the FCCS method sensitively reflects binding interactions between proteins.
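The fitting step can be sketched with a 1:1 binding equilibrium model. The example below assumes that the bound fraction of the labeled binder has already been derived from the correlation amplitudes, and it finds Kd by a simple least-squares search over a logarithmic grid. The function names, the search range, and the synthetic concentrations are illustrative assumptions, not our analysis pipeline.

```python
import math

def complex_conc(a_total, b_total, kd):
    """Equilibrium complex concentration for 1:1 binding (all values in M)."""
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

def fit_kd(a_total, b_totals, bound_fractions,
           kd_min=1e-10, kd_max=1e-4, points=2001):
    """Least-squares estimate of Kd by scanning a logarithmic grid."""
    best_kd, best_err = None, float("inf")
    for i in range(points):
        kd = kd_min * (kd_max / kd_min) ** (i / (points - 1))
        err = sum((complex_conc(a_total, b, kd) / a_total - f) ** 2
                  for b, f in zip(b_totals, bound_fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd

# Synthetic, noise-free titration: 20 nM labeled binder, true Kd of 50 nM
a_total = 20e-9
b_totals = [5e-9, 20e-9, 50e-9, 100e-9, 200e-9, 500e-9]
fractions = [complex_conc(a_total, b, 50e-9) / a_total for b in b_totals]
kd_fit = fit_kd(a_total, b_totals, fractions)
```

With real data, the grid search could be replaced by a nonlinear least-squares routine that also reports confidence intervals.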

FCCS results of 488/561 standard sample

Analyzed FCCS results of 488/561 standard sample

Unfortunately, when we mixed SUMO-mGFP and Ubc-mRuby2 within the microfluidic chip and excited the fluorescent molecules simultaneously with 488 nm and 561 nm lasers, we were unable to obtain satisfactory cross-correlation curves and binding data.

Discussion

Through our investigation, we found that one of the major factors affecting the measurement results was signal crosstalk between the 488 nm and 561 nm excitation channels.

To address this issue, we applied a correction strategy: by pre-measuring samples containing only a single fluorescent label, we calculated the proportion of signal leakage from one channel to the other. During the subsequent analysis of dual-labeled samples, we used a crosstalk correction formula to subtract this leakage signal from the raw data, thereby ensuring the authenticity and accuracy of the cross-correlation signal.
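A minimal sketch of such a correction is shown below. It assumes the simplest case, in which a fixed fraction of the green-channel signal leaks into the red channel and is subtracted at the intensity level; the function names and the example numbers are hypothetical, and the formula actually applied may differ in detail.

```python
def leakage_ratio(green_channel, red_channel, bg_green=0.0, bg_red=0.0):
    """Leakage fraction measured on a sample carrying only the green label.

    Returns the background-corrected mean red-channel signal divided by
    the background-corrected mean green-channel signal.
    """
    mean_green = sum(green_channel) / len(green_channel) - bg_green
    mean_red = sum(red_channel) / len(red_channel) - bg_red
    return mean_red / mean_green

def correct_red_trace(green_trace, red_trace, kappa):
    """Subtract the leaked green signal from the red channel, bin by bin."""
    return [r - kappa * g for g, r in zip(green_trace, red_trace)]

# Made-up calibration on a green-only sample, then a dual-labeled sample
kappa = leakage_ratio([100.0, 110.0, 90.0], [5.0, 5.5, 4.5])
corrected = correct_red_trace([200.0, 100.0], [60.0, 55.0], kappa)
```

The corrected red trace is then used in place of the raw one when computing the cross-correlation curve.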

In addition, other potential factors may also have influenced measurement accuracy, for example, insufficient purity or quality of the Ubc-mRuby2 protein.

Outlook

Looking ahead, we plan to implement systematic optimization in two key areas — microfluidic chip structure and fluorescent protein selection — to significantly increase the effective photon count without increasing molecular occupancy, and to more thoroughly eliminate dual-channel crosstalk.

Specifically, in terms of microfluidic chip design, we will further optimize the geometric dimensions of the detection region to achieve better matching between the spectrometer’s detection volume and the chip’s channel size; in terms of fluorescent protein selection, we will increase the difference in excitation wavelengths between the two fluorescent proteins, thereby minimizing crosstalk between the two optical signals and fundamentally improving data accuracy and reliability.

3D-Printed Chip Cartridge

Introduction

To ensure the stability, portability, and operational convenience of the system, we integrated and fixed all microfluidic components in a customized 3D-printed chip cartridge. The interior of the device features modular partitioning for the main and auxiliary chips, with necessary shock absorption and light shielding measures implemented to eliminate environmental interference with high-precision optical measurements.

Aim

Our design requirement was to create a housing for one main chip and one auxiliary chip to achieve unified packaging. The main chip has dimensions of 75 × 25 mm, while the PDMS layer of the auxiliary chip measures 25 × 30 mm with a thickness of 3-5 mm; the channels are 50 μm deep and 100 μm wide, the substrate glass slide has specifications of 75 × 25 × 1 mm, and the drilled hole size is 1.4 mm. Since two different auxiliary chips are used to inject multiple proteins, the design must enable easy replacement of the auxiliary chip during sample input. Mechanically, the device needs to provide reliable clamping fixtures that do not compress the PDMS chip, along with basic leak-proof and visual design features.

Design of 3D Printed Chip Cartridge

Result

We constructed a chip cartridge comprising a PLA material housing, a plexiglass cover, and M4 fasteners. A trapezoidal platform was designed inside the cartridge, with the auxiliary chip placed diagonally above the main chip to prevent liquid stagnation in thin tubes and avoid clogging or leakage from the chip holes in the auxiliary chip channels. This integration combines the sample injection function of the auxiliary chip with the protein mixing function of the main chip. In experiments, thin tubes were connected to the top holes of the auxiliary chip, and their end holes were linked to the main chip's sample inlet via thin tubes. Proteins are mixed in the main chip and output through the main chip's sample outlet, enabling high-throughput injection, mixing, and output.

Tests confirmed that the chip cartridge passed the static pressure leakage test: by injecting clean water via a peristaltic pump, no leakage was observed at any interfaces or housing openings. Additionally, the cartridge passed optical tests, demonstrating that the viewing window enables clear and distinct observation.

Furthermore, the chip cartridge was connected to a peristaltic pump and a syringe pump via hoses, forming a multi-functional integrated hardware device. The peristaltic pump delivers buffer solutions, and the syringe pump delivers protein solutions. We selected the laboratory micro-syringe pump model YHPLC0100S, which includes three key components: the syringe pump structure, power adapter, and controller. The syringe pump uses a stepper motor and lead screw drive system to convert motor rotation into linear pushing motion, operating according to a set speed and timing to achieve uniform and quantitative output of liquids or gases. We also chose the Baoding Longer BT100-2J peristaltic pump, which provides a flow range of 0.002–380 mL/min (single tube) and can be equipped with various pump heads. These two devices are connected to different holes of the auxiliary chip inside the cartridge via hoses for sample input; the main chip receives and mixes the samples, and the end hose outputs the mixture outside the cartridge. Verification confirmed that this integrated hardware device realizes the input, mixing, and output of sample proteins.

Design Blueprint of the Third Version of 3D Printed Chip Cartridge

Physical Image of the Third Version of 3D Printed Chip Cartridge

Application scenarios of the Third Version of 3D Printed Chip Cartridge

Discussion

In response to changes in design requirements, we iterated to version 2.0, which includes two improvements over version 1.0: first, due to the iteration of the main chip, its dimensions changed, and accordingly, the main chip slot was adjusted to adapt to this change; second, during experiments using the main and auxiliary chips, we found that when the auxiliary chip was placed flat, on one hand, the liquid input into the auxiliary chip could not flow smoothly into the channels and often leaked from another sample hole, and on the other hand, the thin tubes connecting the two chips were prone to liquid stagnation. Therefore, we designed a trapezoidal platform to adjust the positional relationship between the two chips, placing the auxiliary chip on a platform diagonally above the main chip to prevent liquid stagnation in the tubes; meanwhile, tilting the auxiliary chip ensured that the input liquid flows smoothly through the tubes due to gravity.

After version 2.0, we conducted software simulation tests and proposed version 3.0 based on identified issues, with two further improvements: first, since the auxiliary chip could not be well fixed due to its tilted placement and mismatched slot dimensions, a clamping component was added; second, the solid trapezoidal platform consumed excessive 3D printing material, so the lower part of the platform was hollowed out and support components were added to minimize material consumption while ensuring mechanical stability.

Outlook

Future optimization focuses on three aspects:

Material upgrading: If the device needs to resist organic solvents or withstand more aggressive cleaning, the housing can be switched from PLA to an SLS-printed material.
Hose specification: If plugging and unplugging resistance is excessive, hoses with an outer diameter of 1.3 mm or 1.5 mm can be selected and matched with different drill bit diameters, or a small amount of silicone-free lubricant can be applied to the PDMS hole openings before inserting the hoses.
Size and tolerance adjustment: The device shape should be fine-tuned based on the assembly experience of the first print to ensure that the PLA does not warp under pressure.

Integrated Architecture

Introduction

Our integrated architecture takes the microfluidic chip as the core component, with syringe and peristaltic pumps transporting the liquids to be reacted. A fluorescence detection device reads the chip's detection window to measure protein affinity, and all devices are integrated within a 3D-printed housing. We have also developed a supporting detection platform on the host computer, which controls the entire device and outputs result curves.

The overall structure of the integrated architecture

Aim

Built around Fluorescence Cross-Correlation Spectroscopy (FCCS) as the core detection principle, this architecture combines fluid driving, high-throughput detection, and software control into a single system with hardware-software synergy. It addresses the low efficiency of manual operation, high sample consumption, and poor data reproducibility of traditional protein affinity detection. Specific objectives include:

Balancing detection accuracy and cost through the differentiated selection of fluid driving equipment (syringe pumps for protein solution delivery, peristaltic pumps for buffer delivery)
Achieving high-throughput screening capabilities via the collaborative architecture of the main and auxiliary chips, which enables "single sample loading for multi-concentration gradient generation + sequential automatic detection"
Developing supporting control software to realize the integration of device control, experimental process management, and data analysis. It also adapts to scenarios such as precise laboratory detection and on-site demonstration, reducing the operation threshold while ensuring data reliability.

Result

In terms of the fluid driving system, we finally confirmed the division of labor scheme: "YHPLC0100S micro-syringe pump + Longer BT100-2J peristaltic pump". The syringe pump, with a flow rate range of 0.01–98 mL/min and a stepper motor precision of 0.0003125 mm, achieves low shear force and pulse-free delivery of protein solutions. The peristaltic pump, featuring a flow rate range of 0.002–380 mL/min and multi-pump head compatibility, meets the requirements for low-cost and high-throughput buffer delivery. The two work in synergy to ensure the accuracy and efficiency of fluid driving.
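To relate the quoted step precision to dispensed volume, the sketch below converts linear plunger travel into volume and step rate. It assumes that 0.0003125 mm is the plunger travel per motor step, and it uses a hypothetical syringe inner diameter of 4.61 mm; neither assumption comes from the pump's datasheet.

```python
import math

STEP_MM = 0.0003125  # assumed linear plunger travel per step, in mm

def volume_per_step_nl(syringe_inner_diameter_mm):
    """Volume displaced by one step, in nanoliters (1 mm^3 = 1000 nL)."""
    area_mm2 = math.pi * (syringe_inner_diameter_mm / 2.0) ** 2
    return area_mm2 * STEP_MM * 1000.0

def steps_per_second(flow_ul_per_min, syringe_inner_diameter_mm):
    """Step rate needed to deliver a target flow rate."""
    flow_nl_per_s = flow_ul_per_min * 1000.0 / 60.0
    return flow_nl_per_s / volume_per_step_nl(syringe_inner_diameter_mm)

# Hypothetical syringe with a 4.61 mm inner diameter
nl_per_step = volume_per_step_nl(4.61)     # about 5 nL per step
rate = steps_per_second(10.0, 4.61)        # step rate for 10 uL/min
```

Under these assumptions, a single step moves only a few nanoliters, which is consistent with smooth, low-pulsation delivery of protein solutions.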

For high-throughput detection technology, based on the collaborative architecture of the main and auxiliary chips, the auxiliary chip’s dendritic dilution network enables automatic generation of multi-gradient concentrations with a single sample loading. Combined with the sequential measurement protocol, the software can automatically trigger signal acquisition and data calculation. Compared with manual control, the introduction of automation in our system significantly improves experimental efficiency and reduces sample consumption.
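How a branching dilution network produces a concentration series from a single loading can be illustrated with an idealized model. The sketch below assumes a tree-shaped gradient generator in which every stage adds one outlet, all outlets of a stage carry equal flow, and neighboring streams mix completely. The actual geometry of the auxiliary chip is not specified here, so this is a conceptual model rather than a description of our chip.

```python
def next_stage(concs):
    """Concentrations after one splitting-and-mixing stage.

    With n inlet streams of equal flow, outlet j receives j parts from
    inlet j-1 and (n - j) parts from inlet j, so that all n + 1 outlets
    carry the same flow.
    """
    n = len(concs)
    out = []
    for j in range(n + 1):
        left = concs[j - 1] if j > 0 else 0.0
        right = concs[j] if j < n else 0.0
        out.append((j * left + (n - j) * right) / n)
    return out

def gradient(c_low, c_high, n_outlets):
    """Outlet concentrations of a tree with two inlets and n_outlets outlets."""
    concs = [c_low, c_high]
    while len(concs) < n_outlets:
        concs = next_stage(concs)
    return concs

# Buffer (0) and protein stock (1.0, relative units) split into six outlets
levels = gradient(0.0, 1.0, 6)
```

In this idealized model the outlet concentrations are evenly spaced between the two inlet values, which gives the sequential measurement protocol a predictable titration series.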

The software system is developed based on the Streamlit framework, implementing global state management through the session_state mechanism. It integrates device control, experimental process automation, real-time data analysis, and visualization functions, with built-in emergency stop and state reset modules. Non-professionals can quickly master the operation.
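The state-management pattern described above can be sketched without any GUI code. The helpers below operate on any dict-like object, so in a Streamlit app they could be called with `st.session_state`, which supports the same key-based access. All key names and function names are hypothetical and are not taken from the repository.

```python
def _fresh_defaults():
    """Default values for the shared state (hypothetical key names)."""
    return {
        "pump_running": False,
        "emergency_stopped": False,
        "current_step": 0,
        "results": [],
    }

def init_state(state):
    """Fill in missing keys without overwriting existing values."""
    for key, value in _fresh_defaults().items():
        if key not in state:
            state[key] = value

def start_run(state):
    """Start the pumps unless an emergency stop is active."""
    if state["emergency_stopped"]:
        return False
    state["pump_running"] = True
    return True

def emergency_stop(state):
    """Halt the pumps and block further runs until the state is reset."""
    state["pump_running"] = False
    state["emergency_stopped"] = True

def reset_state(state):
    """Return every key to its default value."""
    for key, value in _fresh_defaults().items():
        state[key] = value

# A plain dict stands in for the session state in this example
demo_state = {}
init_state(demo_state)
```

Because Streamlit reruns the script on every interaction, keeping flags such as the emergency-stop status in the session state is what lets them persist between button clicks.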

Tools: https://github.com/Quanyuyuyu/iGEM2025-Hardware.git

Discussion

The current integrated architecture still faces several challenges:

The single-channel design of the syringe pump limits multiple sets of parallel experiments;
The hoses of the peristaltic pump are prone to wear after long-term use, affecting stability;
The software only supports fixed experimental processes and lacks custom protocols and complex data mining functions;
Detection of low-concentration samples is easily disturbed by background signals, requiring further algorithm optimization to improve the signal-to-noise ratio (SNR).

Outlook

Future efforts will advance the optimization of the architecture from three aspects: hardware upgrading, software expansion, and scenario extension.

Hardware end: We plan to develop multi-channel integrated syringe pumps to reduce equipment footprint, explore intelligent peristaltic pumps with hose wear monitoring, and introduce micro-flow sensors to build a "pump-sensor-software" closed-loop control system that minimizes flow-rate errors.
Software end: We intend to develop a custom experimental protocol module, integrate machine learning algorithms for automatic parameter optimization and abnormal data identification, and add cloud-based data management functions to support collaboration and data security.
Application scenarios: We will adapt the platform to antibody-target screening in the biopharmaceutical field, rapid sample analysis in clinical diagnosis, and technology demonstration in scientific research and teaching. This will promote the development of the architecture toward multi-functionality and high adaptability.