Main
Affinity Measurement
Introduction
To identify candidate proteins suitable for colloidal gold test strips, we performed affinity measurements of GZMK binders after completing de novo design and purification. By comparing the binding strengths of different candidates, we were able not only to select the most promising proteins but also to feed the measured data back into the design cycle for further optimization.
Commonly used detection techniques include SPR, BLI, MST, ITC, and FPIA. Considering that our experiments required rapid screening across a large number of samples with limited protein quantities, we ultimately chose Surface Plasmon Resonance (SPR). SPR offers several advantages—high throughput, label-free detection, and low operating cost—making it particularly suitable for preliminary screening and quantitative analysis when handling a large pool of candidate proteins.
The principle of SPR is to monitor molecular interactions in real time through optical reflection. When polarized light strikes a metal-coated sensor chip at a specific angle, surface plasmon waves are excited. If the immobilized ligand (such as GZMK) binds to a candidate protein in solution, the local mass increases and alters the refractive index, leading to a shift in the resonance angle and a corresponding change in reflected light intensity. The instrument detects these shifts and records the binding and dissociation processes in a label-free manner, providing association rate constants, dissociation rate constants, and equilibrium dissociation constants (Kd), which together quantify binding affinity and kinetics.
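For reference, the three constants named above are related as follows under the simplest 1:1 (Langmuir) binding model. This model is an assumption here, since the fitting model is not stated in this section. In the notation below, $k_a$ is the association rate constant, $k_d$ the dissociation rate constant, $K_D$ the equilibrium dissociation constant (written Kd in the text), $C$ the analyte concentration, and $R$ the measured response:

```latex
\frac{dR}{dt} = k_a\,C\,\bigl(R_{\max} - R\bigr) - k_d\,R,
\qquad
K_D = \frac{k_d}{k_a},
\qquad
R_{\mathrm{eq}} = \frac{R_{\max}\,C}{K_D + C}
```

The last expression is the steady-state response, which is the form typically fitted when affinity is read from a concentration series rather than from the kinetic phases.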
In our experiments, we used the Biacore T200 platform for detection. Purified binders were first uniformly diluted and tested in batches to identify candidates with significant binding signals to GZMK. Subsequently, these candidates underwent multi-concentration gradient measurements, and kinetic curve fitting was performed to obtain precise affinity values. These results provided a critical basis for the final binder selection and the construction of the colloidal gold test strip.
Protocol
- SPR Reagent Preparation
1.1 Stationary Phase
The GZMK protein expressed in eukaryotes was coupled onto the CM5 chip. After the ligand pre-enrichment experiment, we selected pH 5.0 as the pH for protein coupling. During sample loading, the GZMK protein was mixed with a sodium acetate solution at pH 5.0.
1.2 Mobile Phase
The fusion protein, purified in two steps (nickel gravity column followed by SEC), was used as the mobile phase. For the initial screening, the concentration of each fusion protein was measured on a NanoDrop and the protein was then diluted with SPR running buffer to 10 μM; proteins with concentrations below 10 μM were loaded at their original concentrations. For the concentration-gradient measurements, the fusion proteins that showed binding were diluted with SPR running buffer to 20 μM, 10 μM, 2.5 μM, 1.25 μM, 0.625 μM, 0.312 μM, and 0.156 μM for loading.
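The dilution rule described above (dilute to the target concentration, or load undiluted when the stock is already below it) can be sketched as a small helper. This is only an illustration: the function name, the 200 μL final volume, and the 37.5 μM stock are hypothetical values, not figures from our experiments.

```python
def dilution_volumes(stock_um, target_um, final_ul):
    """Return (stock_ul, buffer_ul) for a C1*V1 = C2*V2 dilution.

    If the stock is already at or below the target concentration it is
    loaded undiluted, mirroring the rule used in the initial screen.
    """
    if stock_um <= 0 or target_um <= 0 or final_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_um <= target_um:
        return final_ul, 0.0
    stock_ul = final_ul * target_um / stock_um
    return stock_ul, final_ul - stock_ul


# Hypothetical example: a 37.5 uM stock diluted to 10 uM in 200 uL.
stock_ul, buffer_ul = dilution_volumes(37.5, 10.0, 200.0)
print(f"{stock_ul:.1f} uL protein + {buffer_ul:.1f} uL running buffer")
```

The same helper covers each point of the concentration gradient by changing `target_um`.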
1.3 SPR Running Buffer
After multiple attempts, we ultimately determined the following formula to minimize nonspecific protein binding:
1× PBS
3 mM EDTA
0.05% Tween-20
363 mM NaCl
- Operation of the Biacore T200 Instrument
Biacore_T200 Instruction.pdf
Results
We conducted two rounds of preliminary screening to evaluate whether the purified binders exhibited measurable affinity. In the first round, a total of 27 proteins were tested (1-1, 1-3, 1-4, 1-6, 1-7, 1-9, 1-10, 1-11, 1-12, 1-13, 1-14, 1-15, 1-16, 1-17, 1-18, 1-19, 1-20, 1-22, 1-23, 1-24, 1-25, 1-26, 1-27, 1-28, 1-29, 1-30, 1-31). Among these, proteins 1-6 and 1-24 showed detectable affinity. These two proteins were subsequently diluted to 20 μM, 10 μM, 2.5 μM, 1.25 μM, 0.625 μM, 0.312 μM, and 0.156 μM for multi-concentration measurements to improve precision. To minimize interference from nonspecific binding, the composition of the SPR buffer was adjusted, and both experimental conditions and detection parameters were optimized, thereby improving data reliability. By fitting the concentration–response curves, we determined that protein 1-6 had a Kd of 6.863 μM, while protein 1-24 had a Kd of 0.4033 μM.
The second round of screening focused on proteins after His-tag removal, with 13 samples tested (1-6, 2-1, 2-2, 2-3, 2-5, 2-6, 2-8, 3-1, 3-2, 3-3, 3-6, along with controls 1-24 and Aprotinin). The results showed that proteins 1-6, 1-24, 2-1, 2-5, 2-6, 2-8, 3-2, 3-6, as well as Aprotinin, all displayed binding activity. From these, we selected 1-6, 2-6, 3-2, and Aprotinin for further concentration-gradient measurements. Throughout the experiments, parameter settings and operating conditions were continuously optimized to reduce background noise and obtain more accurate data. The final quantitative results were: protein 1-6 with a Kd of 23.86 μM, protein 2-6 with a Kd of 14.76 μM, protein 3-2 with a Kd of 99.61 μM, and Aprotinin with a Kd of 4.755 μM.
SPR binding results of Binder 1-6 and 1-24
SPR binding results of Aprotinin
Colloidal Gold Test Strip for GZMK Detection
Introduction
Chronic rhinosinusitis is a common respiratory disease with a notoriously high recurrence rate. Current clinical diagnosis mainly relies on symptom scoring and imaging, but there is still a lack of simple, sensitive, and visual molecular detection tools. Granzyme K (GZMK), released during nasal inflammatory responses, has been identified as a key pathogenic factor with strong diagnostic potential. However, no rapid GZMK detection methods are currently available, which greatly limits its translational value in clinical practice.
To address this gap, we designed a visual detection platform based on colloidal gold lateral flow strips. Our system creatively replaces traditional antibodies with de novo designed binding proteins (Binders). The colloidal gold–Binder conjugates specifically capture GZMK in the sample, migrate through the strip via capillary action, and are trapped at the test line by another immobilized Binder—forming a visible red band. A SUMO1–Ubc9 interaction system ensures the validity of the control line. This design not only reduces dependency on antibodies, but also tightly integrates protein design with downstream application, creating a closed-loop workflow from in silico binder design to real-world detection.
Protocol
Optimization of Conjugation Conditions
1.1 Determination of Optimal pH
- Add 1 mL colloidal gold solution to each of 5 EP tubes (1.5 mL).
- Adjust pH stepwise (6, 7, 8, 9) with 1 M Na₂CO₃, leaving one tube as control. Verify with precision pH paper (±0.5).
- Add >50 μg protein to each tube, vortex 10 s, invert to mix, incubate 15 min at room temperature.
- Add 100 μL 200 mM NaCl, stand for 1 h.
- Observe color change: blue/purple indicates aggregation; stable red without precipitation identifies the lowest optimal pH.
1.2 Determination of Optimal Protein Amount
- Repeat the above using optimal pH conditions.
- Add 5, 15, 25, 35, 45 μg protein to separate tubes.
- Incubate and salt-challenge as above.
- Identify the minimal stable protein amount (red color maintained), then add 10% as a safety margin to obtain the final conjugation amount.
Protein–Gold Conjugation
2.1 Conjugation Reaction
- Place 20 mL colloidal gold in a 50 mL flask, stir at 200 rpm.
- Adjust pH slowly with 1 M Na₂CO₃ to optimal value.
- Add protein at optimal amount, stir 30 min at room temp.
- Add 2 mL 10% BSA, stir 30 min to block nonspecific sites.
- Store at 4℃ overnight.
2.2 Purification
- Centrifuge at 12000 rpm, 20 min; discard supernatant.
- Resuspend pellet in 20 mL 2% BSA, centrifuge again.
- Repeat wash, then stabilize with 1% BSA + 0.02 M TBS (pH 8.2).
- Final resuspension in 2 mL Au Buffer, store at 4℃.
Test Strip Assembly
3.1 Assembly
- Spot 4 μL gold–protein conjugate onto the gold pad, air-dry 30 min.
- Spot 1 μL Aprotinin (test line) and 1 μL Ubc9 (control line) onto NC membrane, dry 1 h.
- Assemble sample pad–gold pad–NC membrane–absorbent pad on backing card.
- Dry thoroughly at room temperature.
3.2 Validation
- Apply 30 μL PBS or sample.
- Migration complete in 5–10 min.
- Positive sample: red bands at both test and control lines.
- Negative sample: only control line visible.
- No control line = invalid strip.
Results
After assembling the strips, we verified their performance. In the negative control (PBS), only the control line appeared red, confirming no nonspecific binding. In the experimental group (GZMK solution), clear red bands appeared at both test and control lines—demonstrating successful GZMK detection. Importantly, the whole process—from loading the sample to reading the result—took just about 5 minutes, with no need for additional instruments.
This work marks the first demonstration of a double-Binder system (Binder1–GZMK–Binder2) replacing antibodies in a lateral flow assay, thus closing the loop from computational protein design to real-world diagnostic application. Although manual spotting caused some band irregularities, the results were stable, specific, and reproducible, highlighting the potential of designed binders for point-of-care molecular diagnostics.
Through this project, we contributed a novel measurement method for GZMK detection and expanded the toolbox of synthetic biology for early disease diagnostics.
Colloidal Gold Test Strip Results
GZMK Enzymatic Activity Assay
Introduction
To evaluate the activity of our expressed and purified GZMK protein, and to validate the feasibility of a subsequent high-throughput screening for GZMK inhibitors, we developed an in vitro activity assay system by integrating information from existing literature with practical laboratory conditions.
As GZMK is a serine protease with a specific recognition site, we employed two substrates reported to be catalytically cleaved by GZMK to characterize its catalytic rate. These two substrates were Z-Lys-SBZL and DABCYL-GDGRSIMTE-EDANS.
When Z-Lys-SBZL is recognized and hydrolytically cleaved by GZMK, a free thiol group is released. This thiol group reacts with DTNB (5,5'-dithiobis(2-nitrobenzoic acid)) to produce TNB⁻ (2-nitro-5-thiobenzoate), a product with maximum absorbance at 412 nm. Therefore, the cleavage efficiency of GZMK can be assessed by measuring the rate of absorbance increase at 412 nm using a plate reader.
However, during our experiments, we discovered that Enterokinase (EK)—which is used to cleave the GZMK propeptide during purification—also cleaves the Z-Lys-SBZL substrate, leading to an increased absorbance reading. The non-specific cleavage by EK, which is difficult to quantify in the final protein solution, severely compromised the precision of our measurements. Consequently, we shifted to a fluorogenic peptide substrate with better specificity and higher precision.
DABCYL-GDGRSIMTE-EDANS is a fluorogenic peptide. Its peptide sequence serves as a specific cleavage site for GZMK and is flanked by a fluorophore (EDANS) and a quencher (DABCYL). When GZMK cleaves this peptide, the two are separated, abolishing the quenching effect. The fluorophore then emits light at 490 nm upon excitation at a 340 nm wavelength, with the fluorescence intensity being directly proportional to enzyme activity.
Protocol
Fluorogenic Peptide-Based Enzyme Activity Assay
- Reagent Preparation
- Prepare a 10 mM stock solution of DABCYL-GDGRSIMTE-EDANS by dissolving it in DMSO.
- The enzymatic reaction is performed in a 1× buffer containing: 50 mM Tris-HCl, 150 mM NaCl, 0.01% Triton X-100, pH 7.6. To facilitate this, prepare a 5× stock buffer: 250 mM Tris-HCl, 750 mM NaCl, 0.05% Triton X-100, pH 7.6.
- Reaction System Setup
- The total reaction volume is 50 μL.
- The Experimental Group contains:
- 1.24 μL of GZMK protein solution (0.108 mg/mL) (final conc: approximately 0.1 μM)
- 10 μL of 5× stock buffer
- 2.5 μL of fluorogenic peptide stock solution (10 mM) (final conc: 0.5 mM)
- 36.26 μL of deionized water
- The Negative Control Group: Replace the 1.24 μL of GZMK protein solution with 1.24 μL of PBS.
- Measurement of Relative Fluorescence
- Initiate the reaction by adding the GZMK protein solution (or PBS for the control) to the reaction mix. Mix well and immediately begin the measurement.
- Set the excitation wavelength to 340 nm and the emission wavelength to 460 nm (Note: This wavelength is based on our available instrument filter specifications).
- Record the fluorescence reading every 20 seconds for a total duration of 30 minutes.
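The pipetting scheme for the experimental group can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `final_concentration` is ours, and the GZMK value is kept in mass units because the molecular weight of the construct is not given in this section.

```python
TOTAL_UL = 50.0

# name -> (volume in uL, stock concentration, unit of the stock)
components = {
    "5x buffer": (10.0, 5.0, "x"),
    "peptide": (2.5, 10.0, "mM"),
    "GZMK": (1.24, 0.108, "mg/mL"),
}


def final_concentration(volume_ul, stock, total_ul=TOTAL_UL):
    """Concentration after diluting volume_ul of stock into total_ul."""
    return stock * volume_ul / total_ul


finals = {
    name: final_concentration(vol, stock)
    for name, (vol, stock, _unit) in components.items()
}
# Water makes up the remainder of the 50 uL reaction.
water_ul = TOTAL_UL - sum(vol for vol, _stock, _unit in components.values())

for name, (_vol, _stock, unit) in components.items():
    print(f"{name}: {finals[name]:.4g} {unit}")
print(f"water: {water_ul:.2f} uL")
```

Running it reproduces the 1× buffer, the 0.5 mM peptide, and the 36.26 μL of water listed in the protocol.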
Results
The in vitro enzyme activity assay we established can effectively detect the activity of GZMK, especially the fluorogenic peptide-based method, where the Relative Fluorescence Units (RFU) of the experimental group were significantly higher than the control group. We have used this method to confirm the enzymatic activity of each batch of our expressed and purified GZMK, ensuring that all GZMK-based experiments in our project could be conducted effectively.
GZMK enzymatic activity assay results
GZMK Enzyme Kinetics Assay
Introduction
Before proceeding with inhibitor screening, we characterized the enzyme kinetics of our expressed and purified GZMK using the established activity assay. We treated GZMK as an enzyme that follows Michaelis-Menten kinetics. The experimental method involved measuring the initial reaction velocity (the rate of fluorescence increase at the start of the reaction) at a fixed GZMK concentration across a series of fluorogenic peptide substrate concentrations. Finally, the data were fitted to the Michaelis-Menten equation using GraphPad Prism software to calculate the Michaelis constant (Km) and maximum reaction velocity (Vmax).
Protocol
- Reagent Preparation
- This protocol uses the fluorogenic peptide-based enzyme activity assay; general reagents should be prepared as described previously.
- Serially dilute the 10 mM fluorogenic peptide stock solution (in DMSO) to prepare a range of intermediate stock solutions at the following concentrations: 5 mM, 2.5 mM, 1.2 mM, 0.6 mM, 0.3 mM, 0.15 mM, 0.075 mM, 0.0375 mM, and 0.01875 mM.
- Reaction System Setup
- The reaction system is assembled as described previously. Add 2.5 μL of the corresponding intermediate peptide stock solution to the appropriate wells to create a final substrate concentration gradient as follows: 500 μM, 250 μM, 125 μM, 60 μM, 30 μM, 15 μM, 7.5 μM, 3.75 μM, 1.875 μM, and 0.9375 μM.
- For each final substrate concentration, set up three parallel experimental groups (technical triplicates) and one negative control group.
- Measurement of Relative Fluorescence
- The procedure for measuring Relative Fluorescence Units (RFU) is the same as described previously.
- Data Processing
- Data Correction: For each substrate concentration, average the fluorescence values of the three replicates at every time point. Then, subtract the corresponding value from the negative control group at each time point. This step corrects for the background signal caused by the spontaneous cleavage of the unstable fluorogenic peptide.
- Calculate Initial Velocity: From these corrected data curves, determine the average rate of RFU change between 200 and 1200 seconds. This rate is taken as the initial reaction velocity (V₀) for that substrate concentration.
- Kinetic Fitting: Using GraphPad Prism software, plot the initial velocities (V₀) against their corresponding substrate concentrations and perform a non-linear fit to the Michaelis-Menten equation to determine the enzyme kinetic parameters (e.g., Km and Vmax) for GZMK.
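The data-processing steps above can be sketched in plain Python. This is not the GraphPad Prism routine we used; it is a minimal stand-in that assumes an ordinary least-squares fit of v = Vmax·S/(Km + S). All function names are hypothetical, and the example data are synthetic and noise-free, generated from round-number parameters rather than taken from our measurements.

```python
import math


def slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


def initial_velocity(times_s, replicates, control, t_min=200, t_max=1200):
    """Average replicates, subtract the control, return RFU/min in the window."""
    corrected = [
        sum(rep[i] for rep in replicates) / len(replicates) - control[i]
        for i in range(len(times_s))
    ]
    window = [(t, y) for t, y in zip(times_s, corrected) if t_min <= t <= t_max]
    ts, ys = zip(*window)
    return slope(ts, ys) * 60.0  # RFU per second -> RFU per minute


def fit_michaelis_menten(substrate, velocity, km_lo=1e-3, km_hi=1e4, iters=200):
    """Least-squares fit of v = Vmax*S/(Km+S); returns (Km, Vmax).

    For a fixed Km the best Vmax has a closed form, so only Km is
    searched, using a golden-section search on log10(Km).
    """

    def best_vmax(km):
        f = [s / (km + s) for s in substrate]
        return sum(v * fi for v, fi in zip(velocity, f)) / sum(fi * fi for fi in f)

    def sse(log_km):
        km = 10 ** log_km
        vmax = best_vmax(km)
        return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(substrate, velocity))

    a, b = math.log10(km_lo), math.log10(km_hi)
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 10 ** ((a + b) / 2)
    return km, best_vmax(km)


# Synthetic check: final substrate concentrations (uM) from the protocol,
# velocities generated from hypothetical Km = 50 uM and Vmax = 400 RFU/min.
S = [500, 250, 125, 60, 30, 15, 7.5, 3.75, 1.875, 0.9375]
v = [400.0 * s / (50.0 + s) for s in S]
km_fit, vmax_fit = fit_michaelis_menten(S, v)
print(f"Km = {km_fit:.2f} uM, Vmax = {vmax_fit:.1f} RFU/min")
```

With real data, `initial_velocity` would be called once per substrate concentration to build the velocity list before fitting.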
Results
Through Michaelis-Menten fitting, we successfully obtained key enzyme kinetic parameters for GZMK: a Km value of 50.20 μM, indicating a relatively high affinity between GZMK and the fluorogenic substrate, and a Vmax of 437.5 RFU/min. These data provide a critical baseline for subsequent inhibitor screening experiments.
Kinetics fitting results
High-Throughput Inhibitor Screening
Introduction
After obtaining sufficient amounts of active and stable GZMK protein and establishing a reproducible activity assay, we moved forward to large-scale inhibitor screening. We chose the L1000 compound library, which contains 1813 FDA-, EMA-, and CFDA-approved drugs. These molecules already have proven safety profiles and clinical potential, making them ideal candidates for repurposing. The goal of this experiment was to quickly evaluate, in vitro, how these small molecules affect GZMK activity and to identify the strongest inhibitors as candidates for further testing.
For each compound, we set up a 50 μL reaction containing GZMK, a fluorogenic substrate, and the test molecule. Enzyme activity was continuously monitored over 15 minutes, generating kinetic curves. By comparing with negative controls, we calculated inhibition rates and selected the top-performing molecules for the next validation stage.
Protocol
This experiment aimed to evaluate the effects of small-molecule compounds on GZMK enzymatic activity through high-throughput screening. Six 384-well plates were used, with each reaction system containing 50 μL, composed of three components: 47.5 μL Protein Buffer, 2 μL substrate solution, and 0.5 μL small-molecule solution. In the experimental group, the final concentration of GZMK was 84.23 nM, and the final concentration of substrate was 100 μM. Experimental operations were carried out using the Bravo liquid handling system, including reagent aspiration, addition, and mixing. The workflow was: mix Protein Buffer with small-molecule solution and incubate for 90 seconds, then add the substrate solution and immediately start fluorescence detection. Fluorescence intensity was recorded once per minute for a total of 15 time points. All operations were performed under sterile conditions, with freshly prepared reagents and calibrated instruments, to avoid systematic errors.
- Reagent Preparation
- Protein Buffer (Pro)
Each 50 μL reaction contained 47.5 μL of Protein Buffer. The total requirement was about 110 mL; to ensure stability, 120 mL was prepared in three batches.
Composition (per 40 mL):
- 680 μL GZMK solution (calculated according to concentration)
- 8 mL 5× HEPES
- 40 μL 100 mg/mL BSA solution
- 40 μL 10% Triton X-100 solution
- 31.24 mL ddH₂O
Procedure: Mix 22 tubes of GZMK solution, centrifuge (12000 rpm, 4℃, 10 min), collect the supernatant, determine the concentration, and calculate the volume needed. In a 50 mL centrifuge tube, add each component in proportion, gently mix, and let stand for 5–10 min to avoid bubbles. Prepare three batches for a total of 120 mL, store at 4℃, and mix gently before use.
- Substrate Solution (Sub)
Each 50 μL reaction contained 2 μL of substrate solution. The substrate was prepared as a 2.5 mM DMSO solution, with a total requirement of 13.655 mL.
Procedure: Dissolve five tubes of substrate powder (10 mg each) in 1 mL DMSO per tube, vortex thoroughly to dissolve, then combine and adjust the volume to 13.655 mL. Mix, store protected from light, and aliquot into 384-well plates (30 μL per well) before testing.
- Small-Molecule Solution
Each 50 μL reaction contained 0.5 μL of small-molecule solution (final concentration 1 mM).
Procedure: Weigh the appropriate amount of small-molecule powder according to molecular weight, dissolve in DMSO, mix, aliquot into 384-well source plates, and store at –20℃. Bring to room temperature before use.
- Experimental Workflow
The experiment was carried out on six 384-well plates. The Bravo system was first calibrated, and reagents were loaded in sequence: Protein Buffer (supply reservoir), substrate solution (aliquoted in 384-well plates), and small-molecule solutions (source plates). For each well, 47.5 μL Protein Buffer and 0.5 μL small-molecule solution were added, mixed, and incubated for 90 seconds. Then 2 μL substrate solution was added, mixed immediately, and transferred quickly to a plate reader. Detection conditions were: excitation at 340 nm, emission at 460 nm, with fluorescence intensity recorded every minute for 15 minutes. Each plate contained blank control wells (no inhibitor) to calculate inhibition rates.
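The reagent quantities in the preparation steps above can be cross-checked with a short calculation. The sketch below uses the substrate molecular weight of 1464.62 stated in the IC50 protocol; the function names are hypothetical.

```python
def solvent_volume_ml(mass_mg, mw_g_per_mol, target_mm):
    """Solvent volume (mL) that brings mass_mg of solute to target_mm (mM)."""
    mmol = mass_mg / mw_g_per_mol          # mg / (g/mol) = mmol
    return mmol / target_mm * 1000.0       # mmol / (mmol/L) = L -> mL


def reagent_needed_ml(n_plates, wells_per_plate, ul_per_well):
    """Total reagent volume (mL) if every well of every plate is filled."""
    return n_plates * wells_per_plate * ul_per_well / 1000.0


# Five 10 mg tubes of substrate (MW 1464.62) brought to 2.5 mM in DMSO.
substrate_ml = solvent_volume_ml(5 * 10.0, 1464.62, 2.5)

# Protein Buffer for six full 384-well plates at 47.5 uL per well.
protein_buffer_ml = reagent_needed_ml(6, 384, 47.5)

# Final substrate concentration: 2 uL of 2.5 mM stock in a 50 uL reaction.
final_substrate_um = 2.5 * 1000.0 * 2.0 / 50.0

print(f"substrate solution: {substrate_ml:.3f} mL")
print(f"Protein Buffer: {protein_buffer_ml:.2f} mL")
print(f"final substrate: {final_substrate_um:.0f} uM")
```

The results match the 13.655 mL of substrate solution, the roughly 110 mL of Protein Buffer, and the 100 μM final substrate concentration given in the protocol.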
Results
For each well, 15 sets of fluorescence intensity–time data points were obtained, yielding a total of 2160 datasets, including 1813 experimental groups and 347 control groups. Fluorescence intensity–time curves were plotted for all data, and reaction slopes and R² values were calculated to ensure reliability. The slopes of the experimental groups were then compared with the controls and converted into inhibition rates.
After statistical analysis of the 1813 drug screening results, we first used computational methods to identify candidate molecules with inhibition rates greater than 50%. Further manual inspection of curve shapes and raw fluorescence values was performed to exclude false positives and compounds with fluorescence interference. Ultimately, three molecules were confirmed with inhibition rates exceeding 90%.
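The slope-based analysis can be sketched as follows. This is a minimal illustration, not our actual analysis script: inhibition is taken as one minus the ratio of the compound slope to the control slope, the 50% cutoff comes from the text above, and the R² cutoff of 0.9 is a hypothetical value that is not specified in this section. The example traces are synthetic.

```python
def linear_fit(times, values):
    """Return (slope, r_squared) of an ordinary least-squares line."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    syy = sum((v - v_mean) ** 2 for v in values)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return sxy / sxx, r_squared


def inhibition_rate(compound_slope, control_slope):
    """Fraction of control activity lost in the presence of the compound."""
    if control_slope == 0:
        raise ValueError("control slope must be non-zero")
    return 1.0 - compound_slope / control_slope


def is_candidate(rate, r_squared, min_rate=0.5, min_r2=0.9):
    """Flag wells for manual inspection (min_r2 is a hypothetical cutoff)."""
    return rate > min_rate and r_squared >= min_r2


# Synthetic traces: one reading per minute for 15 time points.
minutes = list(range(15))
control_trace = [100.0 + 40.0 * t for t in minutes]
compound_trace = [100.0 + 4.0 * t for t in minutes]

control_slope, control_r2 = linear_fit(minutes, control_trace)
compound_slope, compound_r2 = linear_fit(minutes, compound_trace)
rate = inhibition_rate(compound_slope, control_slope)
print(f"inhibition = {rate:.0%}, candidate = {is_candidate(rate, compound_r2)}")
```

A flagged well is only a candidate; as described above, curve shapes and raw fluorescence values still need manual inspection to exclude fluorescence interference.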
These results demonstrate that the high-throughput screening platform can efficiently and reliably identify potent inhibitors from a large drug library within a short time, laying a solid foundation for subsequent IC50 determination and mechanistic studies.
Results of Large-Scale Preliminary Screening
Determination of the IC₅₀ Value of the Inhibitor
Introduction
In the previous round of large-scale high-throughput screening, we successfully identified several compounds with potential inhibitory activity against GZMK. Among them, Nafamostat mesylate, which showed the highest inhibition rate, was selected for further investigation. To obtain more precise quantitative information beyond the macro-level results, we introduced IC50 (half maximal inhibitory concentration) measurement. IC50 represents the concentration of inhibitor required to reduce the target enzyme activity by 50%. It reflects drug potency and the dose–response relationship, serving as a crucial parameter for selecting potential lead compounds and predicting therapeutic concentration ranges.
During the experiment, we determined and plotted the dose–response curve of Nafamostat mesylate as an inhibitor. Specifically, starting from an initial inhibitor concentration of 200 μM, we performed 17 successive two-fold dilutions, generating a gradient of 18 concentrations with the lowest around 1.53 nM. GZMK enzymatic activity was measured at each concentration. By comparing the activity of the inhibitor-treated groups with that of the control group, we plotted the dose–response curve and calculated the IC50 value. The aim of this experiment was not only to validate the reliability of the high-throughput screening results but also to provide quantitative, comparable parameters essential for subsequent drug optimization and potential therapeutic application.
Protocol
This experiment aimed to determine the inhibitory effect of Nafamostat mesylate on GZMK through gradient dilution and calculate its IC50 value. A 384-well detection plate was used, with each well containing a 50 μL reaction system composed of Protein Buffer, substrate solution, and inhibitor solution. The design included 24 experimental groups, each with four replicates. Plate layout: the leftmost three columns were enzyme-free controls (GZMK replaced with PBS), the rightmost three columns were inhibitor-free controls (DMSO instead of inhibitor), and the middle 18 columns were the concentration gradient groups. The final inhibitor concentrations ranged from 200 μM, with successive two-fold dilutions down to approximately 1.53 nM.
- Reagent Preparation
- Protein Buffer (Pro)
Each 50 μL reaction contained 47.5 μL of Protein Buffer. A total of 16 mL Protein Buffer was prepared with the following components:
- 200 μL GZMK solution
- 3.2 mL 5× HEPES
- 16 μL 100 mg/mL BSA solution
- 16 μL 10% Triton X-100 solution
- 12.568 mL ddH₂O
At the same time, 16 mL Blank Buffer was prepared by replacing GZMK with PBS, with all other components identical. Blank Buffer was added to the leftmost three columns, while Protein Buffer was added to the other 21 columns.
- Substrate Solution (Sub)
Each 50 μL reaction contained 1.5 μL of substrate solution. The substrate molecular weight was 1464.62 g/mol. Two tubes of substrate powder (20 mg total) were dissolved in 4.1398 mL DMSO to prepare a 3.3 mM solution. The solution was aliquoted into a 384-well white plate, 20 μL per well.
- Inhibitor Solution
Each 50 μL reaction contained 1 μL of inhibitor solution. An 18-point two-fold dilution series was required, with final concentrations ranging from 200 μM down to 1.53 nM.
Procedure: Prepare at least 160 μL of 10 mM Nafamostat mesylate in DMSO (tube “1”). Prepare 17 tubes labeled “2” to “18,” each containing 80 μL DMSO. Transfer 80 μL from tube “1” to tube “2,” mix, then transfer 80 μL from tube “2” to tube “3,” and so on until tube “18.” During plate setup, columns 4–21 corresponded to tubes “1”–“18,” with 15 μL added per well. For the leftmost three and rightmost three columns, 15 μL DMSO was added per well as enzyme-free and inhibitor-free controls, respectively.
- Experimental Workflow
Reactions were performed in a 384-well plate, each well containing 50 μL. First, Protein Buffer and inhibitor solution were mixed (pipette up and down 5–10 times, 30–40 μL total) and incubated at room temperature for 90 seconds. Next, substrate solution was added, immediately mixed (5–10 pipetting cycles), and quickly transferred to the plate reader. Fluorescence detection parameters were set as: excitation at 340 nm, emission at 460 nm, with fluorescence intensity recorded once per minute for 15 cycles. Each experimental group had four replicates. Data were used to generate dose–response curves and calculate IC50 values.
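The inhibitor gradient described in the reagent preparation can be reproduced numerically. The sketch below assumes exact two-fold steps and the 1 μL into 50 μL dilution stated in the protocol; the function name is hypothetical.

```python
def twofold_series(top, n_points):
    """Concentrations of an n-point two-fold series starting at top."""
    return [top / 2 ** i for i in range(n_points)]


# Tubes "1" to "18": 10 mM stock followed by 17 two-fold dilutions (in mM).
stock_mm = twofold_series(10.0, 18)

# 1 uL of each tube in a 50 uL reaction, converted from mM to uM.
final_um = [c * 1000.0 * 1.0 / 50.0 for c in stock_mm]

print(f"highest: {final_um[0]:.0f} uM")
print(f"lowest:  {final_um[-1] * 1000.0:.2f} nM")
```

Eighteen concentrations therefore correspond to seventeen transfer steps, and the lowest final concentration is about 1.53 nM.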
Results
For each well, 15 fluorescence intensity–time points were collected, yielding a total of 96 complete datasets. These included 18 concentration-gradient experimental groups (each with 4 replicates), 12 enzyme-free control wells, and 12 inhibitor-free control wells. Fluorescence intensity–time curves were plotted for each dataset, and the slopes of the linear regions and corresponding R² values were calculated to reflect the reliability of the reaction rates. The mean slope of the enzyme-free controls was used as the baseline and subtracted from all experimental and inhibitor-free slopes. The residual activity of each well was then obtained by dividing its adjusted slope by that of the inhibitor-free controls, and the inhibition rate was calculated as one minus this ratio.
Based on this analysis, inhibition rates from the four replicates of each experimental group were statistically processed, and an inhibitor concentration–inhibition rate table was constructed. Data were imported into GraphPad Prism, log-transformed, and fitted with a nonlinear regression model to generate a standard dose–response curve. The final results showed that Nafamostat mesylate exhibited an IC50 value of 0.1951 μM against GZMK. The fitted curve was sigmoidal, with close agreement between the data points and the model. This result not only validated the effectiveness of the high-throughput screening but also quantified the inhibitory potency of Nafamostat mesylate.
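For reference, a common choice for this kind of nonlinear regression is the four-parameter logistic (variable-slope) dose–response model. Which Prism model was selected is not stated above, so the form below is an assumption, with $I(c)$ the inhibition rate at inhibitor concentration $c$, $I_{\min}$ and $I_{\max}$ the lower and upper plateaus, and $h$ the Hill slope:

```latex
I(c) = I_{\min} + \frac{I_{\max} - I_{\min}}{1 + \left(\dfrac{\mathrm{IC}_{50}}{c}\right)^{h}}
```

At $c = \mathrm{IC}_{50}$ the expression gives the midpoint $(I_{\min} + I_{\max})/2$, which is why the fitted value corresponds to 50% inhibition only when the plateaus are close to 0% and 100%.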
Nafamostat Mesylate Concentration–GZMK Activity Inhibition Rate Curve
Physics-guided PPI-Interaction-Affinity-Prediction Pipeline (PPI-APP)
Introduction
A major bottleneck in de novo binder design is the lack of high-throughput, rapid methods for measuring binding affinities. Many designed binders lack natural homologs, making conventional affinity-prediction approaches that rely on mutation or homology information inapplicable. Experimental affinity assays and high-accuracy molecular docking are costly and time-consuming. Moreover, purely data-driven models tend to perform poorly when test samples share little feature overlap with their training sets.
To address these challenges, we developed a computational pipeline that integrates AlphaFold3 (AF3), PyRosetta, and PBEE (Protein Binding Energy Estimator) to provide reasonably accurate affinity estimates within practical time and cost constraints. The pipeline works as follows: AF3 predicts three-dimensional structures from sequence and provides per-residue confidence metrics; PyRosetta then performs confidence-guided structural refinement (e.g., targeted FastRelax) to produce conformations that better satisfy physical restraints and more closely resemble plausible complex states; finally, PBEE extracts physics-based interface descriptors from the refined complexes and feeds them into machine-learning models to predict binding affinity.
In our experience, direct affinity estimation from PyRosetta energy scores is often biased by flexible regions and can underestimate binding strength. Combining physics-derived interface features with data-driven models (PBEE) improves both the stability and accuracy of predictions. Because most components are physics-based, the extracted features remain interpretable, which facilitates subsequent model refinement and experimental decision-making.
This workflow is intended as a practical screening tool for de novo binder projects—balancing accuracy and throughput to prioritize promising candidates for downstream experimental validation with reduced cost and effort.
Figure 1. Pipeline diagram
Protocol
We estimated the binding affinity of the GZMK–protein binder as follows:
- Hardware and tools
- Batch jobs were run on cloud-based HPC resources with both GPU- and CPU-intensive nodes. GPUs were used for fast generation of AlphaFold3 (AF3) outputs, while CPU-intensive nodes handled all downstream processing.
- Software: PyRosetta 4 (2024 release). The PBEE workflow used is from https://github.com/chavesejf/PBEE.git and the DockQ scripts are from https://github.com/wallnerlab/DockQ.git.
- AlphaFold3 settings
- AF3 was run without an MSA for the designed binder. For efficiency, we kept the target protein GZMK’s MSA fixed while setting the designed binder’s MSA to empty; this substantially speeds up AF3 without a significant loss in accuracy for de novo designs, since most designed binders lack natural homologs.
- Each AF3 run produced five predicted models.
- File conversion
- For each AF3 output folder, each model was converted from .cif to .pdb using BioPython.
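A minimal sketch of the conversion step with Biopython's `Bio.PDB` module is shown below. The function names and the folder layout are assumptions for illustration; Biopython is imported inside the conversion function so that the path helper can be used without it.

```python
from pathlib import Path


def pdb_path_for(cif_path):
    """Map e.g. model.cif to model.pdb in the same folder."""
    return Path(cif_path).with_suffix(".pdb")


def convert_cif_to_pdb(cif_path):
    """Convert one mmCIF model to PDB format with Biopython."""
    # Imported here so pdb_path_for works without Biopython installed.
    from Bio.PDB import MMCIFParser, PDBIO

    cif_path = Path(cif_path)
    parser = MMCIFParser(QUIET=True)
    structure = parser.get_structure(cif_path.stem, str(cif_path))

    writer = PDBIO()
    writer.set_structure(structure)
    out_path = pdb_path_for(cif_path)
    writer.save(str(out_path))
    return out_path


def convert_folder(folder):
    """Convert every .cif file found under one AF3 output folder."""
    return [convert_cif_to_pdb(p) for p in sorted(Path(folder).rglob("*.cif"))]
```

A call such as `convert_folder("af3_output/binder_1-6")` (a hypothetical path) would write one `.pdb` file next to each `.cif` model.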
- Confidence extraction
- We used the IPSAE confidence-analysis scripts (https://github.com/DunbrackLab/IPSAE.git) to convert AF3’s large per-model JSON outputs (full_data_${model_id}.json) into concise per-residue metrics such as pLDDT and PAE.
- PyRosetta FastRelax protocol
- Inputs: the five .pdb model files from AF3 and the per-residue confidence summaries produced by IPSAE. The Relax job also accepts the input PDB path, an output file name, and the number of iterations as parameters.
- Data loading and preprocessing: structures are read with pose_from_pdb(). cleanATOM is applied to remove non-essential waters and to add missing hydrogens as needed.
- Score function: we base refinement on ref2015_cart (alternatively, beta_nov16_soft can be explored). Hydrogen-bond weights for intra-chain backbone–sidechain (hbond_bb_sc) and inter-chain sidechain–sidechain (hbond_sc) interactions are upweighted.
- MoveMap setup: residues are classified by classify_residues() into categories:
- pLDDT: high / medium / low according to thresholds (e.g., >95, 80–95, 50–70, <25 percentiles)
- interface residues: n0res above the 75th percentile
- hydrophobic interface residues: interface ∩ {ALA, VAL, ILE, LEU, MET, PHE, TYR, TRP}
- hydrophilic interface residues
- terminal residues: first/last six residues of each chain
The classification function returns a dictionary {category_name: residue_set}. setup_movemap() then assigns degrees of freedom:
- Default: backbone and sidechain motions locked.
- Medium-confidence residues: enable sidechain (chi) movement.
- Low-confidence residues: enable backbone (bb) and sidechain (chi) movement.
- Interface residues: force-enable sidechain movement.
- Constraint generation: generate_constraints() applies coordinate and dihedral restraints per category:
- High confidence (pLDDT ≥ 95): strong coordinate constraints (Harmonic, σ = 0.1).
- 80–95 range: dynamic coordinate constraints with σ scaled by pLDDT.
- Low confidence (<25%): flat-bottom coordinate constraints (allow small local flexibility, penalize larger deviations).
- Low confidence + secondary-structure regions: add dihedral (phi/psi) constraints to preserve α-helix or β-sheet geometry.
- Terminal residues not at the interface: moderate coordinate constraints to prevent termini from drifting.
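One way to read the constraint rules above is as a per-residue plan that a later step turns into actual restraints. The sketch below is hypothetical and does not reproduce generate_constraints(): the linear interpolation of σ between pLDDT 80 and 95, the upper width of 1.0, the terminal width of 0.5, and the treatment of everything below 80 as low confidence are all assumptions made for illustration.

```python
def sigma_for(plddt, sigma_min=0.1, sigma_max=1.0, lo=80.0, hi=95.0):
    """Interpolate the harmonic width linearly: pLDDT 95 -> sigma_min,
    pLDDT 80 -> sigma_max. The linear form is an assumption."""
    clamped = min(max(plddt, lo), hi)
    frac = (hi - clamped) / (hi - lo)
    return sigma_min + frac * (sigma_max - sigma_min)


def plan_constraints(plddt, in_secondary_structure, terminal, interface):
    """Return {residue: [(constraint_kind, sigma_or_None), ...]}."""
    plan = {}
    for res, score in plddt.items():
        kinds = []
        if score >= 95.0:
            kinds.append(("harmonic", 0.1))  # strong coordinate restraint
        elif score >= 80.0:
            kinds.append(("harmonic", round(sigma_for(score), 3)))
        else:
            kinds.append(("flat_bottom", None))  # free within a small tolerance
            if res in in_secondary_structure:
                kinds.append(("dihedral", None))  # keep phi/psi near start
        if res in terminal and res not in interface:
            kinds.append(("terminal_harmonic", 0.5))  # assumed moderate width
        plan[res] = kinds
    return plan


# Made-up inputs: residue 3 is helical, 1 and 4 are terminal, 4 is at the interface.
constraint_plan = plan_constraints(
    plddt={1: 97.0, 2: 87.5, 3: 60.0, 4: 55.0},
    in_secondary_structure={3},
    terminal={1, 4},
    interface={4},
)
```

Keeping the plan separate from the restraint objects makes the category logic easy to inspect before anything is attached to the pose, for example as PyRosetta `CoordinateConstraint` objects with a `HarmonicFunc`.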
- Preprocessing: repack_all_residues() repacks all side chains prior to Relax to reduce rotamer bias.
- Execution: apply the configured MoveMap and score function, attach constraints, and run FastRelax for n iterations to obtain the final refined models.
- PBEE affinity prediction
- PBEE is used to compute binding-affinity estimates for each AF3 model after PyRosetta relaxation. We also tune PBEE’s score-function settings as part of model optimization.
- Structural convergence checks with DockQ
- After relaxation, we use DockQ to compare the five models pairwise, computing RMSD, iRMSD, and fnat (the fraction of interactions present in model A that are also present in model B); fnat values closer to 1 indicate greater convergence among relaxed models.
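The fnat definition above can be illustrated with a small stand-alone calculation. This sketch does not call DockQ: the function names, the input format (heavy-atom coordinates per residue) and the 5 Å contact cut-off are assumptions chosen for illustration.

```python
import math


def interface_contacts(chain_a, chain_b, cutoff=5.0):
    """Return the set of (res_a, res_b) pairs with any atom pair within cutoff.

    chain_a / chain_b: {residue_id: [(x, y, z), ...]} heavy-atom coordinates.
    """
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


def fnat(reference_contacts, model_contacts):
    """Fraction of the reference contacts that are also present in the model."""
    if not reference_contacts:
        return 0.0
    return len(reference_contacts & model_contacts) / len(reference_contacts)


# Toy two-residue chains (made-up coordinates, one atom per residue).
chain_a = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)]}
model_1_chain_b = {1: [(3.0, 0.0, 0.0)], 2: [(12.0, 0.0, 0.0)]}
model_2_chain_b = {1: [(3.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}

contacts_1 = interface_contacts(chain_a, model_1_chain_b)
contacts_2 = interface_contacts(chain_a, model_2_chain_b)
fnat_1_vs_2 = fnat(contacts_1, contacts_2)  # model 1 as reference
fnat_2_vs_1 = fnat(contacts_2, contacts_1)  # model 2 as reference
```

Because the denominator is the reference model's contact count, fnat is not symmetric, so a pairwise comparison yields two values for each pair of models.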
- Evaluation
- Test set selection: to validate the pipeline we selected 13 protein–protein complexes from the PDBBind+ 2024 dataset that match our task criteria: (a) two-chain complexes only; (b) interfaces that are mostly rigid rather than composed entirely of flexible loops (to match expectations for de novo designs); (c) high-resolution structures with experimentally measured affinities; (d) AF3 predictions (with MSA) that have iPTM and pTM both ≥ 0.7; and (e) structural diversity across the test set to ensure robustness.
- Evaluation procedure: for each complex we run steps 1–7 above and compare the PBEE-predicted affinities for each model against experimental values; we also examine fnat and DockQ scores across the models after relaxation.
- Data aggregation: we compile prediction errors (affinity deviations), fnat, and DockQ scores and visualize them across different Relax protocols and iteration counts.
- Comparison and optimization: we compare results from different Relax settings to identify the most effective protocol and guide further optimization.
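Comparing predictions with experiment requires both on the same scale. A minimal sketch, assuming the experimental affinities are available as dissociation constants in mol/L and that T = 298.15 K (neither is stated in the text), converts Kd to a binding free energy via ΔG = RT ln(Kd) and computes the mean absolute error; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from Kd in mol/L: dG = R*T*ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def mean_absolute_error(predicted, experimental):
    """Mean of |prediction - experiment| over paired values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# A 1 nM binder corresponds to roughly -12.3 kcal/mol at 298.15 K.
dg_1nm = kd_to_delta_g(1e-9)
```

Tighter binders have smaller Kd and therefore more negative ΔG, which is why predicted and experimental values in this section are all reported as negative energies.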
Results
After several rounds of iteration, the agreement between predicted and experimental affinities improved markedly.
- We first calculated the affinity of the five raw AF3 models for each test protein using PRODIGY, a widely used PPI affinity prediction model, and then repeated the calculation on the adaptively relaxed structures from our own pipeline. As the figure below shows, predictions from our workflow correlate substantially better with experimental measurements than those from the baseline AF3-PRODIGY method: the Pearson correlation coefficient rose from 0.091 to 0.337 after adding global relaxation, and reached 0.415 with the fully implemented workflow.
Figure 2. Iterative Engineering of the Computational Pipeline Systematically Improves Binding Affinity Prediction.
The figure demonstrates the stepwise improvement of the binding affinity prediction workflow across three key stages, evaluated on a benchmark set of seven protein complexes. The top row shows scatter plots of predicted versus experimental binding free energy (ΔG), while the bottom row displays the distribution of predictions for each protein as box plots.
The baseline workflow, combining raw AlphaFold3 structures with the PRODIGY prediction model, exhibits a very weak correlation with experimental data (Pearson r = 0.091). The box plots show significant and inconsistent deviations from the true affinity values (red diamonds).
The second stage incorporated a global structural relaxation protocol (FastRelax). This step substantially improved the predictive power, increasing the correlation to r = 0.337. The prediction distributions are visibly better aligned with the experimental values than in the baseline.
The final, fully-engineered pipeline utilized our novel confidence-guided adaptive relaxation strategy coupled with the PBEE prediction model. This workflow achieved the highest accuracy, with the Pearson correlation coefficient reaching r = 0.415. The box plots show that predictions are more tightly clustered and centered more closely on the experimental affinities, demonstrating the success of our iterative design-build-test-learn cycles.
In all scatter plots, each point represents a prediction from one of five structural models, and the dashed line indicates a perfect correlation (y=x). In all box plots, the central line indicates the median, and red diamonds mark the corresponding experimental affinity.
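For reference, the Pearson correlation coefficient reported for each stage can be computed from paired predicted and experimental ΔG values as in the sketch below. The function name is hypothetical and the example numbers are placeholders, not the benchmark data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values in kcal/mol, one pair per structural model.
predicted = [-9.1, -10.4, -8.2, -11.0]
experimental = [-8.5, -10.9, -9.0, -10.1]
r = pearson_r(predicted, experimental)
```

Note that r measures linear association only: predictions can correlate well while still being offset from the y = x line, which is why the box plots against the experimental values are shown alongside the scatter plots.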
- We next investigated whether structural convergence (higher similarity) among the relaxed models leads to more accurate or more consistent affinity predictions. The data indicate a moderate positive correlation between structural similarity and prediction error, suggesting that models which converge to a highly similar structure may paradoxically yield less accurate predictions. This in turn suggests that there may be systematic errors in the step that maps a well-converged structure to its affinity. Our next step should therefore focus on improving the accuracy of the affinity prediction model on reliable protein structures.
Figure 3. Convergence analysis
Analysis of the relationship between inter-model structural similarity and prediction performance for the seven protein benchmark set.
Left: scatter plot of the absolute prediction error (the difference between the mean predicted affinity and the experimental value) versus the average structural similarity score among the five relaxed models for each protein.
Right: scatter plot of the standard deviation of the five affinity predictions versus the average structural similarity score.
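The two quantities plotted on the vertical axes can be computed per protein as sketched below. This is an illustrative sketch with a hypothetical function name and made-up numbers; using the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def convergence_metrics(predictions, experimental):
    """Return (absolute error of the mean prediction, spread of predictions).

    predictions: affinity predictions (kcal/mol) for the relaxed models
    experimental: measured affinity (kcal/mol) for the same complex
    """
    abs_error = abs(statistics.mean(predictions) - experimental)
    spread = statistics.stdev(predictions)  # sample standard deviation
    return abs_error, spread


# Made-up example: three model predictions for one complex.
abs_error, spread = convergence_metrics([-8.0, -9.0, -10.0], -8.5)
```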
- Finally, we tested the entire pipeline on experimentally determined structures from our own lab, which were not used in any prior tuning of the relaxation protocol. The mean predicted ΔG was -8.465 kcal/mol for sample 1-6 (experimental ΔG = 7.28 kcal/mol) and -10.01 kcal/mol for sample 1-24, which is described in the “Obtaining Accurate Affinity Values” module (experimental ΔG ≈ -8.71 kcal/mol). In both prediction and experiment, 1-24 has the lower ΔG, so the predicted ordering of the two samples agrees with the measured one.
Lab validation results
References
[1]Lan F, Li J, Miao W, et al. GZMK-expressing CD8+ T cells promote recurrent airway inflammatory diseases. Nature. 2025;638(8050):490-498. doi:10.1038/s41586-024-08395-9
[2]Donado CA, Theisen E, Zhang F, et al. Granzyme K activates the entire complement cascade. Nature. 2025;641(8061):211-221. doi:10.1038/s41586-025-08713-9
[3]Hink-Schauer C, Estébanez-Perpiñá E, Wilharm E, et al. The 2.2-A crystal structure of human pro-granzyme K reveals a rigid zymogen with unusual features. J Biol Chem. 2002;277(52):50923-50933. doi:10.1074/jbc.M207962200
[4]Bouwman AC, van Daalen KR, Crnko S, Ten Broeke T, Bovenschen N. Intracellular and Extracellular Roles of Granzyme K. Front Immunol. 2021;12:677707. Published 2021 May 4. doi:10.3389/fimmu.2021.677707
[5]Bovenschen N, Quadir R, van den Berg AL, et al. Granzyme K displays highly restricted substrate specificity that only partially overlaps with granzyme A. J Biol Chem. 2009;284(6):3504-3512. doi:10.1074/jbc.M806716200
[6]Bouwman AC, van Daalen KR, Crnko S, Ten Broeke T, Bovenschen N. Intracellular and Extracellular Roles of Granzyme K. Front Immunol. 2021;12:677707. Published 2021 May 4. doi:10.3389/fimmu.2021.677707
[7]Guo CL, Wang CS, Wang ZC, et al. Granzyme K+CD8+ T cells interact with fibroblasts to promote neutrophilic inflammation in nasal polyps. Nat Commun. 2024;15(1):10413. Published 2024 Nov 29. doi:10.1038/s41467-024-54685-1
[8]Dotiwala F, Fellay I, Filgueira L, Martinvalet D, Lieberman J, Walch M. A High Yield and Cost-efficient Expression System of Human Granzymes in Mammalian Cells. J Vis Exp. 2015;(100):e52911. Published 2015 Jun 10. doi:10.3791/52911
[9]Wilharm E, Tschopp J, Jenne DE. Biological activities of granzyme K are conserved in the mouse and account for residual Z-Lys-SBzl activity in granzyme A-deficient mice. FEBS Letters. 1999 Oct;459(1):139-142. DOI: 10.1016/s0014-5793(99)01200-4. PMID: 10508933.
[10]Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13, 1265 (2022). https://doi.org/10.1038/s41467-022-28865-w
[11]Elton J. F. Chaves, João Sartori, Whendel M. Santos, Carlos H. B. Cruz, Emmanuel N. Mhrous, Manassés F. Nacimento-Filho, Matheus V. F. Ferraz, and Roberto D. Lins. Journal of Chemical Information and Modeling 2025 65 (5), 2602-2609. DOI: 10.1021/acs.jcim.4c01641