Software | YNNU-China

Introduction

Plants have evolved a wide variety of secondary metabolites, which are valuable resources for medicine, cosmetics, and the food industry (1,2). Commercial production still relies mainly on plant extraction, which suffers from low yields, resource limitations, and environmental pressures. Synthetic biology and metabolic engineering now offer sustainable alternatives by reconstructing plant biosynthetic pathways in microorganisms (3). However, many plant-derived enzymes show poor stability and low activity in heterologous hosts due to differences in folding environments, post-translational modifications, and cellular structures, which limits production efficiency (4). Advances in AI, computational biology, and protein design have created new opportunities for enzyme engineering. Directed evolution and rational design are key strategies, but each faces challenges: directed evolution requires high-throughput screening, while rational design depends on accurate structural and functional data. Recent breakthroughs in structure prediction (e.g., AlphaFold) and ligand docking, combined with co-evolutionary analysis, enable high-precision modeling and targeted mutagenesis (5,6). AI-based tools now allow simultaneous prediction of the stability, activity, and functional effects of mutations, significantly accelerating design cycles. Despite this progress, existing platforms (e.g., PROSS and FireProt) focus mainly on stability improvement and cannot fully address the activity-stability trade-off, epistatic effects, or catalytic fine-tuning of plant enzymes (7,8). Therefore, a specialized, efficient, and user-friendly intelligent design platform is needed to enhance the heterologous expression, catalytic performance, and overall adaptability of plant enzymes for synthetic biology applications.

Description

Fig. 1 Artificial evolution strategy of natural enzymes driven by evolutionary data.

In this study, we developed an evolution-based enzyme redesign (REvoDesign) workflow to guide the rapid evolution of enzymes and improve the efficiency of plant enzyme engineering for natural product synthesis in microbial hosts, even with limited research resources (Fig 1). The architecture consists of four functional layers that form a collaborative working system (Fig 2). 1) User Interaction Layer, which provides an intuitive and user-friendly graphical user interface (GUI) and matches design protocols to the design tasks and requirements specified by the user. 2) Core Configuration Layer, which uses Hydra as the centralized configuration engine, connects the UI widgets to the configuration, and handles the management and modification of core parameters. User-defined design protocols are loaded and can be flexibly adjusted at runtime. 3) Task Execution Layer, which decomposes design tasks into so-called mini-tasks, executes specific calculations via abstract tool interfaces, and ensures that the design goals are correctly realized. 4) Interface Adaptation Layer, a lower-level API that modularly interfaces with third-party tools (such as PyMOL) and plays an essential role in providing functionality. This architecture keeps the REvoDesign workflow efficient to develop and flexible in terms of function extension, task management, and visualization, providing strong technical support for protein design. Based on this framework, we built a visual operation interface and configured user operation modules such as Prepare, Mutate, Evaluate, Cluster, Visualize, and Interact (Figs 3 and 4).

Fig. 2 Hierarchical Design Process of REvoDesign Software.

Fig. 3 Main program interface of REvoDesign and definition of its functional zones.

1. Protein model and protein chain selection; 2. Parallelization settings, with constraints on resource usage; 3. Color preset area, used to configure the colors displayed for processing tasks; 4. Main functional area, with task steps split into multiple sub-tabs; 5. Task progress bar, indicating the progress and status of task processing; 6. Tooltip bar, which displays tooltip information when the mouse pointer hovers over a widget.

Fig. 4 Interactive Function Menu of REvoDesign.

REvoDesign offers auxiliary design functions through a drop-down menu. The red cross symbol preceding the menu items indicates that the function has not yet been fully implemented/imported.

Demo

Use Case

Please see the results section in the Wet Lab for this project.

Software Installation and Deployment

Prerequisites

REvoDesign software is designed as a modern interface for the REvoDesign workflow and for many advanced protein design tools.

REvoDesign software can run on the following operating systems:

Windows 10/11
macOS Monterey (12.0) or later, except Tahoe (26)
Ubuntu 20.04 or later

The requirements for hardware and software are as follows:

Requirements	Recommended configuration
Architecture	Win-64; Linux-64; OSX-64; OSX-arm
Hard Drive	Basic: 8 GB or higher
Memory	Basic: at least 2 GB; Recommended: 16 GB
Display	Resolution: 1920×1080 or higher
Software dependencies	PyMOL Incentive (2.5.7 and above; 3.0.2 and above), not recommended on Apple Silicon with REvoDesign Extra tools; or PyMOL open-source (2.5.0 and above; 3.1.0 and above); or Python (3.9-3.12) if everything is installed manually

Installation

Installation of PyMOL

Based on our experience with test runners, the open-source version of PyMOL is strongly recommended.

Go to the Miniconda page of Anaconda, and follow the instructions on the official website to download and install Miniconda. It is recommended to install it using the command line method.
Create a new Conda environment: conda create -y -n REvoDesign python=3.11
Activate the newly created Conda environment: conda activate REvoDesign
Install the open-source version of PyMOL: conda install -y -c conda-forge pymol-open-source
Start PyMOL from the command line: pymol

The graphical package manager

REvoDesign offers a user-friendly graphical package manager. The installation process is as follows:

Download and install the REvoDesign package manager
- Get REvoDesign and its package manager
  - On the REvoDesign repository page (GitLab), follow the instructions to obtain the links to the REvoDesign source code and the package manager.
- Install the REvoDesign package manager as a PyMOL Plugin
  - Open the installed PyMOL from the command line.
  - Click the menu bar: Plugin → Plugin Manager.
  - In the popup window, click the Install New Plugin tab.
  - Under the Install from PyMOL Wiki or any URL section, paste the link to the REvoDesign Package Manager into the URL field and click Fetch on the right.
  - In the popup window, click Yes to confirm the installation.
  - In the new popup window, confirm the installation location for the plugin; the default location is fine. Click OK to proceed.
  - After installation is complete, the Plugin menu in PyMOL will automatically show the entry REvoDesign Package Manager.
Installing REvoDesign and Its Components via the REvoDesign Package Manager
- In PyMOL, open the Plugin menu and select REvoDesign Package Manager.
- In the Source section, choose Repository as the installation source.
- If needed, configure network settings in the Network section, such as using a network proxy or specifying a PyPI mirror.
- In the 'Extra' section, choose which optional components to install. 'None' installs no optional components; 'Customized' expands the component list on the right so that you can check the components you want; 'Everything' installs all components except the test suite. For the initial installation, it is recommended not to install any optional components. Once REvoDesign itself is installed successfully, you can check individual components and run the installation again.
- Click Install. The installer will automatically download and install all required dependencies and the selected components.

Manual

Detailed explanation of software functional modules

In this tutorial, we will use a structure from the RCSB PDB (PDBID: 1SUO) as a quick example.

REvoDesign Interface

Fig. 5 Basic functional divisions of the REvoDesign main interface.

Main Interface of REvoDesign

The REvoDesign main interface is divided into the following functional areas:

Molecule and Chain Selection Area: Visible after loading a molecule.
Processor Core Usage Constraint Area: The maximum value corresponds to the number of available cores in the current system.
Color Preset Area: Used for result rendering and display. The adjacent checkbox allows inverting the preset definitions.
Core Functional Module Area: The main operational region, organized into tabs according to function and workflow.
Progress Bar: Indicates task status.
- During background computations, the progress bar bounces around to show a busy state.
- During design, it shows computation progress.
- During rational screening, it indicates the position of the current item within the whole set (e.g., the position of a mutant within the full mutant tree, or of a co-evolutionary residue pair within the full pair queue).
Tooltip Area: A quick tooltip hint will be shown when the mouse cursor hovers over a widget.

In addition to color configuration, the Color Preset Area also provides a preset display, visually presenting the color spectrum of each preset. The direction from the top-left to the bottom-right represents the transition of values from low to high. The default preset is bwr_r, which transitions from red → white → blue.
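As a rough illustration of how a diverging preset such as bwr_r maps low-to-high values onto red → white → blue, the sketch below uses plain linear interpolation. It is a simplified stand-in and not REvoDesign's rendering code; the function name and the `invert` flag (which mirrors the inversion checkbox) are assumptions made for this example.

```python
def bwr_r_color(value, vmin, vmax, invert=False):
    """Map a value in [vmin, vmax] to an (r, g, b) tuple: red (low) -> white -> blue (high)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp the value into range and normalize it to [0, 1]
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    if invert:
        # Hypothetical equivalent of the "invert preset" checkbox
        t = 1.0 - t
    if t <= 0.5:
        # First half: red (1, 0, 0) fades to white (1, 1, 1)
        s = t / 0.5
        return (1.0, s, s)
    # Second half: white (1, 1, 1) fades to blue (0, 0, 1)
    s = (t - 0.5) / 0.5
    return (1.0 - s, 1.0 - s, 1.0)
```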

Fig. 6 The drop-down list and a fast preview icon of color presets.

In addition, we placed satellite design tools in the menu tree, grouped by functionality, so that users can find them easily.

Fig. 7 The dropdown menu provides additional task functions.

For example, the "Relax w/ Ca Constraints" function under Tools → Rosetta performs iterative optimization of side chains while keeping the protein backbone fixed. The popup window collects user input parameters, such as the PDB file path, the number of structures to optimize per iteration, the number of iterations, the save path, etc.

Fig. 8 The task window pops up from the drop-down menu, providing comprehensive guidance on the corresponding calculation parameters.

Because the input parameters in the pop-up task window are not part of the core configuration, they are not saved when the window is closed. Users who need to repeat an experiment should save the parameters after entering them so that they can be reused in subsequent calculations.

Task parameters can be loaded by clicking Load or quickly loaded by dragging the parameter file into the window.

Fig. 9 The calculation parameter configuration in the task window can be saved as a JSON file and can also be loaded again by dragging and dropping.
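The save-and-reload cycle for task parameters can be pictured with a small JSON round trip. This is only a sketch: the helper names are hypothetical, and the keys in the usage note are invented for illustration, since the actual file layout written by REvoDesign is not described here.

```python
import json
from pathlib import Path


def save_task_parameters(params, path):
    """Write a dictionary of task parameters to a JSON file."""
    Path(path).write_text(json.dumps(params, indent=2), encoding="utf-8")


def load_task_parameters(path):
    """Read task parameters back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, a dictionary such as `{"pdb_path": "1SUO.pdb", "iterations": 3}` (hypothetical keys) could be saved once with `save_task_parameters` and restored with `load_task_parameters` before each repeated run.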

In addition, REvoDesign supports interface translation, but only for the main window. Other areas, such as log messages and pop-up task windows, remain in English.

Fig. 10 Interface language switching.

Fig. 11 Chinese user interface.

Any inputs or changes made in the main window can be saved as an experiment, facilitating reuse and repeated calculations.

Fig. 12 Configuration files can be loaded and saved as experiment configurations.

Fig. 13 Saving the configuration.

Any changes made in the user interface directly modify the configuration in memory, but are not saved to the hard drive. Only by clicking Save Configuration will the changes be saved and loaded the next time the program starts.

Initialize Configuration: Clears the user's current configuration files and copies a fresh, complete read-only set from the program directory.
Edit Configuration: Opens the configuration file in the user's browser for manual editing via the configured editor.
Direct modifications to the configuration file will not be reflected in the program window. To apply a modified configuration file, click Reload Configuration.
If the program fails to start due to incorrect configurations, the configuration can be reset via the right-click menu in the package manager.

Evolutionary data analysis

REvoDesign requires searching sequence databases to compute conservation information (PSSM) and co-evolution information (GREMLIN). Because these sequence databases are large, we separated this step and implemented it as a simple, user-friendly web-based computation service. Laboratories with suitable resources can set up their own instance of the website, or run the standalone scripts, by following the provided setup guide.

Suppose that the submission URL of this instance is: http://your-server-ip:8080/PSSM_GREMLIN/create_task

Note: The computation service does not enforce input sequence format. Users must provide sequences in strict FASTA format, with each file containing a single sequence. Unknown residues can be represented by X; stop codons (*) are not allowed. For example:

>1SUO_A
XXXXXXXXXXXXXXXXXXXXXXXXXXXGKLPPGPSPLPVLGNLLQMDRKGLLRSFLRLR
EKYGDVFTVYLGSRPVVVLCGTDAIREALVDQAEAFSGRGKIAVVDPIFQGYGVIFANG
ERWRALRRFSLATMRDFGMGKRSVEERIQEEARCLVEELRKSKGALLDNTLLFHSITSN
IICSIVFGKRFDYKDPVFLRLLDLFFQSFSLISSFSSQVFELFSGFLKYFPGTHRQIYR
NLQEINTFIGQSVEKHRATLDPSNPRDFIDVYLLRMEKDKSDPSSEFHHQNLILTVLSL
FFAGTETTSTTLRYGFLLMLKYPHVTERVQKEIEQVIGSHRPPALDDRAKMPYTDAVIH
EIQRLGDLIPFGVPHTVTKDTQFRGYVIPKNTEVFPVLSSALHDPRYFETPNTFNPGHF
LDANGALKRNEGFMPFSLGKRICLGEGIARTELFLFFTTILQNFSIASPVPPEDIDLTP
RESGVGNVPPSYQIRFLARH
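Because the service does not check the input format, it may help to validate a file locally before uploading it. The sketch below enforces the rules stated above (one header, one sequence, X allowed, no stop codons). The function name is hypothetical, and the accepted alphabet (the 20 standard amino acids plus X, upper case only) is an assumption.

```python
# Assumed alphabet: 20 standard amino acids plus X for unknown residues
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def validate_single_fasta(text):
    """Return (header, sequence) for a single-sequence FASTA string, or raise ValueError."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must be a FASTA header starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("file must contain exactly one sequence")
    header = lines[0][1:]
    sequence = "".join(lines[1:])
    if not sequence:
        raise ValueError("sequence is empty")
    # Anything outside the assumed alphabet (including '*') is rejected
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        raise ValueError("invalid characters in sequence: " + "".join(bad))
    return header, sequence
```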

Fig. 14 Submitting the sequence for calculation.

On the submission page, select and upload the sequence file. After the upload is complete, the system returns an MD5 checksum identifier and the calculation status. A status of Still running indicates that the sequence has entered the computation queue.

Users can then go to the dashboard page and refresh to monitor changes in the computation status.

Fig. 15 The sequence calculation status is calculating.

If the uploaded sequence turns out to be incorrect, the user can cancel a queued task or terminate a task that has already entered the processing stage.

Fig. 16 A button to cancel calculation will appear when the user hovers the mouse cursor over a task in the calculation or queued state.

When the computation is complete, users can hover over the task block and click the Download button that appears to retrieve the results. The downloaded file is a ZIP archive.

Fig. 17 Viewing and downloading evolutionary data calculation results.

After the download is complete, extract the ZIP archive for later use.

Prepare Module (Design Preparation Module)

Loading and Processing Structure Files

In PyMOL, users need to obtain a target structure for design. For example, fetch 1SUO retrieves the PDB structure used in this case. This is a CYP450 enzyme, which contains the following components:

Segment ID	Molecule	Description
A	protein	Protein
B	HEM	Cofactor
C	CPZ	Substrate molecule
D	HOH	Crystallographic water

Before we begin, we preprocess the structure and the session. Paste the following code into the PyMOL command prompt:

# Display alpha-helices in cartoon mode as cylinders
set cartoon_cylindrical_helices, 1
# Set cartoon color to gray
set cartoon_color, gray70
# Set cartoon transparency to 0.3
set cartoon_transparency, 0.3
# Assign secondary structure
dss
# Remove crystallographic water
remove resn HOH
# Set background color to white
bg_color white
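The same preprocessing can also be scripted through PyMOL's Python API. The sketch below is one possible way to wrap the commands above; the function name and the optional `session_path` argument are assumptions for this example. The `cmd` object is passed in explicitly so that the function has no import-time dependency on PyMOL.

```python
def prepare_session(cmd, pdb_id="1SUO", session_path=None):
    """Fetch a structure and apply the display and cleanup settings listed above.

    `cmd` is expected to be PyMOL's command module (from pymol import cmd).
    """
    cmd.fetch(pdb_id)
    # Display alpha-helices in cartoon mode as cylinders
    cmd.set("cartoon_cylindrical_helices", 1)
    # Gray, semi-transparent cartoon
    cmd.set("cartoon_color", "gray70")
    cmd.set("cartoon_transparency", 0.3)
    # Assign secondary structure
    cmd.dss()
    # Remove crystallographic water
    cmd.remove("resn HOH")
    # White background
    cmd.bg_color("white")
    if session_path:
        # A .pse extension makes PyMOL write a session file
        cmd.save(session_path)
```

Inside PyMOL, this could be called as `from pymol import cmd` followed by `prepare_session(cmd, session_path="1SUO_prepared.pse")`, where the file name is only a placeholder.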

Then save this session, as it will serve as one of the starting files for our subsequent analysis and design.

Note: To simplify the demo, no structure relaxation or energy minimization will be performed on this structure.

Fig. 18 Prepared Demo Structure Model.

Although the structure has been successfully loaded in PyMOL, it still needs to be loaded once in REvoDesign so that the program can recognize the molecules within the session. In the menu bar, select File → Import PyMOL Session, or use the shortcut Ctrl/Command + N to import the structure information.

Fig. 19 Importing an existing Session in PyMOL.

Identifying Design Hotspots in Binding Regions

The Functional Pocket section of the Prepare tab provides a shortcut for specifying the substrate and cofactor molecules, setting the maximum contact distance, and defining the design hotspot region within the binding site.

This input field also allows custom input in complex PyMOL selection syntax (e.g., r. UNK or r. LIG to treat two ligands as a single entity).

Fig. 20 Designing Hotspot Region.

For example, CPZ is the substrate molecule of 1SUO, and HEM is the cofactor. The default binding pocket cutoff distances are 8 Å for the substrate and 7 Å for the cofactor. Users can specify longer or shorter distances according to their design strategies.
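A distance-based pocket of this kind can be expressed in PyMOL's selection language. The helper below is a sketch that builds such a selection string; it is not necessarily the expression REvoDesign uses internally, and the function name and the default `polymer.protein` selector are assumptions.

```python
def pocket_selection(ligand_resn, cutoff, protein_sele="polymer.protein"):
    """Build a PyMOL selection for whole residues having any atom within `cutoff` Å of a ligand."""
    return f"byres (({protein_sele}) within {cutoff} of (resn {ligand_resn}))"


substrate_pocket = pocket_selection("CPZ", 8.0)
cofactor_pocket = pocket_selection("HEM", 7.0)
```

Inside PyMOL, a string such as `substrate_pocket` could be passed to `cmd.select` to create a named selection for inspection.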

Fig. 21 Specifying Small Molecules, Hotspot Distance Criteria, and Save Path.

Additionally, the Check button can only be enabled after specifying the session save path. Clicking the Check button loads the pocket results into the session and saves them to the specified location.

Fig. 22 Determination Results.

The binding pocket determination results are saved in the pockets folder within the current directory, named as [molecule]_[pocket_selection]_residues.txt. Since cofactors and substrates are usually in close proximity, during the region definition process, overlapping areas are assigned to the cofactor and subtracted from the substrate region.

Pocket selection	Residues	Description
design_shell_CPZ_8.0_01	100,101,102,103,104,105,108,206,209,218,365,477,478	Substrate binding region, excluding the cofactor binding region
pkt_cof_HEM_7.0_01	88,95,96,97,98,113,114,115,116,121,125,128,132,179,182,291,294,295,296,297,298,299,300,301,302,303,304,305,306,307,357,362,363,366,367,368,369,390,392,427,428,429,430,431,432,433,434,435,436,437,438,439,441,442,443,445,446	Cofactor binding region
pkt_CPZ_8.0_01	98,100,101,102,103,104,105,108,114,115,206,209,218,294,295,296,297,298,299,300,301,302,303,362,363,365,366,367,368,428,429,434,436,437,438,439,442,477,478	Substrate binding region
pkt_hetatm_8.0_01	84,88,95,96,97,98,99,100,101,102,103,104,105,108,112,113,114,115,116,117,121,122,124,125,126,128,129,132,175,178,179,182,183,206,209,218,291,294,295,296,297,298,299,300,301,302,303,304,305,306,307,353,357,361,362,363,364,365,366,367,368,369,388,390,392,396,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,445,446,477,478	Combined substrate and cofactor regions
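The overlap rule can be checked against the table with plain set arithmetic: subtracting the cofactor pocket from the substrate pocket reproduces the design shell. The variable names below are chosen for this sketch only.

```python
# Residue numbers copied from the table above
pkt_cpz = {
    98, 100, 101, 102, 103, 104, 105, 108, 114, 115, 206, 209, 218,
    *range(294, 304), 362, 363, 365, 366, 367, 368,
    428, 429, 434, 436, 437, 438, 439, 442, 477, 478,
}
pkt_cof_hem = {
    88, 95, 96, 97, 98, 113, 114, 115, 116, 121, 125, 128, 132, 179, 182, 291,
    *range(294, 308), 357, 362, 363, 366, 367, 368, 369, 390, 392,
    *range(427, 440), 441, 442, 443, 445, 446,
}

# Overlapping residues belong to the cofactor, so they are removed from the substrate region
design_shell_cpz = sorted(pkt_cpz - pkt_cof_hem)
print(",".join(str(r) for r in design_shell_cpz))
```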

Determining Surface-Exposed Design Hotspot Regions

In the Surface Exposure section of the Prepare tab, surface-exposed design hotspot regions can be determined.

This function identifies solvent-exposed sites based on the solvent-accessible surface area of individual residues in 3D space. For example, residues with a surface area greater than or equal to 15 Å² are considered exposed sites.

Additionally, if surface-exposed regions are close to the substrate pocket, the substrate pocket region can be excluded. By clicking [Refresh Selection], the Exclusion dropdown menu displays a series of PyMOL selections, including those generated during the pocket hotspot identification step. Selecting pkt_hetatm_8.0_01 will exclude all residues in contact with heteroatoms.

For multimeric proteins, protein-protein interface (PPI) recognition can also be performed in this section. Set Chain Dist as the minimum PPI distance, and click [Find] to select inter-chain contact regions. These can also be loaded by clicking [Refresh Selection].

Note that exclusions can also be combined using PyMOL selection strings for more precise constraints.
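The thresholding and exclusion logic described in this section can be sketched as follows. The per-residue surface areas are assumed to be available already (how REvoDesign computes them is not shown here), and the function name and the numbers in the usage note are hypothetical.

```python
def classify_surface(sasa_by_residue, cutoff=15.0, excluded=()):
    """Split residues into exposed and buried lists by solvent-accessible surface area (Å²).

    Residues listed in `excluded` (for example, pocket residues) are skipped entirely.
    """
    excluded = set(excluded)
    exposed, buried = [], []
    for resi, sasa in sorted(sasa_by_residue.items()):
        if resi in excluded:
            continue
        # A residue counts as exposed when its area is greater than or equal to the cutoff
        (exposed if sasa >= cutoff else buried).append(resi)
    return exposed, buried
```

For example, `classify_surface({10: 40.2, 11: 3.1, 100: 60.0}, excluded={100})` would report residue 10 as exposed, residue 11 as buried, and leave out residue 100.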

Fig. 23 Surface-Exposed Region Options and PPI Region Determination Options.

Fig. 24 Enter Exclusion Options, Surface Area Threshold, and Save Location to Perform Surface Hotspot Search.

The surface-exposure determination results are visualized in three parts: blue spheres represent exposed sites, red spheres represent buried sites, and sites without spheres are excluded.

The surface-exposure determination results are saved in the surface_residue_records folder within the current directory. The file contents are similar to those of the pocket determination results.

Fig. 25 Surface-Exposed Region Determination Results.

It should be noted that the session after surface-exposure determination is for visualization purposes only and should not be used for subsequent design or analysis.

Performing Constrained Virtual Saturation Mutagenesis in the Mutate Tab

Surface Entropy Reduction Design
Surface Entropy Reduction (SER) design primarily replaces surface-exposed residues with shorter, simpler amino acids that have reduced solvent interactions, while applying conservation constraints at the site level.

Users need to load the extracted evolutionary data files into the Profile section and select the profile type as PSSM. In Residue ID, select the results from the surface site detection, and choose a session save location. In the [Score cutoff] area, set an appropriate PSSM score threshold. In the figure, this means that the PSSM value of the substituted residue relative to the wild-type residue should not be less than -2 and not greater than 20. (Because PSSM scores are log-odds values, 20 represents an extremely high difference that is effectively never exceeded.)

In the Substitution section:
- Set Reject to PC to exclude proline and cysteine substitutions.
- In Accept, specify substitution preferences in the format {residue to be replaced}:{candidate amino acids}. For example, E:DATY indicates that the selected glutamate (E) can be replaced with D, A, T, or Y.
Finally, enter a Design Case identifier, which will be used to name the generated intermediate files.

Note: The Run button is only enabled if the session file path is valid.

Click Run! to generate an initial pool of virtual saturated mutants under the user-specified constraints.
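The constraints above can be sketched as a filter over one row of a PSSM. This is an illustration, not REvoDesign's implementation, and it rests on several assumptions: the cutoff is applied to the mutant score minus the wild-type score, multiple Accept entries are separated by whitespace, residues not listed in Accept are unrestricted, and Reject blocks substitutions both from and to the listed residues. All function names are hypothetical.

```python
def parse_accept(spec):
    """Parse an Accept string such as 'E:DATY' into {'E': {'D', 'A', 'T', 'Y'}}."""
    rules = {}
    for item in spec.split():
        source, _, targets = item.partition(":")
        if len(source) != 1 or not targets:
            raise ValueError(f"malformed Accept entry: {item!r}")
        rules[source] = set(targets)
    return rules


def candidate_substitutions(wt, pssm_row, lower, upper, reject="", accept=None):
    """Return the amino acids allowed to replace `wt` at one position.

    pssm_row maps amino-acid letters to PSSM scores at that position.
    """
    if wt in reject:
        return []
    allowed = (accept or {}).get(wt)
    hits = []
    for aa, score in pssm_row.items():
        if aa == wt or aa in reject:
            continue
        if allowed is not None and aa not in allowed:
            continue
        # Keep substitutions whose score difference lies within the cutoff window
        if lower <= score - pssm_row[wt] <= upper:
            hits.append(aa)
    return sorted(hits)
```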

Fig. 26 Settings for Surface Entropy Reduction Design.

The design results are displayed in PyMOL as grouped entries. Groups are organized by design site number, with the format: mt_[WildType][ResidueNumber]_[WildTypePSSMScore].

Each group contains all possible point mutations for that site. Individual point mutations are named as: [ChainID][WildType][ResidueNumber][Mutant]_[MutantScore]. Only the mutated side chain is displayed in the structure; carbon atoms are colored according to the PSSM score. The full mutant structure PDB files can be found in the mutant_pdbs folder in the working directory.
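When post-processing a session, the object names can be split back into their parts. The regular expression below is a sketch that assumes a single-letter chain ID and a plain integer or decimal score; the exact score formatting used by REvoDesign may differ, and the function name is hypothetical.

```python
import re

# [ChainID][WildType][ResidueNumber][Mutant], optionally followed by _[MutantScore]
_MUTANT_RE = re.compile(
    r"^(?P<chain>[A-Za-z])(?P<wt>[A-Z])(?P<resi>\d+)(?P<mut>[A-Z])"
    r"(?:_(?P<score>-?\d+(?:\.\d+)?))?$"
)


def parse_mutant_name(name):
    """Split a point-mutation name into chain, wild type, position, mutant, and score."""
    match = _MUTANT_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized mutant name: {name!r}")
    parts = match.groupdict()
    parts["resi"] = int(parts["resi"])
    parts["score"] = float(parts["score"]) if parts["score"] is not None else None
    return parts
```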

Fig. 27 Visualization of Surface Entropy Reduction Design Results.
Catalytic Pocket Design
The design of the catalytic pocket adopts a relatively flexible substitution strategy.

Users need to load the decompressed evolutionary data file into Profile, select Profile type as PSSM, and in Residue ID, choose the results of the catalytic pocket detection.

Then, specify a valid session save path.

In the Score cutoff section, set an appropriate PSSM score threshold.

The example in the figure shows that the PSSM value of the substituted residue, relative to the wild-type residue, should be no lower than -5 and no higher than 20 (because PSSM scores are log-odds values, 20 represents an extremely high difference that is effectively never exceeded).

In the Substitution area:
- Set Reject to PC to exclude substitutions from/to Proline (P) and Cysteine (C).
- Clear any settings in Accept to allow more freedom in substitution options.
Finally, in the Design Case field, enter a unique case name, which will be used as the basis for naming the generated intermediate files.

Compared to surface entropy reduction design, catalytic pocket design uses looser constraints on non-conservative substitutions, allowing more diversity in the designed variants.

Fig. 28 Catalytic pocket design settings.

Fig. 29 Catalytic pocket design results in visualization.

Evaluate module (Rational Evaluation Module)

The Rational Evaluation Module is an auxiliary tool for manually inspecting mutant side-chain substitutions.

It provides real-time visualization of modeled mutant side-chain conformations on the main-chain structure, allowing comparison of structural and physicochemical differences before and after mutation.

In the Evaluate tab:

The upper section specifies the save path for the selected mutants and the checkpoint loading entry.
The lower section contains controls for switching between mutants (previous/next buttons) and making decisions (accept/reject buttons).

Note: If the save path is not specified, no mutant files or checkpoint files will be generated.

Fig. 30 Saving decisions and loading checkpoints in the Rational Evaluation module.

In REvoDesign, mutants are organized into groups called a "mutant tree."

Each branch appears in the first drop-down menu, and the current mutant appears in the second drop-down menu.

By selecting these two menus, users can navigate between branches and specific mutants.

Additionally:

Find the Best Hit allows you to search for the highest-scoring mutant within the current branch.
I'm Lucky! automatically scans all branches to locate the best-scoring mutant in the entire tree.
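The two searches can be pictured on a mutant tree stored as nested dictionaries (branch name → mutant name → score). This is a sketch under the assumption that a higher score is better; the data layout, function names, and example names are hypothetical and do not reflect REvoDesign's internal classes.

```python
def find_best_hit(branch):
    """Return (mutant, score) for the highest-scoring mutant in one branch."""
    return max(branch.items(), key=lambda item: item[1])


def im_lucky(tree):
    """Return (branch, mutant, score) for the highest-scoring mutant in the whole tree."""
    best = None
    for branch_name, branch in tree.items():
        if not branch:
            continue
        mutant, score = find_best_hit(branch)
        if best is None or score > best[2]:
            best = (branch_name, mutant, score)
    return best


# Hypothetical tree following the naming scheme of the design results
tree = {
    "mt_E93_2": {"AE93D_1": 1.0, "AE93A_-1": -1.0},
    "mt_K191_0": {"AK191R_3": 3.0},
}
```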

Fig. 31 Status display and decision-making tools in the Rational Evaluation module.

The left side shows the total number of mutants and the number of selected mutants, while the right side provides tools to assist in inspection and decision-making.

Before performing rational inspection, users need to click "Initialize" to identify the mutant tree in the PyMOL session.

If identification succeeds, Total should display a positive integer, and the inspection and decision buttons on the right will be enabled.

Once REvoDesign enters the rational evaluation state, unrelated point mutations are hidden, irrelevant branches are collapsed, and only the currently selected branch and individual are displayed.

The visualization shows both stick-and-ball and mesh models of the mutant side chains, alongside the wild-type side chain in a line model for comparison.

After initialization, the first mutant in the first branch is displayed by default.

Fig. 32 Rational Evaluation State.

Users can make rational decisions based on the side-chain visualization, comparing PSSM scores and side-chain substitutions to decide whether to accept, ignore (not accepted), or reject (previously accepted) a mutant.

Fig. 33 Selecting the Highest-Scoring Mutant.

Fig. 34 Decision status updated after acceptance.

Manual inspection typically uses the previous/next buttons for navigation and also supports keyboard shortcuts, improving efficiency when screening large numbers of mutants.

Button	Function	Keyboard shortcut
Previous	Back	Shift+Opt+`[`
Next	Forward	Shift+Opt+`]`
Reject	Reject	Shift+Opt+`+`
Accept	Accept	Shift+Opt+`-`

Because button navigation is linear and does not allow flexible jumping, we also provide drop-down menus as a more efficient way to switch between and select mutants.

Fig. 35 Drop-down menu for mutant selection.

Fig. 36 Selecting a Point Mutation within a Branch.

"Find the Best Hit" automatically locates the highest-scoring mutant within the current branch, while "I'm Lucky" automatically searches for the top-scoring individual in each branch.

Fig. 37 Entering a Branch.

Fig. 38 Clicking "Find the Best Hit" to Automatically Navigate to the Top-Scoring Mutant.

Fig. 39 "I'm Lucky" Automatically Searches for the Top-Scoring Mutant in 15 Branches.

During rational screening, the user's decisions are saved in real time to the specified mutant table file.

At the same time, corresponding checkpoint files are generated, allowing historical decisions to be loaded later.

Fig. 40 Real-Time Storage of Mutant Decision Results.

Fig. 41 Mutant Decision Checkpoints.

Fig. 42 Loaded Checkpoint Results.

Before loading a checkpoint file, the mutant tree should be re-initialized to clear any previous screening results.

Fig. 43 Log Display of Checkpoint Loading Results.

Clustering

REvoDesign uses sequence clustering to classify similar point mutations and select representatives from each branch.

When Mutate Relax evaluation is disabled, representatives are selected randomly.
When enabled, selection is guided by Rosetta scoring.

Users need to:

Load the previously saved point mutation list.
Specify the number of mutations per mutant (default is 1).
Set the number of clusters (must be less than the total number of mutants).
Define the batch size for minibatch computation.
Choose a scoring matrix (default is PAM30).
Decide whether to perform additional energy evaluation using Mutate Relax, depending on computational needs and the local Rosetta setup.
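The two ways of choosing a representative from each cluster can be sketched as below. The clustering itself is assumed to have been done already. Picking the member with the lowest Rosetta score is an assumption about what "guided by Rosetta scoring" means (lower Rosetta energies are more favorable), and the function names are hypothetical.

```python
import random


def check_cluster_count(n_clusters, n_mutants):
    """The number of clusters must be positive and smaller than the number of mutants."""
    if not 0 < n_clusters < n_mutants:
        raise ValueError("number of clusters must be between 1 and n_mutants - 1")


def pick_representatives(clusters, rosetta_scores=None, seed=None):
    """Pick one representative per cluster.

    clusters: list of lists of mutant names.
    rosetta_scores: optional {mutant: score}; if given, the lowest-scoring member wins,
    otherwise a member is chosen at random.
    """
    rng = random.Random(seed)
    representatives = []
    for members in clusters:
        if rosetta_scores is None:
            representatives.append(rng.choice(members))
        else:
            representatives.append(min(members, key=lambda m: rosetta_scores[m]))
    return representatives
```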

Fig. 44 Log Display of Clustering Results.

Fig. 45 Log Display of Clustering Results.

The matrix on the right represents the pairwise sequence similarity between mutants, with darker colors indicating higher similarity.

Users should note that fewer clusters are not always better. When the number of clusters is too small, the scoring matrix may force completely different sequences into the same cluster, ignoring their diversity.

Additionally, users can choose to perform a quick energy evaluation of each point mutation within a branch using Rosetta.

Fig. 46 Log Display of Quick Energy Evaluation Results.

Fig. 47 Summary of Scoring Results in the Log Output.

Fig. 48 Mutate Relax Scoring Result File.

The complete scoring results are saved as Excel and CSV files, facilitating subsequent analysis.

Note that Mutate Relax operates under three assumptions:

The original structure is already in an energy-minimized state.
Introducing a point mutation does not affect the backbone coordinates.
The mutation does not affect long-range side-chain packing.

Under these assumptions, only the mutation is introduced, followed by a local repacking to adjust the side-chain conformations.

Therefore, having a well-optimized original structure is critical for the reliability of the scoring results of mutant structures.

Visualize Module

This module has two main functions:

Cross-data filtering of saved mutant lists, such as ddG, ESM1v, Cartesian ddG, MPNN, and other metrics.
Structural visualization of experimentally measured data within the 3D protein context.

Cross-Filtering
In this tutorial, Pythia-ddG is used as an example. It is a structure-based tool for predicting the free energy change (ΔΔG) of point mutations.

Users can conveniently perform calculations on BioLib: Pythia-ddG.

Using Pythia-ddG is straightforward:
- Upload the structure file.
- Click "Run" and wait approximately one minute for the calculation to complete.
- Download and unzip the results. The included CSV file serves as the reference data for this round of cross-filtering.
Fig. 49 Pythia-ddG Calculation Results.

Fig. 50 Pythia-ddG Requires a PDB File as Input.

Fig. 51 CSV File Containing the Pythia-ddG Calculation Results for This Example.

Fig. 52 Example of Cross-Filtering.

Load the point mutation list and specify a save location. Then, select the CSV output path from Pythia-ddG and set the Profile type to CSV (automatically detected, but manual confirmation is recommended). In the color preset area at the top right, check the option indicating that lower scores correspond to better mutants (e.g., for ddG, a value >0 indicates instability, so lower values are more favorable). In the display options at the bottom right, check Global Scoring, meaning that the color mapping considers score values across the entire CSV table. Finally, enter an appropriate Group name to generate a corresponding mutant tree for this dataset.
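The effect of the two checkboxes can be illustrated with a small normalization sketch. It assumes that, without Global Scoring, colors would instead be scaled within each group, and it maps every score to a value in [0, 1] where 1 means "best". The data layout, function name, and mutant names in the usage note are hypothetical.

```python
def normalize_scores(groups, global_scoring=True, lower_is_better=False):
    """Scale scores to [0, 1] for color mapping, where 1 marks the most favorable mutant.

    groups: {group_name: {mutant: score}}
    """
    all_scores = [s for members in groups.values() for s in members.values()]
    scaled = {}
    for name, members in groups.items():
        # Global scoring uses the range of the whole table; otherwise only this group
        pool = all_scores if global_scoring else list(members.values())
        low, high = min(pool), max(pool)
        span = (high - low) or 1.0
        scaled[name] = {}
        for mutant, score in members.items():
            t = (score - low) / span
            scaled[name][mutant] = 1.0 - t if lower_is_better else t
    return scaled
```

With ddG-like values such as `{"g1": {"a": -1.0, "b": 1.0}, "g2": {"c": 3.0}}` and `lower_is_better=True`, global scoring ranks `a` highest (1.0) and `c` lowest (0.0) across both groups.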

Additionally, for a small number of point-mutation side-chain modeling tasks, it is recommended to use high-performance side-chain modeling tools such as DLPacker. This ensures that side-chain conformations are modeled accurately and reasonably during cross-filtering, providing detailed structural information for subsequent manual inspection.

Fig. 53 Adjusting Side-Chain Solver Settings.

Fig. 54 Side-Chain Visualization During Cross-Filtering.

Fig. 55 Manual Inspection Step During Cross-Filtering.

The red mutation, AE240Y, has a high energy and is one of the primary point mutations to exclude. Click on the right-side panel in PyMOL to hide AE240Y.

Fig. 56 First Step in Pruning the Mutant Tree: Click to Hide.

Use the "Reduce Session" button to remove point mutations from the interface. Then, rename the point mutation list and click "Save Mutant" to save the pruned mutant tree.

Fig. 57 The Pruned Mutant Table No Longer Contains AE240Y.

Fig. 58 AE240Y is no longer in the mutant list after pruning.

Visualization of Experimental Data on Structure

Assume the user has a set of experimentally measured data, saved in a CSV or Microsoft Excel file. The data format is as follows:

mutant	normalized	group
WT_1	0	control
wt_2	-0.1	control
WT	0	control
AE93D	0.1	low
AK191R	0.2	medium
AQ204E	0.3	high

Here, mutant refers to the name of the point mutation, group is the classification used as the branch name in the mutant tree, and normalized is the measured score. Column names do not need to follow specific strings, as they can be adjusted in the display settings beforehand.

Mutants whose names contain WT (case-insensitive) are treated as controls, and their corresponding group will be ignored. The WT score for each mutant will be set to the average value of all controls.
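As a minimal sketch of the control-handling rule just described, the following Python snippet reads the example table, treats every row whose name contains "WT" (case-insensitive) as a control, averages the control scores, and groups the remaining mutants by their group column. The function name `split_controls` and its parameters are hypothetical; how REvoDesign does this internally is not shown in this tutorial.

```python
import csv
import io
from statistics import mean

# The example table from above, written as comma-separated text.
ROWS = [
    ("mutant", "normalized", "group"),
    ("WT_1", "0", "control"),
    ("wt_2", "-0.1", "control"),
    ("WT", "0", "control"),
    ("AE93D", "0.1", "low"),
    ("AK191R", "0.2", "medium"),
    ("AQ204E", "0.3", "high"),
]
TABLE = "\n".join(",".join(row) for row in ROWS)

def split_controls(text, mut_col="mutant", score_col="normalized", group_col="group"):
    """Return (average control score, {group: [(mutant, score), ...]})."""
    controls, by_group = [], {}
    for row in csv.DictReader(io.StringIO(text)):
        name, score = row[mut_col], float(row[score_col])
        if "wt" in name.lower():
            # Control row: its group label is ignored.
            controls.append(score)
        else:
            by_group.setdefault(row[group_col], []).append((name, score))
    return mean(controls), by_group

wt_score, by_group = split_controls(TABLE)
```

The column names are passed as arguments because, as noted above, the table does not need to use fixed header strings.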

Fig. 59 Settings for Displaying Experimental Results.

After adjusting the settings, click "Run" to display the results.

In the "Mutants" field, specify the location of the CSV file.
In "Save as", select the path to save the PyMOL session.
Leave the Profile path empty and set Profile type to none.
In the display options below, select from the dropdown menus:

Group: the column name for classification in your table
Mut: the column name for mutant names
Score: the column name for the measured values to display

Fig. 60 Mapping experimental data onto the structure.

Interact Module (Co-evolution Analysis Module)

In REvoDesign, co-evolution analysis relies on GREMLIN output. In this case, the co-evolution profile (a Markov random field, MRF) is stored in the file: gremlin_res/1SUO_A.i90c75_aln.GREMLIN.mrf.pkl.

Users need to go to the Interact tab and provide the path to the GREMLIN output and the optional saving path for mutation designs. Then, adjust the filtering thresholds, such as the top N co-evolving residue pairs based on signal strength, maximum contact distance, and chain-binding status in homologous multimers. If necessary, specialized scoring functions can be applied for point mutation evaluation.

The interface for analyzing global and local co-evolution residue pairs is largely the same. The main difference is that for local co-evolution analysis, the target residue must be selected in PyMOL. This creates a selection named sele in PyMOL; only when this selection is active will the local co-evolution analysis be triggered.

Click Initialize to load the GREMLIN archive and generate a 2D contact map showing the co-evolutionary signal strength between residue pairs.

Global Co-evolution Residue Analysis

Fig. 61 Interface for Co-evolution Analysis.

After the co-evolution calculation archive is loaded, click "Scan" to analyze the spatial positions of co-evolving residue pairs and the distribution patterns of all 20 amino acids.

For example, in this demonstration, the top 100 global co-evolving residue pairs were selected, with a distance filter of 20 Å. The scan results are visualized as backbone-connected sticks: blue sticks represent residue pairs, yellow sticks indicate the currently analyzed pair, and the stick thickness reflects the strength of the co-evolutionary signal.
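The two filters used in this demonstration (top N pairs by signal strength, then a maximum contact distance) can be illustrated with a small Python sketch. It assumes the ranking is applied before the distance filter, which the tutorial does not specify, and it uses made-up coupling strengths and coordinates. The function `top_coevolving_pairs` and its data layout are hypothetical and are not the REvoDesign or GREMLIN API.

```python
import math

def top_coevolving_pairs(couplings, coords, top_n=100, max_dist=20.0):
    """Select the strongest residue pairs that lie within a distance cutoff.

    couplings: {(i, j): signal strength} for residue pairs
    coords:    {residue number: (x, y, z)} representative atom positions in Å
    """
    # Rank all pairs by signal strength and keep the top N.
    ranked = sorted(couplings.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    kept = []
    for (i, j), strength in ranked:
        distance = math.dist(coords[i], coords[j])
        if distance <= max_dist:
            kept.append((i, j, strength, distance))
    return kept

# Toy data, invented for illustration only.
couplings = {(10, 50): 0.9, (12, 300): 0.8, (20, 25): 0.3, (30, 31): 0.1}
coords = {
    10: (0.0, 0.0, 0.0), 50: (5.0, 0.0, 0.0),
    12: (0.0, 0.0, 0.0), 300: (40.0, 0.0, 0.0),
    20: (0.0, 0.0, 0.0), 25: (0.0, 6.0, 0.0),
    30: (0.0, 0.0, 0.0), 31: (3.8, 0.0, 0.0),
}
pairs = top_coevolving_pairs(couplings, coords, top_n=3, max_dist=20.0)
```

In this toy case the pair (12, 300) ranks second by strength but is dropped because its residues are 40 Å apart, and (30, 31) never enters the top three.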

Fig. 62 Search of global co-evolving residue pairs.

Users can click "Previous" or "Next" to step through co-evolving residue pairs. The interface simultaneously updates the MRF (Markov Random Field) matrix for each residue pair. The horizontal and vertical axes represent the 20 standard amino acids plus deletions at each residue. Each cell corresponds to a specific residue combination, with the cell color reflecting the probability of that particular substitution according to the chosen color scheme.

Every cell acts as a point-mutation design button. Clicking a cell instantly generates a virtual mutant, which passes through a workflow of mutant construction, side-chain modeling, and scoring (using the configured scoring function if enabled, or otherwise the corresponding MRF score from GREMLIN), followed by grouping and display in PyMOL.

When the cursor hovers over a cell, a crosshair pointer appears, indicating the mutation being designed at the two residues. Additionally, a floating label shows detailed mutation information. Cells corresponding to the wild-type (WT) residues indicate the original residue combination for the co-evolving pair.
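To make the cell-to-mutation mapping concrete, here is a small Python sketch of how a clicked cell in a 21 x 21 pair matrix could be translated into point mutations. The alphabet ordering, the assignment of rows to the first residue and columns to the second, the skipping of deletion cells, and the residue pair itself are all assumptions made for illustration; REvoDesign's actual behavior may differ.

```python
# Assumed axis ordering: 20 standard amino acids plus "-" for a deletion.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"

def wt_cell(pair):
    """Return the (row, col) of the cell holding the wild-type combination."""
    (wt_i, _), (wt_j, _) = pair
    return ALPHABET.index(wt_i), ALPHABET.index(wt_j)

def design_from_cell(pair, row, col):
    """Translate a clicked cell into a list of substitutions such as 'E93D'.

    pair is ((wt_i, pos_i), (wt_j, pos_j)); rows index the first residue and
    columns the second. Unchanged residues and deletions yield no mutation.
    """
    (wt_i, pos_i), (wt_j, pos_j) = pair
    targets = ((wt_i, pos_i, ALPHABET[row]), (wt_j, pos_j, ALPHABET[col]))
    return [f"{wt}{pos}{new}" for wt, pos, new in targets if new not in (wt, "-")]

# Arbitrary example pair; not taken from an actual GREMLIN result.
pair = (("E", 93), ("K", 191))
double_mutant = design_from_cell(pair, ALPHABET.index("D"), ALPHABET.index("R"))
```

Clicking the wild-type cell returned by `wt_cell` produces an empty list under these assumptions, since neither residue changes.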

Fig. 63 Real-time co-evolution analysis.

Fig. 64 Designing point mutations in coevolution analysis.

Local Co-evolutionary Residue Analysis

Compared to global co-evolutionary residue analysis, local analysis has the following additional triggering conditions:

1. Click on a residue in PyMOL, which will automatically generate a selection object named sele.

2. The sele object must be in an enabled state.

For the user, a single click on the structure is sufficient. If using a PyMOL script to make the selection, it must be done as follows:
```
select sele, 1SUO and resi 298
enable sele
```
Then click "Scan" to perform the analysis for that specific residue (e.g., residue 298).

Fig. 65 Local coevolution analysis settings.

Fig. 66 Search results for local coevolution residue pairs.

Fig. 67 Designing mutations for residue 367.

Mutations designed based on GREMLIN need to be saved by clicking "Accept" in the interface, or by going to the Evaluate module for rational assessment and saving.

Other tool integrations

The integration of third-party design tools entails substantial adaptation efforts, including performance analysis, resource utilization assessment, evaluation of integration complexity, resolution of dependency conflicts, middleware integration, and incorporation into REvoDesign's built-in interfaces. The level of difficulty varies from case to case.

References

1. Porras, G., Chassagne, F., Lyles, J. T., Marquez, L., Dettweiler, M., Salam, A. M., ... & Quave, C. L. (2020). Ethnobotany and the role of plant natural products in antibiotic drug discovery. Chemical Reviews, 121(6), 3495-3560.
2. Shao, F., Wilson, I. W., & Qiu, D. (2021). The research progress of taxol in Taxus. Current Pharmaceutical Biotechnology, 22(3), 360-366.
3. Perrot, T., Marc, J., Lezin, E., Papon, N., Besseau, S., & Courdavault, V. (2024). Emerging trends in production of plant natural products and new-to-nature biopharmaceuticals in yeast. Current Opinion in Biotechnology, 87, 103098.
4. Cravens, A., Payne, J., & Smolke, C. D. (2019). Synthetic biology strategies for microbial biosynthesis of plant natural products. Nature Communications, 10(1), 2142.
5. Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., ... & Baker, D. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871-876.
6. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
7. Musil, M., Jezik, A., Horackova, J., Borko, S., Kabourek, P., Damborsky, J., & Bednar, D. (2024). FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Briefings in Bioinformatics, 25(1), bbad425.
8. Weinstein, J. J., Goldenzweig, A., Hoch, S., & Fleishman, S. J. (2021). PROSS 2: a new server for the design of stable and highly expressed protein variants. Bioinformatics, 37(1), 123-125.