Function Overview

This tool was developed by the BIT-LLM iGEM Team of Beijing Institute of Technology. It is a web-based protein analysis tool built by fine-tuning the ESM-2 35M protein language model, and its core functions include protein sequence prediction and optimization.

Figure 1: Function Overview

Mode Selection

Two usage modes are available on the left side of the interface, and users can choose whichever fits their needs:

Freshman Mode

Supports the basic protein prediction and improvement functions. It can be used directly without logging in and returns key data such as the prediction results and a comparison of scores before and after improvement. However, it does not save analysis history, so it is best suited to a first try or to simple analyses.

Expert Mode

Requires registration and login. In addition to all the basic functions of Freshman Mode, it supports input and analysis of longer protein sequences. The system automatically saves every analysis record for later viewing and management, which suits users with ongoing protein analysis needs.

How to Use

Before conducting protein analysis, users need to complete the following parameter configurations:

1) Select the protein to be improved

Each original protein has its own initial sequence and characteristics, which influence the direction of the final improvement.

Specific operation: Click the dropdown box labeled "-- Please select a protein --" and choose the target protein from the list (the available options are shown in Figure 2):

Figure 2: Protein Selection

If there is no clear improvement direction, or none of the listed proteins is highly similar to the target protein, select the "ELSE" option; the system will then carry out the improvement using the general initial model.
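
The "ELSE" fallback can be pictured as a lookup with a default. The sketch below is only an illustration: the registry `PROTEIN_MODELS`, its placeholder entries, the model identifiers, and the function `select_model` are all hypothetical and do not reflect the tool's internal code.

```python
# Hypothetical registry mapping each selectable protein to its prediction model.
# Protein names and model identifiers are placeholders, not the tool's real list.
PROTEIN_MODELS = {
    "PROTEIN_A": "model_protein_a",
    "PROTEIN_B": "model_protein_b",
}
GENERAL_MODEL = "model_general"  # stands in for the "general initial model"


def select_model(protein: str) -> str:
    """Return the model for the chosen protein; "ELSE" (or any unlisted name) uses the general model."""
    if protein == "ELSE":
        return GENERAL_MODEL
    return PROTEIN_MODELS.get(protein, GENERAL_MODEL)
```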

2) Select the protein feature

Each feature corresponds to a different direction for amino acid improvement, which influences the analysis results.

Specific operation: Click the dropdown box labeled "-- Please select a protein property --" and choose the protein property to be optimized (the available options are shown in Figure 3):

Figure 3: Protein Feature Selection

Existing tests show that improvement performance varies considerably across combinations of protein and feature; several combinations achieve good results, as shown in Figure 4:

Figure 4: Performance Results

3) Input sequence (optional)

The initial sequence of the protein to be improved is entered here. If this field is left empty, the system defaults to modifying the wild-type sequence.

Specific operation: Enter the amino acid sequence of the protein, in uppercase, in the "Please enter the sequence" input box, following the format requirements below.

Format requirements: Each amino acid is represented by a single uppercase letter. The ambiguity codes B, J, X, and Z are not allowed, nor are lowercase letters or any non-alphabetic characters.
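
The format rules for the sequence box, together with the wild-type default for an empty box, can be checked before submitting. This is a minimal sketch assuming the rules are applied literally (any uppercase letter except B, J, X, and Z); `resolve_sequence` is a hypothetical helper, and the tool's own validation may differ, for example in how it treats O, U, or surrounding whitespace.

```python
import re

# Any uppercase letter except the ambiguity codes B, J, X and Z.
# Assumption: the stated rules are applied literally, so O and U are accepted here.
_VALID_SEQUENCE = re.compile(r"[AC-IK-WY]+")


def resolve_sequence(user_input: str, wild_type: str) -> str:
    """Return the sequence to improve: the wild type for an empty box, otherwise the validated input."""
    if user_input == "":
        return wild_type  # empty input: fall back to the wild-type sequence
    if not _VALID_SEQUENCE.fullmatch(user_input):
        raise ValueError(
            "use single uppercase letters only; B, J, X, Z, lowercase and non-letters are not allowed"
        )
    return user_input
```

With this sketch, `resolve_sequence("", wt)` returns `wt`, while inputs such as `"mkta"` or `"MK TA"` raise a `ValueError`.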

Analysis Results

After completing the above parameter settings, click the blue "Start Analysis" button at the bottom. The system then processes the input in the following steps:

1) The system improves the input sequence using the prediction model for the selected protein.

Figure 5: Sequence Improvement
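
This guide does not describe how the model generates the improved sequence. Purely as an illustration of how a property predictor can drive sequence optimization, the sketch below uses greedy single-site substitution, a generic technique that is not necessarily the one this tool uses. `predict` is a hypothetical stand-in for the fine-tuned model's prediction (higher is better), and `greedy_improve` and its `rounds` parameter are illustrative names.

```python
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def greedy_improve(seq: str, predict: Callable[[str], float], rounds: int = 3) -> str:
    """Greedy single-site substitution search (illustrative, not the tool's method).

    Each round tries every single substitution of the current best sequence and
    keeps the highest-scoring one; it stops early when no substitution helps.
    """
    best_seq, best_score = seq, predict(seq)
    for _ in range(rounds):
        candidate_seq, candidate_score = best_seq, best_score
        for i, original in enumerate(best_seq):
            for aa in AMINO_ACIDS:
                if aa == original:
                    continue
                mutant = best_seq[:i] + aa + best_seq[i + 1:]
                score = predict(mutant)
                if score > candidate_score:
                    candidate_seq, candidate_score = mutant, score
        if candidate_seq == best_seq:
            break  # no single substitution improves the predicted score
        best_seq, best_score = candidate_seq, candidate_score
    return best_seq
```

For example, with a toy predictor that counts tryptophans, `greedy_improve("MKTA", lambda s: s.count("W"), rounds=2)` returns `"WWTA"`. Each round evaluates 19 × L mutants for a sequence of length L, so calling a real model per candidate becomes expensive for long sequences.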

2) After the prediction is complete, the system calls the scoring function to score several features of both the original and the improved sequence.

Figure 6: Scoring Function

Figure 7: Scoring Results
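
The before/after comparison in the scoring step amounts to scoring both sequences with the same per-feature functions and taking the difference. A minimal sketch, assuming each feature score is a plain number; `compare_scores` and the toy scorers in the usage line are hypothetical and are not the tool's actual scoring function or feature set.

```python
from typing import Callable, Dict, Tuple


def compare_scores(
    original: str,
    improved: str,
    scorers: Dict[str, Callable[[str], float]],
) -> Dict[str, Tuple[float, float, float]]:
    """For each feature, return (original score, improved score, improved minus original)."""
    result = {}
    for feature, score in scorers.items():
        before, after = score(original), score(improved)
        result[feature] = (before, after, after - before)
    return result
```

For example, `compare_scores("MKTA", "WWTA", {"w_count": lambda s: s.count("W")})` returns `{"w_count": (0, 2, 2)}`.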

3) After scoring, the system visualizes the results with four charts: a Sequence Alignment Comparison Chart, a Feature Score Comparison Chart, a Feature Radar Chart, and a Score Distribution Chart.

Figure 8: Visualization Results
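
The Sequence Alignment Comparison Chart highlights where the improved sequence differs from the original. A text-only sketch of the same idea follows, assuming the improvement only substitutes residues so both sequences have equal length (the guide does not say whether insertions or deletions can occur). `list_mutations` and `alignment_view` are hypothetical helpers; mutations use the common original-residue, position, new-residue notation (e.g. K2W).

```python
from typing import List


def _require_equal_length(original: str, improved: str) -> None:
    if len(original) != len(improved):
        raise ValueError("a position-by-position comparison needs sequences of equal length")


def list_mutations(original: str, improved: str) -> List[str]:
    """Return substitutions such as "K2W": original residue, 1-based position, new residue."""
    _require_equal_length(original, improved)
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(original, improved), start=1)
        if a != b
    ]


def alignment_view(original: str, improved: str) -> str:
    """Three-line text alignment: '|' marks an unchanged residue, '*' a substitution."""
    _require_equal_length(original, improved)
    marks = "".join("|" if a == b else "*" for a, b in zip(original, improved))
    return "\n".join([original, marks, improved])
```

For example, `list_mutations("MKTA", "WWTA")` returns `["M1W", "K2W"]`.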