Function Overview
This tool was developed by the BIT-LLM iGEM team at Beijing Institute of Technology. It is a web-based protein analysis tool built on a model fine-tuned from the ESM-2 35M base language model, and its core functions are protein sequence prediction and optimization.
Figure 1: Function Overview
Mode Selection
Two usage modes are available on the left side of the interface, and users can choose between them according to their needs:
Freshman Mode: Supports basic protein prediction and improvement. Users can operate it directly without logging in and can obtain key data such as prediction results and comparisons of the improved scores. However, it does not save analysis history, so it is best suited to a first trial or simple analyses.
The second mode requires registration and login. In addition to all the basic functions of Freshman Mode, it supports entering and analyzing longer protein sequences. The system automatically saves all analysis records so they can be viewed and managed later, which suits users with ongoing protein analysis needs.
Usage Methods
Before running a protein analysis, users need to configure the following parameters:
Protein selection: Different original proteins have different initial sequences and characteristics, which influence the direction of the final improvement.
Specific operation: Click the dropdown box labeled "-- Please select a protein --" and choose the target protein from the expanded list (the available options are as follows):
Figure 2: Protein Selection
If there is no clear improvement direction, or if no initial protein closely resembles the target protein, select the "ELSE" option; the system will then carry out the improvement based on the general initial model.
Protein feature selection: Different features correspond to different directions of amino acid improvement, which influence the analysis results.
Specific operation: Click the dropdown box labeled "-- Please select a protein property --" and choose the protein property to be optimized (the available options are as follows):
Figure 3: Protein Feature Selection
According to existing tests, improvement performance varies significantly across combinations of protein and feature, and some combinations perform well, as shown below:
Figure 4: Performance Results
Sequence input: Enter the initial sequence of the protein to be improved here. If the input is left empty, the system defaults to modifying the wild-type sequence.
Specific operation: Enter the amino acid sequence of the protein, in uppercase and in a valid format, in the "Please enter the sequence" input box.
Format requirements: Each amino acid is represented by one uppercase letter; the sequence must not contain any of the four letters B, J, X, or Z; lowercase letters and non-alphabetic characters are not allowed.
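The format requirements above can be checked locally before a sequence is pasted into the input box. The following Python sketch is illustrative only and is not the tool's own code: the function name `check_sequence` and the wording of the messages are hypothetical. It applies the rules exactly as stated (uppercase letters only, excluding B, J, X, and Z), so it also accepts the rarer codes O and U; the web tool may apply stricter checks.

```python
import string

# Letters the guide explicitly forbids.
FORBIDDEN = set("BJXZ")
# Every other uppercase ASCII letter is treated as allowed. This is an
# assumption: the guide does not say whether O and U are accepted.
ALLOWED = set(string.ascii_uppercase) - FORBIDDEN


def check_sequence(seq):
    """Return a list of problems found in seq; an empty list means it passes."""
    problems = []
    # An empty input is valid: the guide says the tool then uses the wild-type.
    for pos, ch in enumerate(seq, start=1):
        if ch in ALLOWED:
            continue
        if ch in FORBIDDEN:
            problems.append(f"position {pos}: '{ch}' is one of the forbidden letters B, J, X, Z")
        elif ch in string.ascii_lowercase:
            problems.append(f"position {pos}: '{ch}' is lowercase")
        else:
            problems.append(f"position {pos}: {ch!r} is not an uppercase amino acid letter")
    return problems


# Example with a made-up sequence containing one forbidden letter.
for message in check_sequence("MKXAYI"):
    print(message)
```

Returning every problem with its position, rather than stopping at the first one, makes it easier to clean up a long sequence in one pass.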
Analysis Results
After completing the above parameter settings, click the blue "Start Analysis" button at the bottom. The system will then process the input protein information through the following steps:
Figure 5: Sequence Improvement
Figure 6: Scoring Function
Figure 7: Scoring Results
Figure 8: Visualization Results
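When reviewing the results, it can help to see which residues differ between the input sequence and the improved sequence. The sketch below is not part of the tool. `list_substitutions` is a hypothetical helper, the sequences are made-up toy examples, and the code assumes the improved sequence has the same length as the input (substitutions only, no insertions or deletions), which this guide does not state.

```python
def list_substitutions(original, improved):
    """List point substitutions as <old residue><1-based position><new residue>."""
    if len(original) != len(improved):
        # Sequences of different length would need an alignment step first.
        raise ValueError("sequences differ in length; alignment would be needed")
    return [
        f"{old}{pos}{new}"
        for pos, (old, new) in enumerate(zip(original, improved), start=1)
        if old != new
    ]


# Toy sequences for illustration only.
print(list_substitutions("MKTAYI", "MKTGYL"))  # ['A4G', 'I6L']
```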