Experimental Pipeline: Multi-Stage Filtering of PSI-BLAST Candidate Sequences
After obtaining 968 candidate sequences from the metagenomic database via our distributed PSI-BLAST pipeline, we implemented a rigorous three-stage filtering scheme to select the most suitable protein sequences for subsequent wet-lab experimental validation. This study allocated a total of 40 wet-lab validation slots, aiming to systematically explore the diversity of PET hydrolases.
Filtering Workflow Overview
This screening scheme is designed to progressively ensure that candidate sequences are optimal across three dimensions: structural integrity, family representativeness, and experimental feasibility. The overall workflow and resource allocation strategy are as follows:
Stage 1 Filtering: Catalytic Triad Integrity Verification
1. Purpose
To ensure all candidate sequences possess the complete catalytic triad essential for PET hydrolytic function, which is a fundamental prerequisite for a potential PETase.
2. Methods
Alignment Tool: MAFFT (v7.505) was used for multiple sequence alignment (MSA). [1]
Reference Sequences: Nine experimentally verified known PETases were used as references. Their detailed information is recorded in [Table 2-1]. [2][3][4][5][6]
Table 2-1. Reference PET-Degrading Enzymes and Their Literature Sources
| Name | Literature source |
|---|---|
| IsPETase | Yoshida, S. et al. A bacterium that degrades and assimilates poly(ethylene terephthalate). Science 351, 1196–1199 (2016). |
| LCC | Sulaiman, S. et al. Isolation of a novel cutinase homolog with polyethylene terephthalate-degrading activity from leaf-branch compost by using a metagenomic approach. Appl. Environ. Microbiol. 78, 1556–1562 (2012). |
| dsPETase01 | Chen J, Jia Y, Sun Y, Liu K, Zhou C, Liu C, et al. Global marine microbial diversity and its potential in bioprospecting. Nature. 2024 Sep 12;633:371-8. doi: 10.1038/s41586-024-07891-2. |
| dsPETase05 | Chen J, Jia Y, Sun Y, Liu K, Zhou C, Liu C, et al. Global marine microbial diversity and its potential in bioprospecting. Nature. 2024 Sep 12;633:371-8. doi: 10.1038/s41586-024-07891-2. |
| dsPETase06 | Chen J, Jia Y, Sun Y, Liu K, Zhou C, Liu C, et al. Global marine microbial diversity and its potential in bioprospecting. Nature. 2024 Sep 12;633:371-8. doi: 10.1038/s41586-024-07891-2. |
| PES-H1 | Zimmermann, W.; Wei, R.; Hille, P.; Oeser, T.; Schmidt, J. New Polypeptides Having a Polyester Degrading Activity and Uses Thereof. EP3517608A1, July 31, 2019. |
| PES-H2 | Zimmermann, W.; Wei, R.; Hille, P.; Oeser, T.; Schmidt, J. New Polypeptides Having a Polyester Degrading Activity and Uses Thereof. EP3517608A1, July 31, 2019. |
| Kubu | Seo H, Hong H, Park J, Lee SH, Ki D, Ryu A, et al. Landscape profiling of PET depolymerases using a natural sequence cluster framework. Science. 2025 Jan 3;387:eadp5637. doi: 10.1126/science.adp5637. |
| Mipa | Seo H, Hong H, Park J, Lee SH, Ki D, Ryu A, et al. Landscape profiling of PET depolymerases using a natural sequence cluster framework. Science. 2025 Jan 3;387:eadp5637. doi: 10.1126/science.adp5637. |
Alignment Strategy: A one-to-many alignment strategy was employed to prevent consensus sequences from masking defects in individual candidates. This involved aligning each candidate sequence individually against the nine known PETase reference sequences.
Automation: A dedicated Python script was developed to automate the execution of 968 independent alignment operations.
Key Check: After each alignment, the script automatically checked whether the candidate sequence contained the complete catalytic triad (Ser-His-Asp) at the corresponding positions. Only sequences with all three key residues correctly present passed this stage.
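The per-candidate check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the use of a single annotated reference to locate the triad columns, and the toy residue numbers are all assumptions. `run_mafft` assumes the `mafft` executable is on the PATH and is not exercised in the toy example.

```python
import subprocess


def run_mafft(fasta_path: str) -> str:
    """Align a FASTA file with MAFFT and return the aligned FASTA text.

    Assumes `mafft` is installed; it writes the alignment to stdout.
    """
    result = subprocess.run(
        ["mafft", "--auto", fasta_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def column_of(aligned_ref: str, residue_number: int) -> int:
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    seen = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError(f"reference has fewer than {residue_number} residues")


def has_complete_triad(aligned: dict[str, str], candidate_id: str,
                       ref_id: str, triad: dict[int, str]) -> bool:
    """Return True if the candidate carries every triad residue at the
    alignment columns where the reference carries it.

    `triad` maps reference residue numbers to one-letter codes
    (hypothetical values in the example below).
    """
    ref, cand = aligned[ref_id], aligned[candidate_id]
    for residue_number, expected in triad.items():
        col = column_of(ref, residue_number)
        if ref[col].upper() != expected:
            raise ValueError("triad annotation does not match the reference")
        if cand[col].upper() != expected:
            return False
    return True


# Toy alignment (made-up sequences, not real PETases).
toy = parse_fasta(">ref\nMK-SAHGD\n>cand_ok\nMRTSLHWD\n>cand_bad\nMRTALHWD\n")
toy_triad = {3: "S", 5: "H", 7: "D"}  # hypothetical residue numbers
```

In a real run, one would write each candidate together with the reference sequences to a temporary FASTA file, pass it to `run_mafft`, and apply `has_complete_triad` to the parsed result.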
3. Result
This step removed sequences that, despite high sequence similarity, might have an incomplete active site due to mutations or sequencing errors, establishing a high-quality foundation for subsequent screening. Ultimately, 536 of the 968 candidate sequences passed the filter and proceeded to the next step.
Stage 2 Filtering: Sequence Clustering Analysis
1. Purpose
To cluster the sequences that passed Stage 1, providing insight into the diversity distribution and phylogenetic relationships of the candidates, thereby laying the groundwork for representative selection. [6]
2. Methods
First, Clustal Omega was used for sequence similarity analysis to generate a similarity matrix. This matrix was then input into a custom-designed neighborhood analysis program. Clustering was performed using an adaptive, three-tiered threshold system for partitioning, generating clusters of sequences with high similarity. The clustering results were visualized using Cytoscape and evaluated with cluster evaluation algorithms. Parameters were adjusted iteratively until optimal clustering performance was achieved. The final optimal clustering result was obtained with the following parameter set: low-medium-high similarity thresholds of 40-60-80 and a minimum branch size of 20. See Clustering for more details.
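The custom neighborhood analysis program is not described in detail here, so the following is only one plausible reading of threshold-based partitioning. It assumes that two sequences are linked when their pairwise similarity meets a threshold, that clusters are the connected components of the resulting graph, and that components smaller than a minimum size are discarded. The function names and the nested-dictionary matrix format are hypothetical.

```python
def components_at(ids, sim, threshold):
    """Connected components of the graph linking pairs with sim >= threshold.

    `sim` is a symmetric nested dict: sim[a][b] is the similarity of a and b.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            if sim[a][b] >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def tiered_clusters(ids, sim, thresholds=(40, 60, 80), min_size=20):
    """For each tier, keep the components with at least `min_size` members.

    The defaults mirror the 40-60-80 thresholds and minimum size of 20
    quoted in the text; how the original program combines the tiers is
    an assumption.
    """
    return {
        t: [c for c in components_at(ids, sim, t) if len(c) >= min_size]
        for t in thresholds
    }
```

With a toy matrix and `min_size=2`, a loose tier merges weakly related groups into one cluster, while stricter tiers split them apart and drop sequences that no longer have close neighbors.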
Stage 3 Filtering: Signal Peptide Prediction & Representative Allocation
1. Purpose
To allocate the 40 wet-lab validation slots in a way that balances experimental expressibility with broad phylogenetic representation.
2. Resource Allocation Framework
The wet-lab resources were divided equally into two parts:
20 slots were allocated to sequences within the dominant mainstream clusters identified in Stage 2.
20 slots were allocated to the remaining sequences not assigned to these major clusters.
3. Methods
Signal Peptide Screening
All sequences passing the first two stages (including those in mainstream clusters and the remaining sequences) were uniformly subjected to signal peptide prediction using SignalP-6.0. [7]
Screening Rule: When selecting sequences for wet-lab experiments, only sequences predicted to possess a signal peptide were retained. This is critical because the signal peptide is essential for the efficient secretion and soluble expression of PETases in common expression systems (e.g., E. coli). [2] This step was a prerequisite for slot allocation.
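As a sketch, the screening rule can be applied to a SignalP prediction table like this. The sketch assumes a tab-separated summary in which comment lines start with `#`, the first column is the sequence ID, and the second column is the predicted class, with `OTHER` meaning no signal peptide. The function name and the sample rows are hypothetical.

```python
import csv
import io


def ids_with_signal_peptide(table_text: str) -> list[str]:
    """Return IDs whose predicted class is anything other than OTHER."""
    keep = []
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines, comment/header lines, and malformed rows.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        seq_id, prediction = row[0].strip(), row[1].strip()
        if prediction.upper() != "OTHER":
            keep.append(seq_id)
    return keep


# Made-up rows for illustration only.
sample = (
    "# ID\tPrediction\n"
    "seq_001\tSP\n"
    "seq_002\tOTHER\n"
    "seq_003\tTAT\n"
)
```

Here `ids_with_signal_peptide(sample)` keeps `seq_001` and `seq_003` and drops `seq_002`.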
Slot Allocation within Mainstream Clusters (20 slots)
For the major clusters obtained, the 20 slots were allocated in proportion to the number of sequences in each cluster that passed the signal peptide filter. The specific allocation details are documented in [Table 2-2].
Table 2-2. Slot Allocation per Mainstream Cluster
| Cluster | Slots allocated |
|---|---|
| 54_8 | 1 |
| 140_11 | 2 |
| 36_12 | 2 |
| 11_14 | 2 |
| 264_16 | 2 |
| 86_19 | 3 |
| 342_26 | 4 |
| 42_29 | 4 |
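Proportional allocation with whole-number slots needs a rounding rule. The sketch below uses the largest-remainder method, which is an assumption, since the text does not say which rule was applied. The example sizes are also an assumption: they read the number after the underscore in each cluster label as that cluster's sequence count, which the source does not state.

```python
def allocate_slots(cluster_sizes: dict[str, int], total_slots: int) -> dict[str, int]:
    """Split `total_slots` across clusters in proportion to their sizes.

    Each cluster first gets the integer part of its exact share; leftover
    slots go to the clusters with the largest fractional remainders.
    Integer arithmetic avoids floating-point ties.
    """
    total = sum(cluster_sizes.values())
    if total <= 0:
        raise ValueError("cluster sizes must sum to a positive number")
    slots = {c: n * total_slots // total for c, n in cluster_sizes.items()}
    remainder = {c: n * total_slots % total for c, n in cluster_sizes.items()}
    leftover = total_slots - sum(slots.values())
    ranked = sorted(cluster_sizes, key=lambda c: remainder[c], reverse=True)
    for c in ranked[:leftover]:
        slots[c] += 1
    return slots


# Hypothetical sizes: label suffixes interpreted as sequence counts.
assumed_sizes = {
    "54_8": 8, "140_11": 11, "36_12": 12, "11_14": 14,
    "264_16": 16, "86_19": 19, "342_26": 26, "42_29": 29,
}
allocation = allocate_slots(assumed_sizes, 20)
```

Under these assumed sizes the result is 1, 2, 2, 2, 2, 3, 4, and 4 slots in the order listed, which agrees with Table 2-2; that agreement is suggestive but does not confirm either the rounding rule or the reading of the labels.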
Slot Allocation for Remaining Sequences (20 slots)
All sequences not belonging to the major mainstream clusters (and that passed the signal peptide filter) were pooled together.
A phylogenetic tree was constructed using MEGA12 software [8] with the Neighbor-Joining (NJ) method [9] to visualize the evolutionary relationships among these sequences.
The 20 slots were allocated as evenly as possible across this phylogenetic tree. This approach ensures that rare lineages, which do not belong to the dominant clusters but may possess unique evolutionary status and functional potential, also have a significant opportunity for validation, thereby maximizing the probability of discovering novel PETases.
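The text does not specify how "as evenly as possible" was implemented, and the selection may have been made by inspecting the tree directly. One way to approximate it computationally is greedy farthest-point sampling on pairwise distances, sketched below. This is a substitute technique, not the authors' method; it works on a distance matrix rather than on the Neighbor-Joining tree itself, and the function name and data layout are hypothetical.

```python
def spread_out_selection(ids, dist, k):
    """Pick k sequences that are far apart from one another.

    `dist` is a symmetric nested dict of pairwise distances. The first
    pick is the sequence with the largest total distance to all others;
    each later pick maximizes its distance to the nearest chosen sequence.
    """
    if k >= len(ids):
        return list(ids)
    first = max(ids, key=lambda i: sum(dist[i][j] for j in ids))
    chosen = [first]
    # nearest[i] = distance from i to its closest already-chosen sequence
    nearest = {i: dist[i][first] for i in ids}
    while len(chosen) < k:
        candidate = max(
            (i for i in ids if i not in chosen),
            key=lambda i: nearest[i],
        )
        chosen.append(candidate)
        for i in ids:
            nearest[i] = min(nearest[i], dist[i][candidate])
    return chosen


# Toy example: six points on a line stand in for sequences.
positions = {"a": 0, "b": 1, "c": 2, "d": 10, "e": 11, "f": 20}
toy_ids = list(positions)
toy_dist = {x: {y: abs(positions[x] - positions[y]) for y in toy_ids} for x in toy_ids}
picked = spread_out_selection(toy_ids, toy_dist, 3)
```

In the toy example the three picks fall in three separate groups of points instead of concentrating in the densest one, which is the behavior the allocation aims for with rare lineages.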
Next Steps
The final list of 40 candidate sequences selected through this three-tiered screening strategy has proceeded to the stage of protein heterologous expression and enzymatic activity validation. This filtering strategy balances the need to focus resources on abundant families with the requirement to broadly explore the sequence diversity space.
References
1. K. Katoh et al., MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30(14), 3059–3066 (2002). DOI: 10.1093/nar/gkf436
2. S. Yoshida et al., A bacterium that degrades and assimilates poly(ethylene terephthalate). Science 351, 1196–1199 (2016).
3. S. Sulaiman et al., Isolation of a novel cutinase homolog with polyethylene terephthalate-degrading activity from leaf-branch compost by using a metagenomic approach. Appl. Environ. Microbiol. 78, 1556–1562 (2012).
4. J. Chen et al., Global marine microbial diversity and its potential in bioprospecting. Nature 633, 371–378 (2024). DOI: 10.1038/s41586-024-07891-2
5. W. Zimmermann et al., New Polypeptides Having a Polyester Degrading Activity and Uses Thereof. EP3517608A1, July 31, 2019.
6. H. Seo et al., Landscape profiling of PET depolymerases using a natural sequence cluster framework. Science 387, eadp5637 (2025). DOI: 10.1126/science.adp5637
7. F. Teufel et al., SignalP 6.0 predicts all five types of signal peptides using protein language models. Nature Biotechnology 40, 1023–1025 (2022). DOI: 10.1038/s41587-021-01156-3
8. S. Kumar et al., MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Molecular Biology and Evolution 35(6), 1547–1549 (2018). DOI: 10.1093/molbev/msy096
9. N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4), 406–425 (1987). DOI: 10.1093/oxfordjournals.molbev.a040454