Background
The transition from specialized research frameworks to accessible web platforms represents a critical step in democratizing computational enzyme discovery. Prior to this cycle, our tools—while scientifically rigorous—required local installation, programming expertise, and significant computational resources. This created barriers for wet-lab researchers, environmental scientists, and interdisciplinary teams who needed quick access to predictive models and comprehensive enzyme databases.
Our core idea is to build a user-friendly platform. Through extensive consultations with researchers across microbiology, environmental science, synthetic biology, and chemistry, we identified a shared need: a unified, installation-free online environment that integrates predictive modeling with database exploration. The platform needed to serve two purposes: (1) enable rapid prediction of enzyme–plastic interactions from user-provided sequences or structures, and (2) facilitate knowledge discovery through a searchable database of curated enzyme–plastic relationships.
Core Requirements:
Model Platform Requirements
- Multi-Input Support: Accept protein sequences (FASTA format, raw text) and 3D structures (PDB files)
- Dual-Model Architecture: Integrate both sequence-based (PlaszymeAlpha) and structure-informed (PlaszymeX) predictors
- Automated Workflow: Eliminate manual structure prediction steps—users provide sequences once, system handles structure generation
- Scalability: Support large-scale metagenomic analyses with custom HMM pipelines for enzyme discovery
- Interpretability: Provide confidence scores, degradation probabilities, and interactive 3D visualization
- Export Functionality: Enable CSV export for downstream computational analysis
- Asynchronous Processing: Long-running structure predictions should not block user interface
- Global Accessibility: Deploy on cloud infrastructure for worldwide availability
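The multi-input requirement above implies a normalization step that accepts either FASTA or raw sequence text. The following is a minimal sketch of such a step, not the platform's actual parser; the function name `parse_sequences`, the fallback names (`seq1`, ...), and the accepted residue alphabet are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumption: standard amino acids plus common ambiguity codes are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def parse_sequences(text: str) -> Dict[str, str]:
    """Parse FASTA or raw sequence text into {name: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new FASTA header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else f"seq{len(records) + 1}"
            chunks = []
        else:
            if name is None:
                name = "seq1"  # raw text with no FASTA header
            chunks.append(line.upper())

    if name is not None:
        records[name] = "".join(chunks)

    # Reject anything that is not a plausible protein sequence.
    for key, seq in records.items():
        invalid = set(seq) - VALID_RESIDUES
        if invalid:
            raise ValueError(f"{key}: unexpected characters {sorted(invalid)}")
    return records
```

Normalizing both input styles into one dictionary lets the rest of a pipeline ignore how the user supplied the data.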
Database Platform Requirements
- High-Performance Search: Implement efficient traversal-based search algorithm across enzyme properties
- Sequence Alignment: Provide BLAST-like local sequence similarity search
- 3D Structure Visualization: Interactive molecular viewer for experimental and predicted structures
- Phylogenetic Context: Dynamic display of evolutionary relationships via embedded phylogenetic trees
- Chemical Information: Render plastic chemical structures with standard nomenclature
- Comprehensive Metadata: Include EC annotations, literature citations, taxonomy, and database cross-references
- Filtering & Sorting: Multi-criteria filtering (plastic type, organism, EC class, structure availability)
User interface design
Predictor Platform design

Dashboard Page
User-friendly features conducive to management
The dashboard page visualizes project activity and supports basic management. Researchers can directly view the ongoing projects, the completed projects, and the total number of projects. When submitting a prediction, users can give the project a custom name. We also use distinct colors on the page to distinguish task statuses. Together, these features address the work-management issues raised by the researchers.

“Model selector” function
Help users make the appropriate model selection
When running a Plastic Degradation Prediction task, users choose a model in the model selector according to their task requirements. The models serve different purposes, and the selector shows a short description of each to help users decide. This lets each model be used where its strengths apply. It also responds to the requests for additional usage guidance that users made during the interviews.

“Plastic selector” function
Offer plastic options to facilitate specific research
Through the plastic selector on the PlaszymeX input interface, users can select the specific plastic types they want to study. The selector offers 34 plastic types in total. This function helps users conduct targeted research without generating redundant results.

Metagenome HMM Scan
Using PlaszymeHMM, we combine the Plaszyme model with the actual usage environment
The Metagenome HMM Scan task type also provides detailed descriptions in the interface. During the task, PlaszymeHMM helps users handle complex metagenome data: the HMM model identifies potential plastic-degrading enzyme sequences, which are then automatically passed to the prediction model as input.
The design of this section is mainly based on our discussions with the professor. It makes the model usable for most researchers who currently work with environmental samples, helping them accelerate their research.

Result Page
The page for displaying and exporting the output results of the model
After a prediction finishes, users can open its output with "View" in the Dashboard project management. They will see the degradation probabilities and confidence for different plastic types, and can export the results in CSV format for the next stage of their work.
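To illustrate what a CSV export of per-plastic probabilities might look like, here is a minimal sketch. The wide layout (one row per sequence, one column per plastic), the column name `sequence_id`, and the function `results_to_csv` are assumptions; the platform's actual export format may differ, for example by including a confidence column.

```python
import csv
import io
from typing import Dict


def results_to_csv(results: Dict[str, Dict[str, float]]) -> str:
    """Serialize {sequence_id: {plastic: probability}} as CSV text."""
    # Collect every plastic seen so all rows share the same columns.
    plastics = sorted({p for probs in results.values() for p in probs})

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sequence_id", *plastics])
    for seq_id, probs in results.items():
        # Leave the cell empty when a plastic was not scored for this sequence.
        row = [f"{probs[p]:.4f}" if p in probs else "" for p in plastics]
        writer.writerow([seq_id, *row])
    return buffer.getvalue()
```

A wide table like this loads directly into spreadsheet tools or a data-frame library for downstream analysis.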
PlaszymeDB Platform design


BLAST Professional Tool And Result
Actual screenshots showing the input and output of the BLAST function
The database platform includes a built-in sequence alignment tool for convenience. To give researchers a one-stop workflow, we packaged a tool based on the BLAST algorithm together with our local sequence data. We also expose various parameter and threshold options to support diverse use cases.

iTOL Embedded Visualization Tool
Screenshot of statistical tool for visualizing the phylogenetic tree embedded in the webpage
During our communication with users and our review of existing online platforms, we learned that iTOL phylogenetic tree displays can be embedded in a web page. This feature lets users explore the data visually. With its rich content and the plastic type labels we added, users can conveniently search for and locate the information they are interested in without leaving the website, which further improves the platform's user-friendliness.
Design
Design Philosophy
The platform design adheres to three core principles:
- “Less is More” Interaction Design: Minimize user input requirements while maximizing output richness. Users should provide data once and receive comprehensive, multi-faceted results.
- Professional Rigor Without Complexity: Maintain scientific accuracy and comprehensive functionality without overwhelming users unfamiliar with computational tools. The interface should guide users naturally through prediction workflows without requiring them to read a manual.
- Modular Architecture: Design independent but complementary modules—model prediction and database exploration—that can evolve separately while sharing common data standards.
System Architecture
Plaszyme Model Platform Architecture

Key Design Features
- Isolated Environments: V1 model runs in plaszyme_env, X1 model runs in plaszyme_gpu (completely isolated conda environments)
- Smart Model Routing: Automatic model selection—PDB files → X1 (structure-based), sequences → V1 (sequence-based)
- Background Processing: Long-running X1 predictions execute in daemon threads with real-time status polling
- Resource Management: Automatic cleanup triggers when project count reaches 50 (deletes oldest 20 projects)
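The smart routing rule listed above (PDB files go to X1, sequences go to V1) can be sketched as a small dispatch function. This is an illustrative sketch only: the function name `route_model`, the accepted file suffixes, and the content-sniffing fallback are assumptions, not the platform's actual logic.

```python
from pathlib import Path

# Assumption: these suffixes identify uploaded structure files.
PDB_SUFFIXES = {".pdb", ".ent"}


def route_model(filename: str, content: str) -> str:
    """Return "X1" for structure input and "V1" for sequence input."""
    if Path(filename).suffix.lower() in PDB_SUFFIXES:
        return "X1"
    # Fallback: PDB coordinate records start with "ATOM" padded to six
    # columns. Sequence lines contain no spaces, so they cannot match.
    for line in content.splitlines():
        if line.startswith("ATOM  "):
            return "X1"
    return "V1"
```

Checking the content as well as the file name covers the case where a structure is pasted or uploaded with a generic extension.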
PlaszymeDB Database Platform Architecture

Key Design Features
- Custom BLAST Implementation: PHP-based sequence similarity algorithm using a sliding-window approach (avoids external BLAST+ dependencies)
- Dual Structure Repository: Separate directories for experimental (pdb_experimental/) and predicted (pdb_predicted/) structures
- Embedded Visualization: Iframe-based integration of Mol* viewer, Ketcher chemical editor, and iTOL phylogenetic trees
- Comprehensive Metadata: EC numbers from both literature curation and DeepEC predictions
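To give a feel for what a sliding-window similarity search involves, the sketch below scores ungapped windows of a query against each database sequence and ranks the hits. It is written in Python for consistency with the other examples and is not the platform's PHP algorithm; the window size, the identity threshold, and the function names are assumptions.

```python
from typing import Dict, List, Tuple


def best_window_identity(query: str, subject: str, window: int = 10) -> Tuple[float, int, int]:
    """Return (identity, query_start, subject_start) of the best ungapped window."""
    if len(query) < window or len(subject) < window:
        return 0.0, -1, -1
    best = (0.0, -1, -1)
    for qs in range(len(query) - window + 1):
        q = query[qs:qs + window]
        for ss in range(len(subject) - window + 1):
            s = subject[ss:ss + window]
            # Fraction of identical residues at matching positions.
            identity = sum(a == b for a, b in zip(q, s)) / window
            if identity > best[0]:
                best = (identity, qs, ss)
    return best


def search(query: str, database: Dict[str, str], window: int = 10,
           min_identity: float = 0.6) -> List[Tuple[str, float, int, int]]:
    """Rank database entries by their best window identity to the query."""
    hits = []
    for name, seq in database.items():
        identity, qs, ss = best_window_identity(query, seq, window)
        if identity >= min_identity:
            hits.append((name, identity, qs, ss))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

This brute-force version compares every window pair, so its cost grows with the product of the sequence lengths. Real BLAST avoids that with word seeding and extension, which is one reason a simple sliding-window scan is slower on large databases.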
Implementation
Plaszyme Model Platform Implementation Core Technologies
Component | Technology | Specification |
---|---|---|
Backend Framework | Flask (Python 3.9+) | Lightweight WSGI application with CORS support |
Deep Learning | PyTorch 1.13+ | CUDA-enabled for GPU acceleration (optional) |
Protein Language Model | ESM-1b (650M parameters) | Pre-trained on UniRef50, 1280-dimensional embeddings |
Sequence Classifier | HistGradientBoostingClassifier | Scikit-learn implementation, supports multi-label prediction |
Structure Prediction | ColabFold (AlphaFold2) | Integrated via subprocess in isolated conda environment |
Graph Neural Network | Custom GNN Bilinear Model | Protein-plastic interaction scoring (X1 model) |
Async Task Queue | Threading + BackgroundTaskManager | Custom implementation with status tracking |
Data Format | CSV, FASTA, PDB | Standardized input/output formats |
Deployment | AWS EC2 | Cloud-hosted for global accessibility |
Technical Details
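- ESM Embedding Extraction (from utils.py):
    import esm
    import torch

    # ESM-1b model: 650M parameters, 33 transformer layers, 1280-dim embeddings
    # `device` and `sequences` are defined outside this excerpt.
    model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
    batch_converter = alphabet.get_batch_converter()
    model.eval().to(device)
    # Mean-pooling strategy for sequence-level representation
    for batch in sequences:  # each batch: list of (label, sequence) tuples
        # Assuming the standard fair-esm converter, which returns
        # (labels, strs, tokens)
        _, _, tokens = batch_converter(batch)
        tokens = tokens.to(device)
        with torch.no_grad():
            output = model(tokens, repr_layers=[33])  # Last layer
        # Drop the first and last token positions, then average over residues
        embeddings = output["representations"][33][:, 1:-1].mean(1)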
- Background Task Management (from app.py):
    import threading
    from datetime import datetime

    class BackgroundTaskManager:
        """Manages async PDB predictions to prevent UI blocking"""

        def __init__(self):
            self.tasks = {}  # {task_id: {status, progress, result, error}}
            self.lock = threading.Lock()

        def create_task(self, task_id, project_name, task_type='pdb'):
            # Thread-safe task creation
            with self.lock:
                self.tasks[task_id] = {
                    'status': 'processing',
                    'progress': 0,
                    'created_at': datetime.now().isoformat()
                }
- Automatic Resource Cleanup:
    import shutil

    def auto_cleanup_old_projects():
        """Delete oldest 20 projects when count reaches 50"""
        # `projects` (dicts with 'ctime' and 'path') and `total_projects`
        # are assumed to be populated before this point (not shown here).
        if total_projects >= 50:
            cycles = total_projects // 50
            to_delete = cycles * 20
            # Sort by creation time, delete oldest
            projects.sort(key=lambda x: x['ctime'])
            for project in projects[:to_delete]:
                shutil.rmtree(project['path'])
- ColabFold Integration:
    import subprocess

    # X1 model uses isolated 'plaszyme_gpu' conda environment
    # Prevents dependency conflicts between V1/X1 models
    subprocess.run([
        str(PLASZYME_GPU_PYTHON),  # Dedicated Python interpreter
        str(prediction_script),
        '--pdb', pdb_file,
        '--output', output_dir
    ], timeout=600)
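The BackgroundTaskManager excerpt above shows only task creation. The sketch below illustrates how a daemon thread and a status lookup could fit together so that a web request returns immediately while the prediction runs. It is a self-contained illustration, not the code in app.py; the class name `TaskRegistry`, its methods, and the status strings other than 'processing' are assumptions.

```python
import threading
from datetime import datetime
from typing import Any, Callable, Dict, Optional


class TaskRegistry:
    """Runs jobs in daemon threads and exposes their status for polling."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, task_id: str, func: Callable[..., Any], *args: Any) -> threading.Thread:
        """Register the task, start it in the background, and return its thread."""
        with self._lock:
            self._tasks[task_id] = {
                "status": "processing",
                "result": None,
                "error": None,
                "created_at": datetime.now().isoformat(),
            }
        thread = threading.Thread(
            target=self._run, args=(task_id, func, args), daemon=True
        )
        thread.start()
        return thread

    def _run(self, task_id: str, func: Callable[..., Any], args: tuple) -> None:
        # Run the job outside the lock so status polling is never blocked.
        try:
            update = {"status": "completed", "result": func(*args)}
        except Exception as exc:
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            self._tasks[task_id].update(update)

    def status(self, task_id: str) -> Optional[Dict[str, Any]]:
        """Return a copy of the task record, or None for unknown ids."""
        with self._lock:
            task = self._tasks.get(task_id)
            return dict(task) if task is not None else None
```

A route handler would call `submit` and return the task id, and the front end would then poll a status endpoint that wraps `status` until the task reports 'completed' or 'failed'.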
Deployment & Infrastructure
AWS Deployment Architecture
- Plaszyme Model Platform:
- Instance Type: AWS EC2 g4dn.xlarge (1 NVIDIA T4 GPU, 16GB GPU memory, 4 vCPUs, 16GB RAM)
- Operating System: Ubuntu 20.04 LTS
- Web Server: Gunicorn (4 workers) + Nginx reverse proxy
- SSL/TLS: Let’s Encrypt certificate with automatic renewal
- Domain: http://plaszyme.org/plaszyme
- Storage: 100GB EBS SSD for model weights and results cache
- PlaszymeDB Database Platform:
- Instance Type: AWS EC2 t3.medium (2 vCPUs, 4GB RAM)
- Operating System: Ubuntu 20.04 LTS
- Web Server: Apache 2.4 + PHP 7.4-FPM
- Database: MySQL 8.0 (local, not RDS for cost optimization)
- Domain: http://plaszyme.org/plaszymedb
- Storage: 50GB EBS SSD for database and PDB files
Load Balancing: Both platforms currently run on single instances; horizontal scaling can be added later via AWS Elastic Load Balancer.
Performance Optimization
- Model Prediction:
- GPU acceleration reduces PlaszymeAlpha inference time from ~30s/100 sequences (CPU) to ~10s (GPU)
- ESM model caching: Pre-load ESM-1b weights on server startup to avoid repeated downloads
- Batch processing: Group sequences into batches of 4 for optimal GPU utilization
- Database Queries:
- MySQL query caching enabled (256MB cache)
- Composite indexes on frequently filtered columns (plastic + ec_number)
- LIMIT 50 on search results to reduce data transfer and rendering time
- 3D Visualization:
- Mol* viewer lazy-loading: Initialize only when structure tab is clicked
- PDB file compression: Gzip compression reduces transfer size by ~70%
- CDN for Mol* library assets (reduces load time from ~2s to ~0.3s)
- Frontend Optimization:
- Minified JavaScript/CSS bundles
- AJAX pagination for large result sets
- Debounced search input (300ms delay) to reduce API calls
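Two of the optimizations listed above, grouping sequences into batches of 4 and gzip-compressing PDB files, are simple to sketch with the standard library. The helper names `batched` and `gzip_text` are assumptions for illustration; the actual compression ratio depends on the file and is not asserted here.

```python
import gzip
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int = 4) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def gzip_text(text: str) -> bytes:
    """Compress text (for example, PDB file content) for transfer."""
    return gzip.compress(text.encode("utf-8"))
```

Small fixed-size batches keep GPU memory use predictable for long protein sequences, and PDB files compress well because their fixed-column records are highly repetitive.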
Technical Innovations
- Dual-Model Integration: Seamless switching between sequence-based (PlaszymeAlpha) and structure-based (PlaszymeX) predictors based on input type, using isolated conda environments to prevent dependency conflicts.
- Automated Structure Prediction: Eliminated manual ColabFold execution—users submit sequences, system automatically generates structures for X1 model inference.
- Custom BLAST Implementation: Pure PHP sequence similarity algorithm enables BLAST-like searches without external binary dependencies, simplifying deployment on shared hosting environments.
- Asynchronous Task Management: Background thread architecture prevents UI blocking during long-running predictions (PDB structure generation ~5-10 minutes).
- Intelligent Resource Management: Automatic cleanup mechanism (delete 20 oldest projects when count reaches 50) prevents server storage exhaustion.
Lessons Learned
Design Insights
- User-Centered Design is Non-Negotiable: Early user interviews revealed that computational barriers (installation, dependencies) were greater obstacles than scientific complexity. Prioritizing web deployment from the start would have accelerated adoption.
- “Less is More” Requires Backend Complexity: Simplifying user workflows (e.g., automatic structure prediction) necessitates sophisticated backend orchestration (isolated environments, task queuing, error handling). The trade-off is worthwhile—users need not understand underlying technical details.
- Transparency Builds Trust: Providing real-time task status, confidence scores, and algorithm explanations helps users trust AI predictions, especially in high-stakes research contexts.
Technical Learnings
- Conda Environment Isolation is Critical: Initially, V1 and X1 models shared dependencies, causing version conflicts (e.g., PyTorch 1.13 vs. 2.0). Complete isolation via separate environments eliminated instability.
- Custom BLAST Trade-offs: PHP implementation is ~8x slower than NCBI BLAST but avoids binary dependencies and simplifies deployment. For databases >10,000 entries, migrating to BLAST+ would be necessary.
- Cloud Cost Optimization: GPU instances (g4dn.xlarge) cost ~$0.50/hour; using CPU for PlaszymeAlpha and GPU only for X1 reduced costs by 40% while maintaining acceptable performance.
Future Directions
- API for Programmatic Access: Enable command-line tool users to submit jobs via REST API (currently web-only).
- Enhanced Interpretability: Integrate attention visualization for ESM embeddings to show which sequence regions contribute most to predictions.
- Multi-Plastic Expansion: Extend prediction targets beyond current 15 plastic types to include emerging materials (e.g., bioplastics, composites).
- Community Contribution Portal: Allow users to submit newly discovered enzymes to PlaszymeDB after literature validation.
Conclusion
The development of the Plaszyme model platform and PlaszymeDB database represents a successful transition from specialized research tools to accessible, production-grade web applications. By integrating advanced AI models (ESM-1b, graph neural networks), automated workflows (ColabFold structure prediction), and user-friendly interfaces, we have created a “one-stop” platform that serves both computational and experimental researchers.
Key achievements include:
(1) dual-model architecture supporting both sequence- and structure-based predictions, (2) custom BLAST implementation for dependency-free deployment, (3) asynchronous task management for long-running computations, and (4) comprehensive database with 749 curated enzyme entries and integrated 3D visualization.
Explore the full interactive platform at: http://plaszyme.org/plaszyme/
Reference
- Letunic, I., & Bork, P. (2024). Interactive Tree of Life (iTOL) v6: Recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Research, 52(W1), W78–W82. https://doi.org/10.1093/nar/gkae268
- Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2