The Problem and Our Solution
Our project selection was driven by a three-fold imperative: the urgency of the global plastic pollution crisis, the bottleneck in enzyme engineering, and the maturity of modern Artificial Intelligence (AI).
We recognized that while synthetic biology offers a promising "green" solution through plastic-degrading enzymes (PDEs), the research process itself faces fundamental obstacles. The traditional method of discovering and optimizing these crucial BioParts is a time-consuming, low-throughput, trial-and-error wet lab process. For researchers, the available PDE data is often fragmented, inconsistently annotated, and scattered across the literature and general-purpose databases.
As a team with core expertise in Bioinformatics and AI, we realized we were uniquely positioned to address this. Our central motivation was to apply a systematic, engineering-driven logic to transform this chaotic landscape into an ordered, high-efficiency workflow. We were determined to leverage the power of AI for Science (AI4S), using advanced machine learning and deep learning models to capture the complex relationships between enzyme sequence, structure, and function. We also hoped that this work would help advance the synthetic biology community as a whole.
Giving Back to the SynBio Community
We have always recognized that our project's success is deeply rooted in the spirit of generous collaboration and open sharing cultivated by the synthetic biology and iGEM communities. Countless iGEM teams, including our own, have benefited from this ethos, allowing us to build upon the work of others and navigate a smoother development path. We truly believe that access to resources and the dissemination of knowledge can lead to great achievements. This is constantly proven by the world around us; as the old saying goes, "Unity is strength."
Throughout our public engagement activities, we were often asked about the "value" of our work. This prompted us to reflect deeply on how we could make our project valuable to future iGEM teams and the broader synthetic biology community. We hope that the work detailed here, and throughout our wiki, will offer valuable insights and help facilitate future projects.
PlaszymeDB: Our Dedicated Plastic-Degrading Enzyme Database
The cornerstone of our work is PlaszymeDB, a database we meticulously built. Early in our project, we identified that while public databases are vast, they suffer from fragmented data and inconsistent annotations, especially in the niche field of plastic-degrading enzymes. To address this critical pain point, we developed a focused, high-quality, and specialized resource from the ground up.
PlaszymeDB currently contains 474 curated sequences of verified plastic-degrading enzymes. For every entry, we have not only generated an accurate predicted structure but also integrated experimentally determined structural data (e.g., from X-ray crystallography) where available. More importantly, each entry has been carefully annotated with key information, including the type of plastic it degrades (e.g., PET, PE) and its EC number.
This database is our gift to the community. It is fully open and accessible through our WebApp, supporting online browsing, searching, and full data export and download functions. We believe PlaszymeDB serves as a rare, ready-to-use data resource that can significantly accelerate the research workflows of all iGEMers and synthetic biology researchers. Explore it at its dedicated portal: PlaszymeDB.
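As a minimal sketch of how an exported file might be used downstream, the snippet below filters records by plastic type. The CSV layout, the column names (`id`, `plastic`, `ec_number`), the `;` separator, and the sample rows are all assumptions made for illustration; they are not the actual PlaszymeDB export schema or real database entries.

```python
import csv
import io

# Illustrative rows only: placeholder IDs and a hypothetical column layout.
SAMPLE_EXPORT = """id,plastic,ec_number
E001,PET,3.1.1.101
E002,PE,unknown
E003,PET;PBAT,3.1.1.74
"""


def filter_by_plastic(csv_text, plastic):
    """Return the rows whose 'plastic' field lists the given plastic type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = plastic.strip().upper()
    hits = []
    for row in reader:
        # Assumption: an entry that degrades several plastics separates them with ';'.
        plastics = {p.strip().upper() for p in row["plastic"].split(";")}
        if wanted in plastics:
            hits.append(row)
    return hits


pet_entries = filter_by_plastic(SAMPLE_EXPORT, "PET")
print([row["id"] for row in pet_entries])
```

The same function would work on a downloaded file by passing its contents as `csv_text`, provided the column names are adjusted to match the real export.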
Two Core AI Models
Leveraging the solid data foundation of PlaszymeDB, we have sequentially engineered two distinct AI models. These innovative contributions, each with unique advantages on different metrics, act as the core intelligence and driving force of our project. Discover more on our GitLab homepage.
PlaszymeAlpha: A Sequence-Based Model for Rapid Screening
PlaszymeAlpha is our first machine learning model and laid the foundation for PDE prediction. It uses the large protein language model ESM for feature extraction, and we experimented with a variety of machine learning algorithms on top of those features. It demonstrates excellent ranking capability on known enzyme–plastic pairs, making it ideal for prioritization and validation tasks.
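The general pattern of "extract features from a sequence, then score and rank candidates" can be sketched as follows. This is not the PlaszymeAlpha implementation: amino acid composition stands in for ESM embeddings, and cosine similarity to the centroid of known positives stands in for a trained model. The function names and the toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Stand-in featurizer: 20-dimensional amino acid frequency vector (not ESM)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(known_positives, candidates):
    """Rank candidates by similarity to the mean feature vector of known positives."""
    feats = [composition(s) for s in known_positives]
    centroid = [sum(col) / len(feats) for col in zip(*feats)]
    scored = [(name, cosine(composition(seq), centroid))
              for name, seq in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy sequences, purely illustrative.
ranking = rank_candidates(
    known_positives=["MKKLLAAGG", "MKLLAAGGS"],
    candidates={"similar": "MKKLLAAGS", "different": "WWWWYYYY"},
)
print(ranking)
```

Swapping `composition` for a function that returns a language-model embedding, and the centroid score for a fitted classifier, would leave the ranking step unchanged.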
PlaszymeX: A Structure-Based Model for High-Precision Evaluation
PlaszymeX features a dual-tower Graph Neural Network (GNN) architecture that integrates enzyme structural features with polymer descriptors. This model provides superior generalization, supports predictions on novel plastics, and achieves state-of-the-art performance across all metrics (e.g., F1-score), making it the more comprehensive of the two models.
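The dual-tower idea, in which one encoder handles the enzyme, another handles the polymer, and the two embeddings are compared in a shared space, can be illustrated with a toy example. This sketch makes strong simplifying assumptions: each tower is a single untrained linear layer with random weights rather than a GNN, and the feature vectors and dimensions are invented for illustration. It shows only the data flow, not PlaszymeX itself.

```python
import math
import random


def make_tower(in_dim, out_dim, rng):
    """Toy encoder: one random linear layer (a real tower would be a trained network)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(tower, features):
    """Project input features into the shared space and L2-normalise the result."""
    z = [sum(w * x for w, x in zip(row, features)) for row in tower]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]


def score(enzyme_tower, polymer_tower, enzyme_feats, polymer_feats):
    """Sigmoid of the dot product between the enzyme and polymer embeddings."""
    e = encode(enzyme_tower, enzyme_feats)
    p = encode(polymer_tower, polymer_feats)
    logit = sum(a * b for a, b in zip(e, p))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)  # fixed seed so the toy weights are reproducible
enzyme_tower = make_tower(in_dim=6, out_dim=4, rng=rng)   # hypothetical dimensions
polymer_tower = make_tower(in_dim=3, out_dim=4, rng=rng)

enzyme_feats = [0.2, 0.1, 0.5, 0.0, 0.3, 0.9]  # placeholder structural features
polymer_feats = [1.0, 0.0, 0.4]                # placeholder polymer descriptors
print(score(enzyme_tower, polymer_tower, enzyme_feats, polymer_feats))
```

Because the polymer is encoded from descriptors rather than looked up by name, a design of this kind can in principle score a plastic that never appeared in training, which is consistent with the novel-plastic support described above.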
The Plaszyme Platform
To make our database and AI models easily accessible to all researchers, we created the Plaszyme WebApp. It integrates all the main functions of our tool suite—including running AI predictions, searching the database, and exploring screening pipelines—into a single, user-friendly interface. It is the easiest way to experience our project, requiring no setup.
Within the WebApp, users can input an enzyme's FASTA sequence, PDB file, or even metagenomic data to leverage our models and database. We believe the Plaszyme platform significantly lowers the barrier to entry for using advanced AI models, empowering more teams dedicated to solving the global plastic pollution crisis.
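Before submitting a FASTA sequence, it can be useful to check the input locally. The following is a minimal sketch of a FASTA parser with a basic residue check. It is independent of the WebApp and does not reflect its actual validation rules; the accepted alphabet (the 20 standard residues plus `X`) is an assumption.

```python
# Assumption: accept the 20 standard amino acids plus 'X' for unknown residues.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its full sequence."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header!r}")
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[header].append(line)
    # Join wrapped sequence lines and normalise to upper case.
    return {h: "".join(parts).upper() for h, parts in records.items()}


def invalid_residues(seq):
    """Return the sorted list of characters outside the accepted alphabet."""
    return sorted(set(seq.upper()) - VALID_RESIDUES)


example = ">enzA demo\nMKT\nLLA\n\n>enzB\nmkxx\n"
for name, seq in parse_fasta(example).items():
    print(name, seq, invalid_residues(seq))
```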
We invite you to visit the Plaszyme - Plastic Degradation Prediction Platform to explore and use our complete tool suite!
Educational Contributions
Recognizing that effective education must be tailored to the audience, we designed a multi-level outreach strategy targeting different age groups, from primary school students to university undergraduates. Our goal was not only to disseminate knowledge but also to create reproducible teaching frameworks for future iGEM teams.
Engaging Primary School Students with Hands-On Science: Faced with their short attention spans and difficulty with abstract concepts like "AI," we used common objects as tangible teaching aids. We designed simple, hands-on experiments, allowing them to directly observe the vitality of microorganisms. This approach transformed complex knowledge into an intuitive and fun experience, sparking their foundational interest in life sciences.
Connecting SynBio to Daily Life for Junior High Students: As students in this age group begin to develop abstract thinking, we aimed to bridge the gap between textbook knowledge and the real world. Our sessions focused on explaining concepts through relatable, everyday examples. Through guided questioning and interactive discussions, we helped them connect abstract scientific terms to their lived experiences, making the knowledge more meaningful.
Fostering Critical Thinking in High School Students: For high schoolers who are developing critical and logical thinking skills, we conducted specialized lectures on cutting-edge topics that linked synthetic biology with societal issues. We adopted participatory formats like group discussions and debates, encouraging them not just to receive information, but to scientifically analyze, question, and form their own perspectives.
Empowering University Peers with Real-World Project Experience: At the university level, we provided practical, in-depth learning opportunities. We organized an open-source Machine Learning course and gave presentations on our Plaszyme project, both of which drew on real data from our research. In collaboration with university art clubs, we also co-created science communication works, promoting interdisciplinary approaches to public engagement.
Ideas for Future iGEM Teams
Our work provides a blueprint for AI-driven enzyme discovery and engineering. We encourage future teams to build upon it:
Expanding the Database: Incorporate data for other types of plastic-degrading enzymes (e.g., for polyurethane or PVC) or other industrial enzymes (e.g., amylases, lipases) into PlaszymeDB.
Model Fusion and Upgrades: Attempt to fuse sequence and structural information into a single, more powerful AI model, or train models to predict additional properties like optimal temperature and pH.
Guiding Protein Engineering: Use our platform to guide the directed evolution of enzymes by predicting which mutations are most likely to enhance performance.
Expanding Educational Outreach: Develop new hands-on modules based on our age-specific teaching frameworks, or translate our educational resources to reach a broader, global audience.
This comprehensive contribution reflects our commitment to advancing synthetic biology through innovative technologies, community engagement, and educational outreach. Our efforts aim to leave a lasting impact on both the scientific community and the broader public.