"By virtue of his universal fiery spark of the light of nature, it is beyond doubt Proteus, the sea god of the ancient pagan sages, who hath the key to the sea and...power over all things."
—Carl Jung, Von Hyleanischen Chaos, vol.14:50
Overview
Figure 1. Interview Map
For PROTEUS, we recognized early that developing a robust, AI-driven protein design platform must be closely integrated with ongoing Human Practices (HP). Throughout the R&D process, the BIT-LLM team has consistently followed the principles of Integrated Human Practices (IHP). Our IHP work is not an isolated activity but a pillar of project development: we firmly believe that synthetic biology research is not merely a technical endeavor but also involves social needs, ethical responsibilities, and interdisciplinary integration. Through conferences, expert interviews, and educational outreach, BIT-LLM has actively sought feedback from a wide range of stakeholders, including leading computational biologists, synthetic biology experts, industry professionals, ethicists, and the general public.
This engagement strengthens the project's foundation in scientific responsibility and public communication, enabling a deep integration of technology and the humanities.
iHP-Cycle
The development of the PROTEUS project follows a dynamic, iterative HP-4R cycle, consisting of four key phases that run through the entire lifecycle of our project:
Figure 2. The HP-4R Cycle
Record
We proactively step out of the laboratory and systematically document diverse feedback and insights from scientists, industry professionals, ethicists, and the public. We do so by attending academic conferences, interviewing domain experts, organizing ethics forums, running public science popularization activities, conducting street interviews, and visiting welfare homes. Examples include recording concerns about AI misuse raised at ethics forums and collecting researchers' suggestions on integrating AI with physical models.
Reflect
After each external interaction, the team conducts in-depth internal discussions to critically examine the technical roadmap, application value, and ethical responsibilities of our project, and identify the strengths and blind spots of the current design. This includes, but is not limited to, reflecting on whether our purely data-driven model lacks physical rationality, and considering whether our platform is user-friendly enough for non-professionals.
Refine
Based on the conclusions drawn from reflection, we immediately translate external insights into actionable steps, making targeted adjustments and upgrades to PROTEUS's platform architecture, model strategies, validation processes, and safety-ethics considerations. For instance, we introduced AlphaFold 3 for structural validation in accordance with expert recommendations.
Recycle
The optimized new version of the project then becomes the basis for the next round of external interactions. We reintroduce the updated plan into broader dialogues to seek new feedback, thereby initiating a new, higher-level cycle of improvement.
This recurring HP-4R cycle ensures that PROTEUS is no longer a static technical product, but a responsible innovative outcome that continuously evolves and improves through ongoing dialogue with society.
Original Intent
Project Genesis
We compiled data from the past decade by searching PubMed for "Protein Language Model (PLM)" OR "AI-driven protein design". The number of relevant publications in this field increased markedly from 2016 to 2025, with growth accelerating especially after 2022. This growth is driven by the combined forces of increased computing power, algorithmic innovation, and data accumulation, and it also reflects a shift in the field's research focus from exploring basic methods to solving practical biomedical problems. The key phases are summarized as follows:
- 2016–2018: Low-level Stagnation Phase: Research was scattered and focused primarily on applying Natural Language Processing (NLP) techniques to protein sequence analysis, yielding preliminary models built on Long Short-Term Memory (LSTM) networks and early Transformer architectures.
- 2019–2021: Inflection Phase: Achievements such as ESM and AlphaFold drew wide attention to the field, and research began to focus on specific applications of protein language models, including protein structure prediction, function annotation, and mutation effect analysis.
- 2022–2023: Rapid Growth Phase: PLMs began to align with industrial needs (e.g., pharmaceuticals, industrial enzymes).
- 2024–2025: Explosive Growth Phase: The number of studies doubled, entering a phase of interdisciplinary integration and application transformation. Emphasis was placed on the integration of multi-modal data (e.g., combining 3D structural information, experimental data, and biological prior knowledge) to further enhance model performance and applicability.
Figure 3. Results of a Decade-long PubMed Search for "Protein Language Model" OR "AI-driven protein design"
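For readers who want to reproduce a trend like the one in Figure 3, per-year counts can be retrieved from NCBI's E-utilities ESearch endpoint. The minimal sketch below only builds the request URLs. It assumes the query string quoted in the figure caption and filters on publication date (`datetype=pdat`); the exact search fields and date type used for Figure 3 are not stated in the text, so these parameters are assumptions.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed query string; the quotes keep each phrase together.
QUERY = '"Protein Language Model" OR "AI-driven protein design"'


def yearly_count_url(year: int, term: str = QUERY) -> str:
    """Build an ESearch URL that returns only the hit count for one publication year."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date (an assumption)
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",   # ask for the count only, not the list of PMIDs
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def decade_urls(start: int = 2016, end: int = 2025) -> dict:
    """Map each year in [start, end] to its count URL."""
    return {year: yearly_count_url(year) for year in range(start, end + 1)}
```

Each URL can be fetched with `urllib.request.urlopen`; with `retmode=json`, the hit count appears under `esearchresult` then `count`. NCBI asks clients without an API key to stay at or below three requests per second.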
Overall, the field of protein language models and AI-driven design exhibits the following prominent trends:
- Integration as the Mainstream: Single-sequence models are no longer sufficient to meet complex needs; multi-modal integration of sequence, structure, function, and even physicochemical properties is the core direction of current model development.
- Focus on Experimental Closed Loops: The value of research is ultimately validated by wet experiments. The rapid iterative closed loop of "AI design → experimental validation → data feedback" has become a standard for top-tier research.
- Pursuit of Practicality and Accessibility: Technology is becoming increasingly toolized and democratized through cloud platforms and automated processes, with the goal of making these powerful AI tools accessible to a wide range of biologists.
PLM technology is developing vigorously, and there is broad demand across the scientific community for protein function optimization, such as improving stability, enhancing catalytic efficiency and specificity, optimizing immune regulation, and designing new functions. Therefore, after continuous discussions among team members and with our instructors, we shifted our initial focus from using PLMs to optimize a single specific protein to developing PROtein Transform Engineered by Universal Software (PROTEUS). Through a complete and rigorous DBTL (Design-Build-Test-Learn) cycle, we ultimately validate designs with wet experiments and iterate on them continuously.
Expertise Boosts Project
The growth of our PROTEUS project was not achieved overnight, but through continuous shaping and optimization via ongoing exchanges with top experts in the field, peer teams, and the public. The timeline below details each key exchange activity, clearly demonstrating how external insights gradually guided and accelerated the maturation and improvement of the project.
1. Project Kick-off Meeting
Format: Internal meetings and mentor guidance
Figure 4. BIT-LLM Team Kick-off Meeting
We formally established the prototype of the PROTEUS project, assembled an interdisciplinary team covering biology, modeling, and design, and developed a detailed project timeline.
Mentors Huo Yixin, Guo Shuyuan, Shao Bin, and Chen Zhenya provided initial direction for the project, emphasizing the importance of innovation and interdisciplinary collaboration. They initially identified proteins such as CRISPR-Cas13a as optimization cases and recommended focusing on the software and AI track.
Regarding direction selection, Dean Huo advised team members to consider innovation and application value when choosing topics, avoid blindly following popular fields, and focus on differentiated competition to enhance the project's competitiveness. Innovation is crucial for project success and future development, and the team should actively pursue innovation in protein design.
Dean Huo required the team to advance the project according to the plan, ensure timely completion of tasks in each phase, and adjust and optimize the project direction as needed. In terms of team member development and training, emphasis should be placed on fostering leadership, communication skills, and professional expertise, with the goal of promoting personal growth through project practice.
He also highlighted the promising prospects of the intersection of computer science and biology, encouraging members to accumulate experience through the project and enhance their competitiveness in the future job market.
Regarding team collaboration and communication, mentors clarified the division of labor and responsibilities among groups, requiring enhanced collaboration between group members to advance the project collectively. They also encouraged active communication among team members to share ideas and suggestions, and resolve problems in a timely manner.
Finally, the mentors asked team members to summarize their progress regularly and to report problems and difficulties promptly so that the plan could be adjusted and optimized. They also acknowledged the team's performance and expressed the hope that the team would achieve excellent results in the competition and bring honor to the university and the team.
2. Experience Sharing by Previous Outstanding Winners
Format: Invited Fan Shuai, a member of a previous winning team, to conduct internal experience sharing and exchanges
Figure 5. Experience Sharing by Outstanding Winners
The team immediately established a weekly joint meeting system between the biology group and the modeling group, ensuring in-depth integration and information synchronization between dry and wet experiments. Based on this suggestion, we also initiated the top-level design and planning of Human Practices at the early stage of the project, laying a solid foundation for the subsequent systematic implementation of responsible innovation and public engagement.
Mastering the Overall Competition and Core Elements: The sharing systematically outlined the full process of the iGEM competition, emphasizing that the core competitiveness of a project lies in the deep integration of innovation, experimental rigor, and social value.
Optimizing Team Collaboration Models: Valuable advice was obtained on interdisciplinary team management, particularly regarding the importance of establishing an efficient collaborative mechanism between biological experiments and computational modeling.
Forward-looking Project Planning: It was clarified that Human Practices should run through the entire project rather than being added later, and that detailed documentation is critical for project demonstration and knowledge transfer.
3. Discussion with Professor Shao Bin
Format: In-depth discussion between the internal core team and the leading instructor
Figure 6. Discussion with Professor Shao Bin to Refine Project Direction
This was one of the most critical strategic shifts in the project's development. We decided to adjust PROTEUS's goal from "optimizing specific proteins such as Cas13" to "providing a universal AI-driven design platform for synthetic biology." This decision laid the foundation for all subsequent model designs, software architectures, and promotional messaging of our project.
Mentors pointed out that limiting the project to a single protein (e.g., Cas13a) might reduce initial difficulty but would greatly restrict the project's innovation and impact. They encouraged us to elevate the project's perspective, abstract the core technology, and build a universal AI-driven design platform independent of specific proteins. This aligns more with the positioning of a "universal software" and better leverages our team's strengths in computing and modeling.
4. Exchange with BIT Peer Teams
Format: Informal technical discussion with a peer team
Figure 7. Exchange and Sharing with BIT Peer Teams
This exchange inspired us to think about the future development of PROTEUS—integrating the AI software platform with automated experimental hardware to build a fully automated "Design-Build-Test-Learn" closed loop, which can serve as the project's long-term vision. It also made us pay more attention to highlighting PROTEUS's ability to solve practical biological design problems in project demonstrations, rather than merely emphasizing the advanced nature of the model.
Value of Hardware-Software Integration: The peer team demonstrated their mature automated hardware platform, which can automatically perform tests and upload results in real time, enabling seamless connection between "hardware execution" and "software analysis." They emphasized that this integration can greatly improve the efficiency and reproducibility of biological research.
Insights into iGEM Hardware Evaluation: They shared valuable competition experience: iGEM judges evaluate hardware projects not only based on technical maturity but also on the degree of integration with core biological problems and the ability to actually solve biological bottlenecks—rather than mere technical accumulation.
Importance of Project Implementation: Based on feedback from their participation last year, they pointed out that the project needs to demonstrate a clear and feasible implementation path, which is crucial for enhancing the project's credibility and impact.
5. Exchange with a High School Team
Format: Providing experimental protocol guidance and answering questions for the peer high school team
Figure 8. Exchange with a High School Team
This exchange prompted us to consider experimental constraints more thoroughly when designing PROTEUS's algorithms—for example, prioritizing the generation of protein sequences that are easy to synthesize, express, and validate. It also reinforced the importance of communication between dry and wet experiment personnel within the team, ensuring that AI-designed sequences can be successfully validated in subsequent experiments and avoiding disconnection between design and practice.
Solidifying Basic Experimental Principles: When answering the high school students' questions about basic experiments such as PCR and primer design, we indirectly reinforced our team's understanding of core molecular biology principles.
Recognizing Experimental Complexity: The detailed challenges encountered by the high school team in experimental operations reminded us that wet experiments are full of uncertainties and technical barriers. AI model design must take into account feasibility and fault tolerance in real laboratory environments.
6. HP Sharing Lecture
Speaker: Gao Lu, Director and Associate Researcher, Center for Science, Technology and Society Studies, Institute for the History of Natural Sciences, Chinese Academy of Sciences
Figure 9. HP Sharing Lecture
We reorganized and optimized our existing HP activity records to ensure they meet the core standard of "mutual shaping." This methodology directly guided our questioning strategies and reflection perspectives in subsequent interactions with experts and the public, enabling us to consciously guide dialogues and seek feedback that could substantially optimize the project.
Clarifying Core HP Concepts: The lecture clarified that the core of Human Practices in iGEM is to "study how your work affects the world and how the world affects your work." Its key lies in fostering "ideological exchange" and "mutual shaping" between the project and society, rather than one-way education or promotion.
Mastering the HP Cycle Method: We learned the standard Human Practices Cycle—through continuous interaction, reflection, and integration, translating external feedback into specific actions for project improvement. This provided a methodological framework for us to systematically conduct HP work.
Distinguishing HP from Outreach: The lecture clarified a common misconception: not all external activities qualify as HP. Only interactions that actually influence the project's purpose, design, or implementation count as Integrated Human Practices. This helped us plan and document subsequent HP activities more accurately.
7. Lecture on AI and Biosecurity
Speakers: Chen Bokai, Young Researcher, Center for Global Biosecurity Governance, China Foreign Affairs University; Xue Yang, Professor, Law School, Tianjin University
Figure 10. Lecture on AI and Biosecurity
This lecture prompted us to immediately conduct rigorous screening of toxic and pathogenic sequences in PROTEUS's protein database, reducing abuse risks at the source. We also began planning to embed a usage log function in the software platform to enable tracking and auditing when necessary. AI and biosecurity compliance were explicitly designated as core components of project design and Wiki demonstration.
Understanding Global Governance Frameworks: Gained in-depth insights into cutting-edge trends and challenges in global biosecurity governance, and recognized the new biosecurity risks posed by AI in protein design (e.g., generation and abuse of harmful sequences).
Learning Domestic Regulatory Policies: Professor Xue Yang systematically interpreted domestic laws and regulations related to human genetic resources and biotechnology R&D, emphasizing the importance of conducting compliance assessments in the early stages of the project.
Acquiring Cutting-edge Mitigation Strategies: The lecture introduced technical and policy tools for addressing AI biosecurity risks, such as "unlearning" technology, digital watermarking, and autonomous AI agent monitoring, providing concrete ideas for building a responsible technology platform.
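The database screening described in this section can be illustrated with a deliberately simplified k-mer overlap check: a sequence is flagged when a large fraction of its distinct k-mers also occurs in an entry of a blocklist of sequences of concern. This is a sketch under stated assumptions, not necessarily the method PROTEUS uses. The k-mer length (5), the 0.3 threshold, and the placeholder blocklist are illustrative choices, and practical biosecurity screening typically relies on curated databases and alignment-based tools such as BLAST.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct length-k substrings of an (upper-cased) protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query: str, blocklist: Dict[str, str], k: int = 5) -> Tuple[Optional[str], float]:
    """Return the blocklist entry sharing the largest fraction of the query's k-mers."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return None, 0.0
    best_id, best_frac = None, 0.0
    for ref_id, ref_seq in blocklist.items():
        frac = len(query_kmers & kmers(ref_seq, k)) / len(query_kmers)
        if frac > best_frac:
            best_id, best_frac = ref_id, frac
    return best_id, best_frac


def screen(database: Dict[str, str], blocklist: Dict[str, str],
           k: int = 5, threshold: float = 0.3) -> Dict[str, str]:
    """Flag database entries whose k-mer overlap with any blocklisted sequence reaches the threshold."""
    flagged = {}
    for seq_id, seq in database.items():
        hit, frac = best_hit(seq, blocklist, k)
        if hit is not None and frac >= threshold:
            flagged[seq_id] = hit
    return flagged
```

A k-mer check is cheap enough to run over a whole database before any model training, which is why it suits a first-pass filter; borderline hits would still need alignment-based review.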
8. Exchange with YNNU-China
Format: Face-to-face technical exchange with the graduate team of Yunnan Normal University (YNNU-China)
Figure 11. Exchange with YNNU-China
Inspired by this exchange, we immediately planned to integrate AlphaFold 3 structure prediction into our design process as a key validation step for AI design results, enhancing the reliability and persuasiveness of our research outcomes.
Understanding Tool Integration Paths: The YNNU-China team is committed to integrating existing protein design tools into an all-in-one software platform, significantly improving the efficiency of the design process; this represents a practical and efficient technical path in the field.
Clarifying Our Innovative Positioning: Through comparison, we clearly identified that PROTEUS's core advantage lies in exploring innovative protein design paradigms (i.e., using language models for targeted modification) rather than tool integration.
Identifying Key Validation Bottlenecks: The exchange revealed a common challenge faced by innovative design paradigms—lack of efficient and reliable validation methods, which directly affects the credibility of design results.
9. Roundtable Dialogue: Generative AI and Life Sciences
Format: Industry roundtable discussion
Figure 12. Roundtable Dialogue: Generative AI and Life Sciences
This insight directly prompted us to designate the integration of an automated experimental platform as a key task in the next phase. The goal is to enhance the scale and quality of wet experiments, providing continuous and reliable data feedback for the PROTEUS model to continuously improve its accuracy and reliability.
Recognizing the Core Role of Data Closed Loops: Industry experts unanimously emphasized that the implementation of AI in life sciences relies on a "data-driven" closed loop—wet experiment validation data must be fed back to the model to achieve precise design and continuous optimization.
Addressing Our Own Data Bottlenecks: We realized that our team's current wet experiment capabilities are the main bottleneck in building this closed loop, making it difficult to generate the high-throughput data required for model iteration.
10. Self-organized Meeting: AI-driven Enzyme Design Tools for Biosecurity
Format: Thematic seminar focusing on biosecurity in AI protein design
Figure 13. AI-driven Enzyme Design Tools for Biosecurity
The content of this meeting prompted us to conduct in-depth reflection on the ethical responsibilities of AI for Science. We immediately established a dedicated security section in the project Wiki, detailing potential risks and response strategies, and firmly embedding the concept of responsible innovation into project demonstrations.
Recognizing AI's Dual-Edged Sword Effect: The meeting clearly stated that unconstrained powerful AI generation capabilities may be abused to design harmful biological agents, posing serious biosecurity risks.
Learning Cutting-edge Mitigation Technologies: The meeting systematically introduced three collaborative mitigation strategies: "unlearning" technology (to remove dangerous knowledge at the source), "digital watermarking" technology (for output tracking), and "autonomous AI agents" (for real-time risk monitoring).
11. Exchange and Learning with Multiple AI & Software Track Teams
Participating Teams: Nanjing University, Xi'an Jiaotong-Liverpool University, Jilin University, and Tongji University
Figure 14. Exchange and Learning with Multiple AI & Software Track Teams
- Strengthened Technical Confidence: This exchange confirmed that the technical route we selected—fine-tuning based on ESM2—is one of the mainstream and cutting-edge directions in the field.
- Promoted Methodological Learning: The antimicrobial peptide design workflow of Jilin University's team and the embedding-plus-regression strategy of Tongji University's team for enzyme property prediction provided valuable references for optimizing the implementation details of PROTEUS.
- Reinforced Implementation Awareness: The emphasis placed by all teams on experimental validation prompted us to think more deeply about the verifiability of PROTEUS platform outputs.
Nanjing University Team: Develops language model-based gene mining tools, aiming to break free from sequence homology limitations and integrate high-dimensional information for function determination.
Xi'an Jiaotong-Liverpool University Team: Focuses on building multi-modal large models for biology, using language models as backbone networks to uniformly process multi-modal data.
Jilin University Team: Specializes in antimicrobial peptide databases and design, using large models to screen antimicrobial peptides with specific functions from massive data.
Tongji University Team: Focuses on enzyme activity prediction and modification, using pre-trained models to obtain sequence embeddings, then training simple regression models to quickly predict enzyme property parameters.
Identifying Common Challenges: All teams face the core bottleneck of "experimental validation."
Clarifying Our Own Positioning: We gained a clearer understanding of the unique value of PROTEUS as a "universal protein optimization platform"—not limited to specific proteins or functions, but providing a universal AI-driven design solution.
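The Tongji team's approach (pre-trained embeddings followed by a simple regression model) can be sketched as below. To keep the example self-contained, amino-acid composition stands in for the pooled protein language model embedding; in a real pipeline the hypothetical `composition_embedding` would be replaced by, for example, mean-pooled ESM-2 representations. The ridge penalty and the toy linear target used to exercise it are illustrative assumptions, not the team's actual setup.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedding(seq: str) -> List[float]:
    """Stand-in for a pooled PLM embedding: 20-dim amino-acid composition."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(X: List[List[float]], y: List[float], lam: float = 1e-3) -> List[float]:
    """Ridge regression via the normal equations; the bias term is not penalized."""
    Xb = [row + [1.0] for row in X]  # append a bias column
    d = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(A, b)


def predict(w: List[float], x: List[float]) -> float:
    """Linear prediction with the bias weight stored last."""
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))
```

Swapping in a richer embedding changes only the feature function; the regression step stays the same, which is what makes this strategy fast to iterate on.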
12. Email Interview with Senior Researcher Huang Niu
Format: Expert interview via email
Figure 15. Reply from Senior Researcher Huang Niu
We re-planned the validation process after model output, adding an evaluation step based on physical computing. We expanded the definition of "successful optimization" and began building a more comprehensive protein evaluation system. We also placed greater emphasis on data cleaning and quality, and started exploring effective strategies for using small models with limited resources.
Integration of AI and Physical Models: Proposed a "hierarchical coupling" paradigm—AI for rapid initial screening and physical models (e.g., molecular dynamics, free energy perturbation) for detailed evaluation.
Improvement of Evaluation Systems: Pointed out that evaluating protein variants requires introducing more physicochemical properties (e.g., entropic effects, interface flexibility) rather than just macroscopic properties.
Computational Validation: Recommended embedding computational validation modules (e.g., rapid energy optimization, short-term MD simulations) into the platform to ensure the physical rationality of results.
Importance of Data: Emphasized that high-quality data is the foundation of the model, reminding us to invest significant effort in this area.
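Huang Niu's "hierarchical coupling" paradigm amounts to a two-stage funnel: a cheap AI score ranks every candidate, and only a shortlist is passed to the expensive physics-based evaluation. In the minimal sketch below, `fast_score` and `slow_score` are hypothetical placeholders for, respectively, a model-predicted fitness and a molecular dynamics or free-energy estimate; both are assumed to follow a higher-is-better convention (negate energies such as ΔΔG, where lower is better).

```python
import heapq
from typing import Callable, Dict, Iterable, List


def hierarchical_screen(candidates: Iterable[str],
                        fast_score: Callable[[str], float],
                        slow_score: Callable[[str], float],
                        keep_top: int) -> List[str]:
    """Two-stage screen: cheap ranking of all candidates, expensive scoring of the shortlist only."""
    # Stage 1: rapid initial screening with the fast (e.g. AI) scorer.
    shortlisted = heapq.nlargest(keep_top, candidates, key=fast_score)
    # Stage 2: detailed evaluation, called only keep_top times.
    detailed: Dict[str, float] = {c: slow_score(c) for c in shortlisted}
    # Final ranking follows the detailed (e.g. physics-based) score.
    return sorted(detailed, key=detailed.get, reverse=True)
```

The design choice is simply cost control: the expensive evaluator runs `keep_top` times regardless of how many variants the model generates, so `keep_top` becomes the knob that trades coverage against compute.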
13. Mid-term Project Review Meeting and Collective Guidance from Mentors
Format: Internal mid-term progress report and collective review meeting with the mentor group
Figure 16. Mid-term Project Review Meeting and Collective Guidance from Mentors
- Accelerated Project Integration: After this meeting, the team immediately adjusted development priorities, focusing on resolving platform integration issues to ensure smooth connection between the software platform, model services, and user interface.
- Optimized Resource Allocation: Based on mentors' suggestions, the team re-evaluated human and time resources, strengthening personnel allocation for wet experiments to ensure that key experiments proceed as planned.
Obtaining Comprehensive Technical Roadmap Confirmation: The mentor group fully affirmed the core technical roadmap, including fine-tuning of protein language models based on ESM-2, AlphaFold structural validation, and the dual closed loop of dry and wet experiments.
Clarifying the Platform-oriented Development Path: Mentors emphasized that we should always adhere to the positioning of a "universal AI-driven protein optimization platform."
Resolving Key Bottleneck Issues: For technical bottlenecks mentioned in the report, the mentor group provided specific resource allocation suggestions and technical solutions to help the team overcome obstacles.
14. Interview with Researcher Xia Yan
Format: Online forum with a distinguished expert
Figure 17. Interview with Researcher Xia Yan
This interview strengthened our confidence in adopting the "pre-training + fine-tuning" strategy. It accelerated our design of the "experimental validation → model iteration" closed loop and provided a clear evolution roadmap for the long-term development of the PROTEUS platform (including NLP interaction and AI agents).
Integration of Language Models and Biological Sequences: We gained a deeper understanding of the structural similarities between natural language and biological sequences, which allow Transformer-based language models to process protein sequences effectively.
Application of Pre-training and Fine-tuning Strategies: Learning universal features through large-scale pre-training and then fine-tuning for specific targets can significantly improve model performance.
Importance of Data Processing: High-quality data processing and parameter adjustment are key to improving model performance.
Critical Role of Experimental Validation: Experimental validation is the gold standard for evaluating model performance.
Future Directions: We will explore developing PROTEUS into an NLP-enabled platform that allows users to describe functional requirements in natural language.
15. Participation in Tsinghua iGEM "From Molecules to Ethics" Forum
Format: Interdisciplinary ethics forum and youth roundtable
Figure 18. Tsinghua iGEM "From Molecules to Ethics" Forum
The team's understanding of responsible innovation reached a new level. We immediately implemented specific security measures: screening toxic sequences in the protein database and planning to add usage record management to the software platform. In project introductions and external promotions, we also placed greater emphasis on explaining technical principles and security measures.
Experts such as Researcher Yu Yang systematically elaborated on the new ethical and security challenges posed by the application of AI in synthetic biology.
Learned about governance frameworks such as the Tianjin Guidelines for Scientists' Biosecurity Code of Conduct.
In the youth roundtable, we exchanged ideas with representatives from other teams on project inspiration and ethical considerations. The public generally held an "open but prudent" attitude toward the technology.
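The usage-record management planned here could take the form of a hash-chained, append-only log, in which each entry stores the SHA-256 digest of the previous one so that later edits become detectable. The sketch below is one possible design under that assumption; the class and field names are hypothetical. The scheme is tamper-evident rather than tamper-proof: anyone able to rewrite the entire log could recompute every hash, and truncating the newest entries is not detected on its own.

```python
import hashlib
import json
import time
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class UsageLog:
    """Append-only usage log; each entry stores the SHA-256 hash of the previous entry."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(body: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so the same content always hashes the same way.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, user: str, action: str, detail: str,
               timestamp: Optional[float] = None) -> Dict[str, Any]:
        """Append one usage event and return the stored entry."""
        body = {
            "user": user,
            "action": action,
            "detail": detail,
            "time": time.time() if timestamp is None else timestamp,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; return False if any entry was altered or the chain is broken."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("prev") != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

For auditing, periodically storing the latest hash somewhere outside the platform would make wholesale rewrites detectable as well.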
16. Interview with Researcher Zhang Shouyue
Format: Online forum with a distinguished expert
Figure 19. Interview with Researcher Zhang Shouyue
We finally confirmed the technical route based on the ESM2 large model and abandoned the alternative plan of developing an independent small model. We plan to introduce more functional indicators into the evaluation function in subsequent versions and designate RNA design as an important future expansion direction for the platform.
Model Selection: Strongly recommended fine-tuning based on universal large models rather than developing small models to avoid falling into local optima.
Experimental Closed Loops: Re-emphasized that high-throughput experimental data is the prerequisite for building an effective reinforcement learning feedback loop.
Evaluation Dimensions: Proposed expanding the protein function evaluation system to include more dimensions such as subcellular localization and protein-protein interaction properties.
Future Directions: Pointed out that RNA and RNA-protein complex design will be the next disruptive direction.
17. Science Popularization Exchange at the Affiliated High School of Beijing Institute of Technology
Format: Public science popularization and interviews
Figure 20. Science Popularization Exchange at the Affiliated High School of Beijing Institute of Technology
This exchange directly prompted us to use more visual and metaphorical language in PROTEUS's software interface and project Wiki to improve understandability and user-friendliness for non-professionals. Students' concerns about technology security reinforced our determination to take "responsible innovation" as one of our core narratives.
Gaining Inspiring Public Perceptions: Students used metaphors such as "Life's Lego" to describe synthetic biology and "super navigation" to describe the role of AI in scientific research. These vivid analogies provided valuable inspiration for explaining complex technologies to the public.
Insights into the Technical Ethics of the Younger Generation: The students demonstrated admirable prudence, clearly expressing concerns about synthetic biology "opening Pandora's box" and warnings about AI leading to the degradation of human scientists' thinking.
Clarifying the Value and Direction of Science Popularization: The students' strong interest in solving real-world problems confirmed that linking technical value to concrete social needs is an effective way to stimulate public interest and understanding.
18. Exchange with Xi'an Jiaotong-Liverpool University
Format: Online meeting involving project demonstrations and code-level technical discussions
Figure 21. Exchange with Xi'an Jiaotong-Liverpool University
- Inspired Model Optimization Ideas: Their multi-task learning-based model design prompted us to consider evolving PROTEUS's single scoring function into a multi-task optimization framework capable of predicting multiple functional properties simultaneously.
- Promoted Technical Sharing: Both teams agreed to continue exchanges after the competition—especially regarding the application of reinforcement learning in protein optimization.
- Reinforced Open-Source Determination: This exchange strengthened our resolve to fully open-source PROTEUS's core code and models to better contribute to the synthetic biology community.
Learning Professional Multi-task Model Architectures: The XJTLU team focuses on antimicrobial peptide design, with their model adopting a refined "pre-training + multi-task sequential fine-tuning" strategy.
Learning from Comprehensive Evaluation Indicator Systems: The multi-dimensional evaluation system they built (covering activity, toxicity, and stability) aligns closely with our goal of building a more comprehensive protein optimization evaluation system.
Experiencing Best Practices in Open-Source and Collaboration: The team made all their models, code, and datasets fully public, setting an example for our team's open-source practices.
Clarifying the Feasibility of Pure Dry Experiment Projects: As a pure dry experiment team, their achievements enhanced our confidence in focusing on building powerful software tools with limited resources.
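One way to move from a single score toward the multi-dimensional evaluation discussed with XJTLU (activity, toxicity, stability) is to keep the predicted properties separate and select Pareto-optimal candidates, optionally alongside a weighted score. The sketch below does not implement multi-task training itself; it assumes each candidate already has predicted values for the three properties, and the property names, optimization directions, and weights are illustrative.

```python
from typing import Dict, List

# +1: higher is better; -1: lower is better (assumed directions).
OBJECTIVES: Dict[str, int] = {"activity": 1, "stability": 1, "toxicity": -1}


def dominates(a: Dict[str, float], b: Dict[str, float],
              objectives: Dict[str, int] = OBJECTIVES) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(s * a[k] >= s * b[k] for k, s in objectives.items())
    better = any(s * a[k] > s * b[k] for k, s in objectives.items())
    return no_worse and better


def pareto_front(preds: Dict[str, Dict[str, float]],
                 objectives: Dict[str, int] = OBJECTIVES) -> List[str]:
    """Names of candidates not dominated by any other candidate."""
    return sorted(
        name for name, p in preds.items()
        if not any(dominates(q, p, objectives) for other, q in preds.items() if other != name)
    )


def weighted_score(p: Dict[str, float], weights: Dict[str, float],
                   objectives: Dict[str, int] = OBJECTIVES) -> float:
    """Scalarize predictions; each weight is applied in the objective's 'better' direction."""
    return sum(w * objectives[k] * p[k] for k, w in weights.items())
```

The Pareto front avoids committing to weights too early: it keeps every variant that is best at some trade-off, and a weighted score can then rank that smaller set once priorities are agreed on.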
19. Interview with Researcher Xu Chunfu
Format: Online expert symposium featuring project presentations and in-depth technical discussions
Figure 22. Interview with Researcher Xu Chunfu
His recommendations reinforced our confidence in building a "universal platform" and helped us define its core advantage more precisely: generalization capability within protein modification and optimization. We immediately began re-evaluating and optimizing our dataset splitting strategy to ensure rigorous model evaluation. We also started investigating outsourced gene synthesis and high-throughput screening solutions, and listed them as key alternative pathways for strengthening the project's validation capabilities.
- Validation of Core Direction and Clarification of Capability Boundaries: Researcher Xu affirmed the value of our ESM2-centered technical approach to protein optimization. He pointed out that protein language models are better suited to modifying and optimizing natural proteins than to purely de novo design. This gives the platform positioning of PROTEUS a solid theoretical foundation.
- Identification of Data Bottlenecks and Optimization Priorities: He identified the limited size of the fine-tuning dataset as a key factor constraining model performance. He also stressed the importance of strictly separating the development and test sets, so that sequences in one share minimal homology with sequences in the other. This separation is essential for preventing overfitting and for obtaining a reliable measure of the model's generalization capability.
- Feasible Pathways for Establishing an Experimental Closed Loop: To address our bottleneck of low-throughput wet-lab validation, he proposed two highly constructive solutions. The first is outsourcing gene synthesis to specialized companies. The second is coupling protein function to high-throughput screening readouts (e.g., linking it to bacterial growth). These suggestions showed us a way to translate AI designs into experimental data at a larger scale.
- Enhancement of Project Narrative Depth: He suggested formulating a clear scientific hypothesis to explain why our fine-tuning strategy improves the model's performance on specific tasks. This prompted us to reflect more deeply on the project's underlying logical framework and to refine it.
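A homology-aware split of the kind Researcher Xu recommended can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the PROTEUS pipeline:

- It uses Python's `difflib.SequenceMatcher` ratio as a crude stand-in for alignment-based sequence identity.
- It clusters sequences by single linkage at a threshold.
- It assigns whole clusters to either the training or the test set, so no cross-set pair reaches the threshold.

The function names (`identity`, `cluster_sequences`, `homology_split`) and the default threshold and test fraction are hypothetical. Real projects typically use dedicated clustering tools such as MMseqs2 or CD-HIT for this step.

```python
from difflib import SequenceMatcher
import random


def identity(a: str, b: str) -> float:
    """Crude similarity proxy (assumption): SequenceMatcher ratio, not true alignment identity."""
    # autojunk=False stops difflib from discarding frequent residues in long sequences
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(seqs: list[str], threshold: float) -> list[list[int]]:
    """Single-linkage clustering via union-find: any pair at or above threshold shares a cluster."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise comparisons: only suitable for small illustrative datasets
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def homology_split(seqs: list[str], threshold: float = 0.5,
                   test_fraction: float = 0.2, seed: int = 0) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices); whole clusters move together."""
    clusters = cluster_sequences(seqs, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(seqs)
    train: list[int] = []
    test: list[int] = []
    for cluster in clusters:
        # Fill the test set cluster by cluster until it reaches the target size
        if len(test) < target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return sorted(train), sorted(test)


seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA", "GSHMLEDPVG", "WWPPCCNNEE"]
train_idx, test_idx = homology_split(seqs, threshold=0.8, test_fraction=0.4)
print("train:", train_idx, "test:", test_idx)
```

Because entire clusters are assigned as units, near-duplicate variants such as the first two toy sequences always land on the same side of the split. Raising or lowering the threshold controls how strict the separation is. A random per-sequence split would instead let close homologs leak into the test set and inflate evaluation scores.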
20. Street Interviews with Children
Format: Casual random interviews and science popularization dialogues with children and their parents
Figure 23. Street Interviews with Children
This experience prompted us to put greater emphasis on simple language and friendly visuals in later project demonstrations and in the software interface design, so that non-professionals can intuitively understand PROTEUS's core functions. The children's enthusiasm inspired team members and made us deeply aware of how much science popularization matters in nurturing the next generation of scientists.
Stimulating Interest with Vivid Metaphors: We used simple metaphors such as "Life's Lego" to explain to children how synthetic biology designs and constructs biological systems, sparking their strong interest and imaginative ideas.
Collecting Unfiltered Feedback: The children's imaginative questions and intuitive interpretations (e.g., "Can this AI design glowing Pokémon?") made us realize how important it is to preserve pure curiosity and imagination in technology communication.
Understanding the Starting Point of Public Perception: Through conversations with parents, we learned that the public's understanding of AI and synthetic biology comes mostly from news, films, and television, and therefore mixes expectations with concerns.
21. Visit to a Welfare House
Format: Interactive science popularization activity for children
Figure 24. Visit to a Welfare House
This activity strengthened the team's sense of social responsibility, making us firmly believe that technology developers should have a caring attitude toward vulnerable groups. It reminded us that while pursuing technological advancement, we should also consider how to make scientific and technological achievements contribute to improving human well-being—especially the quality of life of vulnerable groups—in simple, low-cost ways.
Customized Science Popularization Content: We carefully prepared 3D models of animal cells, plant cells, E. coli, and DNA. Through interaction with the children, we transformed complex scientific concepts into intuitive and interesting experiences using stories and games.
Conveying Warmth and Care: Beyond science popularization, we focused on interacting with and accompanying the children. By giving carefully prepared science-themed gifts, we brought them knowledge and joy.
Practicing Inclusive Education: We recognized that the light of science should shine on every corner of society. This activity prompted us to reflect on the inclusiveness and accessibility of technological development.
22. Social Media (WeChat/Bilibili)
Format: Official WeChat public account and Bilibili account for continuous public engagement
Figure 25. Social Media Platform
Positive interactions and feedback from netizens greatly enhanced the team's sense of accomplishment and confidence in continuing the project, making us deeply aware that our work is truly inspiring public interest in science. Misunderstandings and questions in the comment sections helped us identify blind spots in project communication, prompting us to continuously optimize the wording and demonstration materials for external introductions to make them clearer, more rigorous, and easier to understand.
Establishing a Stable Science Popularization Platform: Centering on the core theme of "AI-driven life sciences," we created and published a series of popular science content, attracting the attention of many science enthusiasts.
Obtaining Diverse Real-Time Feedback: Social media platforms became the most direct "public opinion litmus test" for our project. Questions, praise, and even criticism from netizens provided us with valuable first-hand feedback.
Achieving Mutual Inspiration and Joint Growth: Interaction with netizens is no longer one-way output but two-way inspiration. Technical suggestions and application ideas from netizens provided us with new perspectives beyond the team's thinking framework.
This ongoing public dialogue constantly reminds us of the ultimate goal of PROTEUS as an open-source project—not only to become a technical tool but also to serve as a bridge connecting science and society.
Reflect Again
Through our Integrated Human Practices journey, we have engaged in continuous reflection and improvement. Here we summarize what we have accomplished and what remains to be done for the future development of PROTEUS.
What Have We Done
Through our Integrated Human Practices, we have successfully:
- Shifted from a purely AI-centric model to a more powerful hybrid approach that considers physical and biological constraints.
- Established a practical Design-Build-Test-Learn (DBTL) framework. It is limited in scale for now but designed for future automated expansion.
- Embedded security and ethical considerations directly into our project design, from database screening to platform governance planning.
- Gained a more comprehensive perspective on protein optimization, integrating suggestions on evaluation indicators and future directions (e.g., NLP interfaces).
- Conducted meaningful dialogues with the scientific community and the public, ensuring that our work remains relevant and responsible.
What Remains to be Done
Beyond the iGEM competition, we recognize that the following work remains:
- High-Throughput Experimental Validation: Implementing the planned automated wet laboratory platform is critical for generating large-scale data needed to transform our model from promising to powerful.
- Advanced "AI + Physics" Integration: Further in-depth technical work is required to seamlessly integrate the predictive capabilities of our language model with the precision of molecular mechanics simulations.
- Development of a Natural Language Interface: To realize our vision of a truly universal and accessible platform, significant software development is needed to create the "agent" discussed with Dr. Xia Yan.
- Long-Term Security and Governance Frameworks: As the platform evolves, more formal governance models must be established—including user agreements and continuous monitoring—to minimize abuse.
- Expansion to RNA and Multi-Molecule Design: In line with Dr. Zhang Shouyue's vision, we aim to expand PROTEUS's capabilities to include RNA and RNA-protein complex design once sufficient data is available.
Our journey with Integrated Human Practices has fundamentally shaped PROTEUS. It is no longer just a software project; it is a responsibly developed tool—born from a cycle of learning, reflection, and improvement—and holds the promise of making a positive impact on the field of synthetic biology.