Integrated Human Practice
By talking to experts, listening to the needs of different stakeholders, and carefully assessing risks, we shaped our project into a safer, more applicable, and more impactful platform.
Overview
From concept to application, every step of our project was shaped by expert insights, safety considerations, and public engagement.
Designing functional proteins remains one of biotechnology's most powerful yet time-intensive challenges. Traditional protein drug development often exceeds 10 years and USD 2 billion in cost, with many candidates failing during late-stage trials. Our team set out to change that.
Through consultations with scientists, industry experts, and educators, we identified a critical gap: the need for a platform that can rapidly optimize proteins even with limited data while ensuring safety and applicability. These discussions inspired our integration of Reinforcement Learning (RL) and Directed Evolution (DE) into a continuous AI–experiment feedback loop.
We chose the SpyCatcher/SpyTag system as our first application due to its high potential in drug delivery, biosensing, and biomaterials, despite the scarcity of existing datasets. In the Dry Lab, our AI-driven modeling pipeline generates and evaluates candidate sequences; in the Wet Lab, directed evolution validates and improves them. The best results feed back into the model, driving iterative improvement.
Beyond technical innovation, we incorporated biosafety assessments and public engagement from the earliest stages. By embedding Human Practices into every step, we transformed a conceptual AI–biology framework into a practical, safe, and impactful protein design platform.
IHP Timeline
Background
Protein-based drug development is a long and costly process, often taking over 10 years and nearly USD 2 billion to complete. While some companies are incorporating AI and modeling into protein design workflows, most still face bottlenecks: limited datasets, slow validation, and low iteration speed.
During our research, we identified specific protein systems, such as SpyCatcher/SpyTag, that are underexplored yet possess enormous application potential. This inspired us to find a way to bridge the gap between AI-driven design and real-world application.

Main Takeaways
- Survey the current protein generation routes (directed evolution, ML, hybrid) and where loops truly reduce screening cost.
- Adopt an assay-first mindset: define readouts, controls, and context (e.g., pH) before choosing algorithms.
- Start with a benign, well-assayed system and shareable benchmarks so results are comparable and safe.
- A closed model ↔ experiment loop only works for properties with a reliable in-silico oracle (e.g., stability, simple binding metrics).
- Hybridizing directed evolution with ML/active learning helps most in low-data, high-cost settings.
Why we contacted:
To sanity-check our problem framing and make sure our "generate → test → learn" concept targets a real gap rather than duplicating existing ML-only protein design efforts.
What we learned:
Landscape validation: few groups run a truly closed loop that measurably reduces screening cost/time; benchmarks and baselines matter more than model novelty. Choose an application where assay reliability and context control are strong; avoid ambiguous endpoints.
How we integrated:
Committed to an assay-first plan and to begin with a safe, widely used system (e.g., SpyCatcher/SpyTag) and shareable benchmark conditions. Wrote success metrics around time-to-candidate, per-iteration improvement, and robustness across pH/buffer.

Main Takeaways
- Covalent peptide–protein systems are harder to simulate than non-covalent ones (covalent catalysis needs QM/MM rather than classical MD) and are of relatively lower impact.
- Automated active learning enabled more accessible high-throughput screening.
Why we contacted:
Prof. Lee-Wei Yang works on applying physics-based models and molecular dynamics simulations to study protein and RNA conformational changes, with a focus on protein–peptide, protein–drug, and protein–protein interactions. His lab also uses these approaches in peptide and drug design, linking computational modeling with biochemical mechanisms. Since our project involves protein–peptide interactions and we needed guidance on the feasibility of molecular simulations, we contacted Prof. Yang to discuss our ideas and work plan and to ask how to choose appropriate systems and approaches.
What we learned:
During our discussion, Prof. Yang emphasized that our main challenges come from fundamental chemistry and scientific reasoning, rather than the simulation software itself. He explained that covalent bonds are not simply "stronger" or "weaker," but depend on electronic structure and catalytic environment. He described our chosen system of covalent peptide–protein binding as very complex and of relatively low impact, compared to more common non-covalent binding systems. He advised that while classical MD could capture physical binding, covalent catalysis would require QM/MM, making it a very time-consuming and demanding task.
How we integrated:
- Based on our investigation of covalent protein–peptide binding, we identified the rate constant, rather than the binding energy alone, as the key quantity to optimize, since it reflects deeply hidden structural characteristics.
- We implemented a graph neural network (GNN) encoder within an autoencoder so the model can learn the subtle structural determinants of reactivity (see the sketch after this list). Consider a researcher who wants to rapidly design a peptide with optimized covalent binding properties: traditionally this requires slow wet-lab screening or specialized computational expertise. With our pipeline, the workflow becomes intuitive and automated: users input candidate structures and let the active learning loop iteratively refine predictions, avoiding large-scale sample evaluation while still converging toward functionally optimal proteins.
- This makes high-throughput screening accessible even to non-specialists, reduces the experimental burden, and opens faster, more inclusive discovery cycles.
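To make the GNN-encoder idea concrete, here is a minimal message-passing sketch in plain NumPy. The feature sizes, aggregation rule, and toy contact graph are illustrative assumptions, not our production architecture, which couples the encoder to a decoder and a reactivity head.

```python
import numpy as np

def gnn_encode(node_feats, adj, w1, w2):
    """One round of mean-aggregation message passing plus a graph-level
    readout, mapping a peptide contact graph to a fixed-size latent code
    (a stand-in for the encoder half of the autoencoder)."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    messages = (adj @ node_feats) / deg           # average neighbor features
    hidden = np.tanh((node_feats + messages) @ w1)
    return np.tanh(hidden.mean(axis=0) @ w2)      # pool nodes -> latent vector

# Toy peptide graph: 5 residues, 8 features each, backbone + one contact.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0           # backbone edges
adj[0, 4] = adj[4, 0] = 1.0                       # long-range contact
z = gnn_encode(x, adj, rng.normal(size=(8, 16)), rng.normal(size=(16, 4)))
print(z.shape)  # (4,) latent code fed to the property head / decoder
```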
Our Idea
We designed a continuous AI–experiment feedback loop that integrates:
- Active Deep Learning (ADL) for adaptive decision-making in sequence generation
- Directed Evolution (DE) for real-world functional optimization
Our model explores sequence–structure–function space using generative AI, evaluates candidates with simulations and experimental assays, and integrates the best results back into the AI model. This cycle accelerates optimization for multiple objectives, from binding affinity to catalytic efficiency, while remaining effective even in small-data scenarios.
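A minimal, self-contained sketch of the generate → test → learn cycle described above. The fitness function, mutation operator, and "model" (here just retained parents) are toy stand-ins for our generative model, in-silico oracle, and retraining step.

```python
import random

AAS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq):
    # Stand-in for a wet-lab readout; our real loop uses assay data.
    return sum(seq.count(a) for a in "KRDE") - abs(len(seq) - 12)

def mutate(seq, rate=0.1):
    return "".join(random.choice(AAS) if random.random() < rate else c
                   for c in seq)

def design_loop(seed, iterations=5, pool=200, batch=10):
    """Generate candidates, triage in silico, 'measure' a small batch,
    and feed the winners back as parents for the next round."""
    parents = [seed]
    for _ in range(iterations):
        candidates = [mutate(random.choice(parents)) for _ in range(pool)]
        shortlist = sorted(candidates, key=toy_fitness, reverse=True)[:batch]
        measured = {s: toy_fitness(s) for s in shortlist}   # "wet lab" step
        parents = sorted(measured, key=measured.get, reverse=True)[:3]
    return parents[0]

print(design_loop("MKTAYIAKQRQI"))
```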

Main Takeaways
- Center the loop on time-to-candidate and improvement per iteration, not absolute model scores; it is convincing only if it shortens time-to-candidate under realistic lab throughput and budget.
- Expect skepticism: design transparent baselines and ablations so improvements are attributable to the loop, not scale.
- Tie objectives to application-anchored metrics (e.g., binding yield vs. pH; operational stability), not abstract sequence scores.
Why we contacted:
To pressure-test our closed-loop concept and lock clear loop KPIs and decision gates before experiments.
What we learned:
When natural sequences are abundant, direct modeling may suffice; when data are scarce, use a generate-then-learn path (e.g., proposal generation → train/retune models → active selection). Clearly state when we would skip or include generative steps and why (data regime decision branches).
How we integrated:
Fixed three loop KPIs: (1) candidate quality at fixed screen size, (2) per-iteration improvement, and (3) context robustness (pH/buffer). Re-scoped the roadmap into short, publishable cycles with pre-declared baselines/ablations and explicit data-regime decision rules.
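A small sketch of how the three KPIs could be computed from per-iteration screening records; the record fields and the robustness definition (worst-case best hit across pH lines) are our assumptions.

```python
import statistics

def loop_kpis(iterations):
    """iterations: list (one entry per loop cycle) of lists of dicts like
    {"fitness": float, "ph": int}; field names are illustrative."""
    top_quality = [max(r["fitness"] for r in it) for it in iterations]
    # KPI 1: candidate quality at fixed screen size (best hit, final round)
    quality = top_quality[-1]
    # KPI 2: mean per-iteration improvement of the best candidate
    gains = [b - a for a, b in zip(top_quality, top_quality[1:])]
    improvement = statistics.mean(gains) if gains else 0.0
    # KPI 3: context robustness = worst-case best hit across pH lines
    last = iterations[-1]
    per_ph = {ph: max(r["fitness"] for r in last if r["ph"] == ph)
              for ph in {r["ph"] for r in last}}
    robustness = min(per_ph.values())
    return quality, improvement, robustness

demo = [[{"fitness": 0.4, "ph": 5}, {"fitness": 0.6, "ph": 7}],
        [{"fitness": 0.7, "ph": 5}, {"fitness": 0.9, "ph": 7}]]
print(loop_kpis(demo))
```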

Main Takeaways
- Clarify problem framing: what failure modes are we reducing (cost, time, robustness)?
- Publish negative results and baseline curves to build trust.
- Close the loop: feed experimental data back into the front of the pipeline; if resources are tight, use experiments primarily for validation.
- Prefer AI-guided candidates over large, purely random mutagenesis; keep random mutation minimal.
- Run parallel selections under pH 5/7/9, keep Top-k per line, and define explicit stopping rules / retraining cadence.
- Clearly describe each step's purpose, principle, inputs, and outputs; publish the workflow/data schema.
Why we contacted:
Ensure our idea solves a real research bottleneck. We asked Prof. Li to pressure-test our problem framing and loop design—especially whether our compute-first → validate-with-experiments approach is credible, what to prioritize in low-data regimes, and how to present the pipeline and data to external evaluators.
What we learned:
Plan up-front for clear success/failure criteria and documentation. A dual-track strategy is appropriate: a computational loop (RFdiffusion → ProteinMPNN → MD-labeled dataset → ML iteration) plus an experimental track (error-prone PCR library; split-fluorescence kinetics) with pH (5/7/9) parallelization. Experimental results should return to the front of the pipeline, but when bandwidth is limited they can serve as final validation for compute-screened candidates. Replace large random libraries with guided shortlists, and design selection to observe condition-specific improvements (per-pH lines). Measure binding rates via fluorescence kinetics rather than relying on single end-point intensity.
How we integrated:
Added explicit decision gates per cycle and a public benchmark plan. We locked a compute-first, validate-later plan for October deliverables, while keeping a path for small validation sets to calibrate the model. Implemented pH-parallel selection (5/7/9) with Top-k retention and explicit stop/retrain rules in each line. Replaced large random error-prone PCR rounds with AI-nominated candidates for testing; random diversification is a fallback only. Documented a data schema (sequence, structure_id, pH/conditions, kinetic rate, model score, iteration) and wrote step-by-step method blurbs for the wiki.
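A minimal sketch of the documented data schema as a Python dataclass; field types and the example values are assumptions layered on the field list above.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class LoopRecord:
    """One row of the loop's shared data schema, mirroring the fields
    listed above; types and the example below are illustrative."""
    sequence: str                   # amino-acid sequence of the variant
    structure_id: str               # pointer to the predicted/solved structure
    ph: float                       # assay pH (5, 7, or 9 in our lines)
    conditions: str                 # buffer and other context notes
    kinetic_rate: Optional[float]   # fitted rate from fluorescence kinetics
    model_score: float              # in-silico oracle score at proposal time
    iteration: int                  # loop cycle in which it was proposed

rec = LoopRecord("MKTAYIAKQR", "af3_000123", 7.0, "PBS", 0.042, 0.81, 3)
print(asdict(rec))
```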

Main Takeaways
- Tie objectives to biochemically interpretable metrics (e.g., conjugation efficiency under different pH).
- Keep assays simple and reproducible across labs.
- Make the model limits explicit. The ML-generated design space is constrained by its training set; show a single concept figure that links design space × structural/layout choices × performance so reviewers see how the pipeline leads to impact.
- Close the loop once, with guardrails. Prioritize running WL→screen→DL end-to-end at least once, and define a stop-loss gate for time/resources.
- Compute-first is fine—tell the story clearly. Accept the training-set boundary for low-data targets and explain it up front in your Paris presentation.
Why we contacted:
Ground our goals in robust, standardized biochemistry. We sought Prof. Wu's practical view on MPNN/training-set constraints, how to frame a compute-first WL×DL loop, and how much wet-lab validation is enough for a credible pipeline narrative.
What we learned:
Prioritize assays that separate binding from stability; include context variables.
- Design space vs. training set: MPNN outputs remain bounded by the original training set; expanding it needs substantial new data—so for low-data targets, acknowledge and design within that boundary.
- Presentation: Include one crisp concept figure mapping layout/structure, ML limits, and performance to make the pipeline-to-impact link explicit.
How we integrated:
We committed to two headline metrics and standardized buffers for replication.
- Added a design space × layout × performance visual to slides/wiki and narrate ML limits explicitly.
- Committed to one closed-loop pass (WL→screen→DL) with stop-loss thresholds so gains are attributable to the loop, not scale.
Application
Our initial target is the SpyCatcher/SpyTag system—a lock-and-key protein pair forming irreversible covalent bonds under mild conditions. It is highly valuable for:
- Stable protein conjugation
- Modular assembly of functional biomolecules
- Applications in biomaterials, drug delivery, and biosensing
However, current variants face limitations in pH stability and binding performance. By applying our iterative AI–DE pipeline, we aim to create new SpyCatcher/SpyTag versions optimized for specific industrial and research needs.

Main Takeaways
- SpyCatcher/SpyTag shows clear user demand but data gaps and generation‑efficiency variance slow adoption.
- If improved, immediate benefits appear in conjugation workflows (diagnostics, immobilization, biomanufacturing).
- Feasibility hinges on transporter biology. Real-world success depends on the availability and variability of LAT1 (and similar transporters) across tumor types; indications with consistently high expression are the most promising starts.
- Dose is the biggest unknown. Even with strong in-vitro activity, in-vivo dose sufficiency (delivery to target tissue at therapeutic levels) remains the key translational barrier.
- Platform value ≠ immediate clinic. Near-term wins are methodology, datasets, and benchmarks that de-risk future translation—even if clinical use is still distant.
Why we contacted:
Validate that our first use‑case solves a real, valuable bottleneck. To stress-test our first application framing with a translational lens: which disease contexts are realistic, what evidence (assays, readouts) carries weight for potential users/regulators, and how to phrase claims responsibly given dose and transporter uncertainties.
What we learned:
Define user-relevant KPIs (binding yield vs. pH; operational stability) and side-by-side baselines.
- Indication selection must be expression-led: start with cancers where LAT1 expression is high and well-characterized, and plan for patient/line stratification rather than "one-size-fits-all" claims.
- Design uptake evidence early: build an in-vitro transporter-mediated uptake panel (e.g., LAT1-high vs. LAT1-low cell lines, competition assays with known substrates/inhibitors) to show mechanism and estimate practical dose windows.
- Own the translational gap: clearly state that current results are preclinical/method-focused; dose sufficiency and in-vivo delivery remain open questions that require future PK/PD work.
- Long-term value is real: even without immediate clinical deployment, the work advances methods and expertise and can seed later collaborations.
How we integrated:
We locked SpyCatcher/SpyTag as v1 and drafted an evidence plan for benchmarking and sharing.
- Application criteria added: a "LAT1-high first" rule for use-case selection; every dataset reports transporter expression and analyzes outcomes by expression tier (high/medium/low).
- Evidence plan defined: build a cell-line panel spanning LAT1 expression; include competition/knockdown controls to verify transporter contribution; report uptake kinetics and an estimated payload-to-target ratio to connect in-vitro effect to plausible dosing.
- Claims narrowed: present the near-term deliverable as a platform benchmark + SOPs (assays, analysis scripts), explicitly not a ready-to-treat therapy.
- Roadmap published: outline future steps (PK/PD modeling, in-vivo feasibility studies with partners) and the criteria to graduate from method development to translational evaluation.
Modeling
In the Dry Lab, we combined ProteinMPNN and RF Diffusion to explore protein sequence space. Each candidate is evaluated via:
- AlphaFold 3 structure prediction
- Molecular Dynamics (MD) simulations for stability and binding analysis
- Custom computational scoring for the target property
The sequence and structural data are encoded into a multi-modal autoencoder, enabling latent-space optimization (e.g., Bayesian Optimization) to discover high-performance candidates. The best sequences are experimentally validated and fed back into the model for the next iteration.
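A toy illustration of latent-space Bayesian Optimization with an upper-confidence-bound rule; the latent dimensionality, the candidate sampler, and the quadratic stand-in for the expensive MD/structure-based oracle are illustrative assumptions (requires scikit-learn).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def oracle(z):
    # Stand-in for the expensive score of a decoded latent point;
    # peaks at z = 0.3 in every dimension.
    return -np.sum((z - 0.3) ** 2, axis=-1)

Z = rng.uniform(-1, 1, size=(8, 4))          # a few labeled latent codes
y = oracle(Z)
for step in range(10):
    gp = GaussianProcessRegressor().fit(Z, y) # surrogate over latent space
    cand = rng.uniform(-1, 1, size=(256, 4))  # candidate latent points
    mu, sd = gp.predict(cand, return_std=True)
    pick = cand[np.argmax(mu + 1.0 * sd)]     # upper confidence bound
    Z = np.vstack([Z, pick])
    y = np.append(y, oracle(pick))            # "validate", then feed back
print("best latent score:", y.max())
```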

Main Takeaways
- Condition models on environmental variables (pH) and couple objectives (binding + stability).
- Favor Bayesian/active learning for small batches.
- Oracle quality is central. Prefer properties with reliable computational evaluation (e.g., stability, simple binding proxies) and use them to drive Bayesian/active learning.
- Conditioning on environmental variables (pH, buffer) is required—either in the model or by multitask readouts.
- Uncertainty-aware selection and strict cross-validation beat chasing SOTA benchmarks.
Why we contacted:
To design the handshake between the generator, the oracle, and the active learner, and to set a cadence that the wet lab can actually sustain.
What we learned:
Pick endpoints that MD, structure-based, or other lightweight predictors can score consistently; match the batch size and cadence of active learning to plate cycles. Keep proposals feasible for libraries (schema constraints) and track single vs. combinatorial mutations to observe epistasis.
How we integrated:
Implemented an uncertainty-aware active-learning loop synchronized to experimental plate runs (per-plate model updates). Defined a two-objective setup (binding + stability) with pH conditioning; set mutation budgets and library schemas per iteration so modeling output is directly executable in the lab.
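A sketch of the per-plate, uncertainty-aware selection step with a mutation budget; the UCB rule, the composite binding+stability score behind `mu`, and the budget value are assumptions standing in for our tuned settings.

```python
import numpy as np

def select_batch(cands, mu, sd, parent, plate_size=96, mut_budget=4, beta=1.0):
    """Pick one plate's worth of variants by upper confidence bound.
    cands: candidate sequences; mu/sd: model mean and uncertainty for the
    composite objective; parent: current best. The mutation budget keeps
    proposals buildable within the library schema."""
    def n_mut(s):
        return sum(a != b for a, b in zip(s, parent))
    ucb = mu + beta * sd
    order = np.argsort(-ucb)                       # best UCB first
    batch = [i for i in order if n_mut(cands[i]) <= mut_budget][:plate_size]
    return [cands[i] for i in batch]

cands = ["MKTAYIAK", "MKTAYIAR", "AKTAYIAK", "MKTWYIAK"]
mu = np.array([0.2, 0.5, 0.4, 0.6])
sd = np.array([0.1, 0.3, 0.05, 0.2])
print(select_batch(cands, mu, sd, parent="MKTAYIAK", plate_size=2))
```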

Main Takeaways
- Emphasized masking strategies to prevent posterior collapse during training.
- Highlighted the need for a compact latent space to ensure interpretability.
- Confirmed that our autoencoder (AE) architecture is well-suited for this optimization task.
- Advised that the major computational cost lies in the simulation stage, not the AI model.
Why we contacted:
Our proposal aimed to combine AI-driven generative modeling with directed evolution to optimize protein sequences. To refine the modeling architecture and training stability, we consulted Prof. Dr. Chi-Chun Lee, whose expertise lies in computational modeling and latent variable learning.
What we learned:
Prof. Lee confirmed that our autoencoder architecture suits the optimization task, but warned that latent variable models are prone to posterior collapse; stochastic masking and KL annealing during training mitigate this. He also advised constraining the latent space to a compact, interpretable dimensionality, and noted that the dominant computational cost lies in the physics-based simulation stage rather than in the AI model itself.
How we integrated:
Following Prof. Lee's guidance, we incorporated stochastic masking and KL annealing in the model's training pipeline. We also constrained latent dimensionality to focus learning on meaningful biophysical features. As he highlighted that the main computational load resides in physics-based simulations rather than neural network training, we prioritized optimizing simulation throughput on the HPC cluster instead of expanding AI model complexity.
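A minimal PyTorch sketch of the two training fixes: stochastic input masking and a linear KL-annealing ramp. Dimensions, the masking rate, and the ramp schedule are illustrative, not our tuned values.

```python
import torch
import torch.nn as nn

# Tiny VAE-style setup: 20-dim input, 4-dim latent (encoder emits mu|logvar).
enc = nn.Linear(20, 8)
dec = nn.Linear(4, 20)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def step(x, epoch, total_epochs=50, mask_p=0.15):
    mask = (torch.rand_like(x) > mask_p).float()
    h = enc(x * mask)                              # stochastic input masking
    mu, logvar = h.chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    recon = dec(z)
    rec_loss = ((recon - x) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    beta = min(1.0, epoch / (0.3 * total_epochs))  # KL annealing ramp
    loss = rec_loss + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(32, 20)
for e in range(5):
    print(step(x, e))
```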
Evolution
In the Wet Lab, we apply Directed Evolution to mimic natural selection:
- Generate diverse protein variants
- Experimentally screen for improved performance
- Select the best performers and feed them back into the AI pipeline
This mutual reinforcement between in silico modeling and in vitro evolution shortens optimization cycles and improves final protein quality.

Main Takeaways
- Separate assays for binding vs. stability to avoid confounding.
- Design plates to measure pH‑response curves explicitly.
- Error-prone ≠ "more is better." Performance does not increase monotonically with mutation count; site/position matters more than sheer numbers. A single round + rigorous screening can deliver a valid PoC.
- Tune conditions, not just cycles. Adjust AT/CG bias and ionic strength to modulate mutation rates instead of stacking many rounds.
- Practical operating window. For on/off assays in *E. coli*, 2–5 h is sufficient; check viability effects of random mutation, verify orthogonality (similar survival in on/off), and sequence-verify selected variants.
Why we contacted:
Align library design with biochemical readouts. We needed concrete guidance on error-prone PCR rounds vs. screening strategy and on/off timing/controls to keep the WL×DL evolution plan feasible and resource-balanced.
What we learned:
Don't chase mutation count: run one cycle → screen, then scale only if needed. Prefer parameter tuning (AT/CG bias, ionic conditions) over additional cycles; formal formulas are messy, so treat empirical coefficients pragmatically. A 2–5 h switching window is adequate; include replicates, environmental gradients, and viability/orthogonality checks.
How we integrated:
Adopted gradient plates and reference variants for calibration. Set our PoC plan to one error-prone cycle + stringent screening, with AT/CG bias/ionic tuning to enrich beneficial mutations. Locked an end-to-end loop (WL→screen→DL) once, with stop-loss thresholds to protect time/budget. Fixed assay conditions to a 2–5 h window and added viability/orthogonality controls plus mandatory post-selection sequence verification.

Main Takeaways
- Automate screening and selection to minimize manual intervention.
- Maximize reproducibility. Automated workflows enable higher reproducibility across independent trials.
- Reduce human error by implementing standardized protocols and automated data collection.
Why we contacted:
To evaluate our experimental workflow and identify opportunities for process optimization and standardization.
What we learned:
Manual screening introduces variability; automating plate reading, colony picking, and data logging improves consistency. Standardized protocols with minimal human touch-points lead to more reliable results.
How we integrated:
Implemented automated plate reader protocols, developed standardized data collection scripts, and established quality control checkpoints throughout the screening process to ensure reproducibility.
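A sketch of the kind of standardized collection + QC script we mean, assuming a simple CSV export format (well, time_s, fluorescence) from the plate reader; the column names and QC thresholds are hypothetical.

```python
import csv
import statistics

def load_plate(path):
    """Read a plate-reader CSV export into per-well time series.
    Expected columns (assumed format): well, time_s, fluorescence."""
    wells = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            wells.setdefault(row["well"], []).append(
                (float(row["time_s"]), float(row["fluorescence"])))
    return wells

def qc_flags(wells, min_points=10, max_cv=0.5):
    """Flag wells with too few reads or implausibly noisy signals."""
    flags = {}
    for well, series in wells.items():
        vals = [v for _, v in series]
        if len(vals) < min_points:
            flags[well] = "too few reads"
            continue
        cv = statistics.stdev(vals) / (statistics.mean(vals) or 1e-9)
        if cv > max_cv:
            flags[well] = f"noisy (cv={cv:.2f})"
    return flags

# Usage: flags = qc_flags(load_plate("plate_export.csv"))
```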
Safety
We have assessed potential biosafety and biosecurity risks associated with genetic modification and protein applications, including off-target effects and the pH-dependent performance of SpyCatcher/SpyTag. Risk management strategies have been developed to ensure compliance with safety regulations. Furthermore, we evaluate application controllability and environmental impact to guarantee the long-term safety of our designs.

Main Takeaways
- Implement sequence hazard screening (toxins/virulence) and do‑not‑design lists.
- Keep all work within BSL‑1/2; in‑vitro only; no environmental release.
- Compliance & transparency first. Align with international guidance (e.g., ICH S6(R1)), and make the pipeline's algorithms and decisions interpretable and auditable.
- Immunotoxicity is the primary risk. Macromolecules are readily recognized by the immune system; safety-by-design should precede function optimization.
- Use multi-model, multi-objective safeguards. Combine multi-task learning (add a toxicity head) with post-generation toxicity filters; relying on a single generator risks producing toxic sequences.
- Mind the data you train on. If toxic compounds appear in the training set, models can learn undesired features; toxicity labels are needed to supervise filters.
- Small bio datasets calibrate, not refine. Wet-lab data help with calibration, but are often too small for full model refinement.
- Context matters. Engineering SpyCatcher variants functional at low pH suggests opportunities in tumor microenvironments—claims must be tied to the tested pH/conditions.
- Platform-level safety. Build safety modules (immunotoxicity risk assessment) into the design platform, not as an afterthought.
Why we contacted:
Audit risks from deliberate sequence changes. To audit our pipeline from a regulatory and immunotoxicity perspective—i.e., what must be in place (transparency, labeling, toxicity modeling) for responsible claims, and how to embed safety into our AI-assisted sequence design rather than bolting it on later.
What we learned:
Build human-in-the-loop review before ordering DNA. Safety-by-design should lead: publish a clear, auditable algorithmic trail and state application boundaries (e.g., pH ranges).
- Implement dual safeguards: (1) a multi-task toxicity head in training and (2) an independent post-generation toxicity filter before synthesis.
- Curation is critical: avoid seeding toxicity patterns from training data; plan how to label toxicity for supervision.
- Data strategy: use limited bio data to calibrate thresholds, not to fully re-train generators; keep structure inputs only if they help the target property being optimized.
How we integrated:
Automated filters + manual review, with documented approvals per batch.
- Embedded safety gates in the loop. Upstream: added a toxicity prediction head and "do-not-design" filters tied to immunogenic motifs. Downstream: apply an independent toxicity screen to all generated sequences before ordering DNA (see the sketch below).
- Training-data hygiene + labeling plan: established dataset checks for toxicity patterns and drafted a labeling scheme to supervise toxicity models.
- Calibrate, then claim: moved wet-lab data to calibration of thresholds (not full refinement) and bound all public claims to tested contexts (e.g., low-pH use cases such as tumor microenvironments).
- Platform transparency: logged model versions, inputs, and decision rationales to meet auditable, ICH-aligned documentation expectations.
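A schematic of the downstream gate, where every generated sequence must clear a do-not-design motif check and an independent toxicity screen before DNA is ordered. The motif list and the charge-density heuristic are crude placeholders for our curated list and trained classifier.

```python
# Illustrative flagged motif; the real list is curated with safety advisors.
DO_NOT_DESIGN = ["RRRRRR"]

def toxicity_score(seq):
    # Placeholder for the independent toxicity classifier: a crude
    # positive-charge-density heuristic standing in for the real model.
    return sum(seq.count(a) for a in "KR") / max(len(seq), 1)

def passes_safety_gate(seq, tox_threshold=0.3):
    """Return (ok, reason); sequences must pass before synthesis, and
    cleared sequences still go to manual human-in-the-loop review."""
    if any(motif in seq for motif in DO_NOT_DESIGN):
        return False, "do-not-design motif"
    score = toxicity_score(seq)
    if score > tox_threshold:
        return False, f"toxicity score {score:.2f} above threshold"
    return True, "cleared for synthesis (pending manual review)"

for s in ["MKTAYIAKQRQISFVKSHFSRQ", "MKRRRRRRTA"]:
    print(s, passes_safety_gate(s))
```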

Main Takeaways
- Consider how pH and buffer conditions affect binding claims and safety statements.
- Transparency: publish methods and boundaries of safe use.
- Guard against metric bias: fit kinetic models (k_on/k_off) from time series instead of using raw fluorescence intensity.
- Always report context (pH, buffer) and design condition-specific selections to avoid over-generalizing claims.
- Bridge simulation–experiment gaps with small validation sets before making public claims.
- Define transparent stopping criteria and retraining cadence to prevent cherry-picking.
Why we contacted:
To ensure that our evaluation methods and public statements are accurate, reproducible, and conditioned on the actual assay environment (e.g., pH), and to structure our iteration rules to avoid overclaiming.
What we learned:
Use fluorescence kinetics as the primary safety/validity check for binding rate; document instrument settings, time resolution, and fitting procedure (SOP). Make pH lines explicit (5/7/9) and evaluate per-line improvements rather than pooling across conditions. Keep a written risk/mitigation note: simulation–experiment calibration, metric choice, and prudent use of AI-guided vs. random mutation.
How we integrated:
Adopted a kinetics-first SOP (time-series fitting) and added a mandatory context block (pH, buffer, temperature) to every result table. Wrote selection/stop rules (per-pH Top-k retention, stopping thresholds) and assembled a small validation set to calibrate predictions before making broader claims. Trimmed wet-lab scope to AI-guided candidates with random mutation as a limited fallback; recorded the decision logic in the wiki for transparency.
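A sketch of the kinetics-first fitting step: fit a pseudo-first-order association model to the fluorescence time series and report the fitted rate rather than endpoint intensity (repeating the fit across ligand concentrations would then separate k_on and k_off). The simulated data and parameter values are illustrative (requires SciPy).

```python
import numpy as np
from scipy.optimize import curve_fit

def binding_curve(t, f0, fmax, k_obs):
    """Pseudo-first-order association: F(t) = f0 + fmax*(1 - exp(-k_obs*t))."""
    return f0 + fmax * (1.0 - np.exp(-k_obs * t))

# Simulated time series standing in for plate-reader kinetics at one pH.
t = np.linspace(0, 600, 61)                      # seconds
rng = np.random.default_rng(2)
obs = binding_curve(t, 100.0, 900.0, 0.012) + rng.normal(0, 15, t.size)

(f0, fmax, k_obs), _ = curve_fit(binding_curve, t, obs,
                                 p0=[obs[0], obs.max(), 0.01])
print(f"k_obs = {k_obs:.4f} s^-1")  # report the fitted rate, not endpoint intensity
```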
Education
We plan to promote public understanding of protein design and AI applications in biotechnology through outreach activities, educational camps, and social media engagement. By collaborating with educational institutions, museums, and community programs, we will design interactive content and teaching materials to make scientific knowledge more accessible to audiences of all ages.

Main Takeaways
- Play-based learning using picture books, drawing, and simple storytelling works best for early learners.
- Keep sessions short, visual, and curiosity-driven; avoid complex terms or excessive detail.
Why we contacted:
We sought to co-design outreach activities suitable for 5–8-year-olds, ensuring that our synthetic biology themes could be communicated through stories, visuals, and tactile experiences.
What we learned:
Children learn most effectively through repetition, movement, and narrative play. Teachers suggested using short story arcs, hands-on drawing tasks, and feedback stickers or smiles to gauge engagement and comprehension.
How we integrated:
We refined our picture book and board game to include clear visuals, shorter interaction cycles, and blank creative spaces that allow children to draw or write their own ideas—transforming learning into participation.

Main Takeaways
- Focus on real-world relevance (e.g., diagnostics, safety checks, clinical workflows) when engaging families.
- Keep ethics and safety explanations simple, factual, and reassuring—highlighting no human/animal testing and strong biosafety protocols.
Why we contacted:
We collaborated with medical staff to ensure our community outreach and education materials aligned with healthcare communication standards, particularly for parents and caregivers.
What we learned:
Families respond best when messages are practical and transparent. Healthcare professionals advised providing handouts and Q&A sheets that directly address common concerns about biotechnology safety and application boundaries.
How we integrated:
We developed bilingual Q&A sheets explaining "what we do and don't do" in our project, clarified biosafety measures, and added a real-world context to our educational exhibits—helping families understand how synthetic biology connects to healthcare innovation safely and ethically.
Human Practices Maturity Model

1. Reflecting on Design Decisions
Level: High. Assay-first plan; KPIs set (time-to-candidate, per-iteration gain, pH robustness); pre-declared baselines/ablations; data schema + stop/retrain gates.
Reasoning: Stakeholder input directly shaped design choices and documentation; decisions trace to measurable KPIs and SOPs.
2. Exploring & Reflecting on Context Beyond the Lab
Level: Mid–High. Considered translational limits (dose, transporter variability), regulatory/safety alignment, and operational throughput; public education plan.
Reasoning: Strong context mapping; the next step is explicit "context → design pivot" memos affecting scope/timelines.
3. Incorporating Diverse Perspectives
Level: High. Advisors across ML, biochemistry, evolution, safety, clinic, and industry; feedback led to the SpyCatcher/SpyTag focus, loop KPIs, and active-learning cadence.
Reasoning: Diverse and critical voices were integrated, with visible project changes.
4. Anticipating Positive & Negative Impacts
Level: Mid–High. Safety-by-design platform (toxicity head + independent filter, do-not-design lists); BSL-1/2, in-vitro only; pH-bounded claims; BNCT T/N imaging gate.
Reasoning: Robust safeguards; to reach High, co-develop mitigations with external biosafety/regulatory partners and document adopted changes.
5. Responding to Human Practices Work
Level: High. HP insights define assays (binding vs. stability), library schemas, mutation budgets, and cadence; education framed as duty-of-care.
Reasoning: HP continuously informs technical and communication decisions rather than being an add-on.
6. Approaching Limitations with Integrity
Level: High. Modeling/training-set limits and QM/MM burdens acknowledged; fixed screen caps; stop-loss thresholds; plan to publish negatives/benchmarks.
Reasoning: Transparent about uncertainties, with pre-committed adaptation rules.
7. Creativity & Originality
Level: High. Closed AI↔DE loop with pH-conditioned objectives and platform-level safety, applied to an underexplored but practical SpyCatcher/SpyTag system.
Reasoning: Novel integration oriented toward shareable benchmarks rather than one-off results.
Summary
| Axis | Level |
|---|---|
| Reflecting on Design Decisions | High |
| Exploring & Reflecting on Context Beyond the Lab | Mid–High |
| Incorporating Diverse Perspectives | High |
| Anticipating Positive & Negative Impacts | Mid–High |
| Responding to Human Practices Work | High |
| Approaching Limitations with Integrity | High |
| Creativity & Originality | High |
Table 1. Summary of Human Practices Maturity Model self-evaluation.
The table summarizes our maturity levels across seven axes. We reached High on five axes (Reflecting on Design Decisions, Incorporating Diverse Perspectives, Responding to Human Practices Work, Approaching Limitations with Integrity, and Creativity & Originality) and Mid–High on two (Exploring & Reflecting on Context Beyond the Lab and Anticipating Positive & Negative Impacts). This reflects strong stakeholder integration, measurable KPIs, and transparent documentation, with context-to-design pivot mechanisms and external regulatory partnerships still under development.

Figure 1. Human Practices Maturity Model self-evaluation for NTHU iGEM 10th.
The radar plot shows our self-assessment across seven axes, achieving High maturity in five areas: Reflecting on Design Decisions, Incorporating Diverse Perspectives, Responding to Human Practices Work, Approaching Limitations with Integrity, and Creativity & Originality. Two axes remain at Mid–High (Exploring & Reflecting on Context Beyond the Lab and Anticipating Positive & Negative Impacts), indicating areas for continued development through external partnerships and formalized design pivot processes.
Reflection
Our Human Practices Maturity Model shows strong performance in most areas, including Reflecting on Design Decisions, Responding to Human Practices Work, and Creativity & Originality. From the outset, we grounded our project in industry needs, identified critical bottlenecks in protein design, and selected a high-impact application (SpyCatcher/SpyTag) to demonstrate the potential of our AI–Directed Evolution pipeline. Stakeholder engagement has already shaped our technical design, from data strategy to wet-lab feasibility, and our integration of reinforcement learning with iterative experimental feedback represents a novel approach within iGEM.
We also demonstrate maturity in Incorporating Diverse Perspectives and Approaching Limitations with Integrity: we actively considered biosafety, environmental conditions, and ethical implications, embedded safeguards into our design, and pre-committed adaptation rules such as stop-loss thresholds. However, while we have engaged with a range of academic and industry stakeholders, we recognize that iterative, two-way dialogue, especially with regulatory authorities and critical voices, can be strengthened.
The axes with the most room for growth are Exploring & Reflecting on Context Beyond the Lab and Anticipating Positive & Negative Impacts. Although we have mapped translational limits and embedded safety-by-design safeguards, we have not yet formalized "context → design pivot" memos or co-developed mitigations with external biosafety and regulatory partners, and we still lack a formal review mechanism to revisit limitations and adapt our strategy throughout the project lifecycle.
Future Plan
To advance our maturity across all axes, we plan to:
Deepen Stakeholder Iteration
Establish recurring check-ins with domain experts, regulatory advisors, and potential end-users to ensure our design choices remain relevant, feasible, and compliant. Actively seek input from critical or alternative perspectives to challenge and strengthen our assumptions.
Enhance Contextual Integration
Expand our research on market readiness, potential industrial partnerships, and policy landscapes in multiple regions to prepare for potential real-world deployment. Investigate intellectual property (IP) strategies to support future commercialization pathways.
Strengthen Impact Anticipation
Conduct structured "misuse scenario" workshops to identify and mitigate unintended applications. Collaborate with biosafety experts to co-develop response plans for hypothetical failure modes.
Formalize Limitation Review
Implement a periodic SWOT review cycle to reassess strengths, weaknesses, opportunities, and threats at defined milestones. Document adaptation decisions and communicate them transparently in both technical and Human Practices reports.
Expand Public Engagement
Develop targeted educational content for different audience segments, from high school students to policymakers, to broaden societal understanding of AI-assisted protein design. Partner with science museums and outreach programs to create interactive demonstrations of our platform.
By focusing on these areas, we aim to progress towards the highest maturity level across all axes, ensuring our project is not only technically innovative but also socially responsible, ethically sound, and ready for real-world translation.