Integrated Human Practices

Overview

Our project encompassed a wide range of activities, from our core research to education. Throughout the project, we maintained a consistent commitment to contributing to synthetic biology.

We recognize that our project theme—sequence-based protein engineering using machine learning—contributes significantly to one of the major goals of synthetic biology: creating more useful organisms. Furthermore, to enable researchers and companies beyond machine learning specialists to utilize the models created through our efforts, we developed software, expanding the reach of synthetic biology. Additionally, to foster interest in synthetic biology and biology more broadly, and to nurture the next generation, we invested substantial effort in education and maintained connections with society.

Under the consistent theme of contributing to synthetic biology, we consulted appropriate experts to refine each plan while considering its societal impact. Through this process, we gained insights into the societal impact of and demand for our project, and by incorporating feedback into our research and development, we produced improved outcomes.

In the following sections, we organize our Human Practices into three categories: Project, Software, and Education. An overview of each category is provided below:

  • Project: Human Practices conducted primarily to obtain insights related to the foundation of our research activities
  • Software: Human Practices conducted to resolve challenges in software development and to seek expert opinions
  • Education: Human Practices conducted to seek advice on issues encountered while implementing our education activities and to reflect on them

The Summary section for each category presents an overview of the Human Practices within that category. The subsequent sections provide detailed discussions.

Project

Through our project CERES, we aimed to create a model that uses machine learning to reduce the effort required for protein engineering—a task that has often been difficult to approach due to its labor intensity and the need for specialized expertise—while enabling multi-objective optimization. We conducted Human Practices activities to determine how to proceed with this project. Initially, despite the project theme, we had no knowledge of machine learning. We therefore received instruction on the fundamentals of machine learning and determined the framework within which to develop our model. Next, we investigated how to utilize machine learning, the appropriate protein conditions for validating this technology, and the demand for its practical application. As advancing CERES required extensive wet lab experiments that placed a significant burden on laboratory members, we introduced cell-free systems to alleviate this workload. To this end, we conducted Human Practices to acquire knowledge about cell-free systems.

Through these activities, we gained foundational knowledge and an understanding of the demand for our project. As a result, we were able to determine which system to employ and establish specific goals for the results to be achieved.

Project IHP Details

Computational Resources Required for Machine Learning

Date: 15/Dec/2023

Who: Professor M.Y.

Summary:

Through discussions with machine learning experts, we determined that pre-training protein language models requires enormous computational resources that are infeasible for our team. Consequently, we fundamentally revised our project plan to a strategy utilizing existing pre-trained models.

Elaboration:

Our project aims to construct an original protein optimization model combining generative and predictive models. Initially, we had limited knowledge of machine learning and lacked concrete understanding of the computational resources and data volumes required for model training. Therefore, we sought expert advice.

We understood that recent improvements in language model performance resulted from large-scale “pre-training” requiring computational resources on the scale of national projects. While we initially intended to perform model pre-training ourselves, discussions with experts made clear that executing this independently was financially and technically impossible.

This insight had a decisive impact on our project plan. We shifted from the vague initial approach of “building a model from scratch” to a concrete and feasible strategy: “fine-tuning publicly available pre-trained protein language models for our specific purposes.” This enabled us to concentrate our limited resources on model fine-tuning tailored to our specific objectives and constructing generation-evaluation cycles. This strategic pivot allowed us to circumvent the major barrier of computational resources and significantly enhanced project feasibility. This decision represented a critical redirection in our research and development.
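The generation-evaluation cycle this strategy centers on can be sketched in miniature. The example below is purely illustrative: `toy_fitness` is a stand-in for a fine-tuned predictive model, and the mutation operator, sequence length, and loop sizes are hypothetical choices, not our actual pipeline:

```python
import random

random.seed(0)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, n_mut=1):
    """Return a copy of seq with n_mut random single-site substitutions."""
    seq = list(seq)
    for pos in random.sample(range(len(seq)), n_mut):
        seq[pos] = random.choice(AMINO_ACIDS)
    return "".join(seq)

def toy_fitness(seq, target="MSKGEELFTG"):
    """Stand-in for a predictive model: fraction of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def generate_evaluate_cycle(start, rounds=50, pool=20):
    """Propose mutants each round and keep the best-scoring sequence seen."""
    best, best_score = start, toy_fitness(start)
    for _ in range(rounds):
        for cand in (mutate(best) for _ in range(pool)):
            score = toy_fitness(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

In the real workflow, `toy_fitness` would be replaced by predictions from a fine-tuned protein language model, and the proposal step by a generative model.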

Design of Multi-Objective Optimization

Date: 18/Mar/2024

Who: Dr. H.Y. (Protein Language Model Expert)

Summary:

The expert pointed out that our initially conceived “Co-evotuning” method rested on a misunderstanding of Evotuning, which is an unsupervised learning approach. Based on this feedback, we completely redirected our AI-based protein design approach toward a more realistic and purpose-aligned “multi-objective optimization” strategy.

Elaboration:

We had initially devised an original method called “Co-evotuning,” applying a technique called Evotuning to the multi-objective optimization of proteins. However, lacking confidence in its feasibility, we sought advice from an expert in the field.

He pointed out that Evotuning is an “unsupervised learning” method that uses sequences themselves as correct answers, differing from our objective of optimization based on measured data such as enzyme activity values. He clearly stated that our objective falls within the field called “multi-objective optimization” or “multi-task learning.” Furthermore, he presented a concrete alternative: a paper on supervised learning using ESM, a type of pLM, combined with Gaussian process regression. He also provided important guidance on a practical challenge we were considering regarding Bayesian optimization: that computational complexity would explode exponentially unless mutation introduction sites were narrowed down.

This expert advice shook the foundation of our project. We discarded the original Co-evotuning method and, based on his guidance, completely redesigned the Dry team’s research plan toward highly feasible supervised learning-based multi-objective optimization. This dialogue corrected fundamental errors in our AI design and served as the decisive turning point for placing our project on a scientifically valid trajectory.
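The supervised route he suggested—regression from pLM embeddings to measured properties—can be illustrated with a minimal Gaussian process regressor. This from-scratch sketch uses an RBF kernel and would take placeholder features in place of real ESM embeddings; it shows the general idea, not the method of the cited paper:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """GP posterior mean and per-point variance at X_test."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    Kss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)          # weights for the posterior mean
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)    # posterior covariance
    return mean, np.diag(cov)
```

Given embeddings of assayed variants as `X_train` and activity values as `y_train`, the posterior variance also supplies the uncertainty estimates that Bayesian optimization would consume.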

Appropriate Use of AI Models

Date: 18/Mar/2024

Who: Dr. Y.S. (Japanese Research Institute)

Summary:

We were instructed that the appropriate use of “supervised learning” versus “unsupervised learning” in AI models is crucial for our project objectives. Based on this advice, we discarded our initial concept and decided to introduce more advanced and concrete supervised learning methods that generate sequences conditioned on specific functional values.

Elaboration:

To evaluate the feasibility of our initial project concept, “Co-evotuning,” we consulted an expert in pLM research.

The expert first thoroughly explained the two basic approaches to using pLMs: “supervised” learning, which requires experimental data, and “unsupervised” learning, which uses only sequence data. He then clearly said that for our objective of simultaneously improving both enzyme activity and expression level, training data for each would be necessary. For the challenge we ultimately aim to address—“sequence generation by supervised learning”—he went beyond merely pointing out approaches and proposed multiple highly specific and specialized methods, including “conditional generative models” such as Conditional VAE and methods for fine-tuning generative models like ProGen2 with reinforcement learning.

His feedback not only corrected our misunderstanding of Evotuning but also provided clear direction for the question “What should we do next?” As a result, we completely abandoned the ambiguous idea of Co-evotuning and formally incorporated the pLM-based distinction between “supervised” and “unsupervised” learning that he taught us into the Dry team’s project plan.

Selection of Model Protein

Date: 23/Mar/2024

Who: Mr. Taku Tsuzuki (Epistra Inc.)

Summary:

Through dialogue with machine learning experts, we established specific selection criteria for proteins to be used in our project (data availability, ease of evaluation, and synthesis cost). This led to significant progress toward a feasible plan: advancing research using fluorescent protein (GFP), which meets all criteria, as a model case.

Elaboration:

To make the development of protein engineering methods feasible within iGEM’s limited resources, we sought advice from Mr. Tsuzuki of Epistra, a company specializing in machine learning, regarding which protein to select as our model protein. Through discussions with Mr. Tsuzuki, it became clear that the target protein must satisfy several critical conditions for success. Specific requirements for the model protein included: the existence of abundant public datasets for model training, the ability to express easily in systems such as E. coli and enable high-throughput functional evaluation using plate readers, and the ability to chemically synthesize genes encoding model-proposed sequences at low cost and in a short timeframe. Based on these criteria, we concluded that the fluorescent protein GFP is optimal as a model case for this project, satisfying all conditions: abundant databases (FPBase), established expression and measurement systems, and realistic gene synthesis costs (the gene is only approximately 750 bp).

Furthermore, we found that for GFP, various metrics such as brightness, fluorescence wavelength (color), and thermostability are publicly available in databases, allowing easy acquisition of multiple parameters to be optimized.

This dialogue with the expert gave us a clear understanding of the specific protein to be addressed in our project.

In Silico-Complete Optimization and Required Data Volume

Date: 22/Oct/2024

Who: Mr. Noriyuki Nukui (BioPhenolics Inc.)

Summary:

We learned that the greatest cost factor in corporate research and development is personnel expenses and the associated development period. We concluded that an “in silico-complete” approach that minimizes experimental iterations best aligns with industrial needs.

Elaboration:

Initially, we were considering two options for our protein design approach: an “active learning” type that iterates between computation and wet experiments, and an “in silico-complete” type that predicts optimal sequences in batch from a small initial dataset. To determine which approach would have more practical value in industry, we consulted Mr. Nukui, an expert in corporate research and development.

The most important insight gained from discussions with Mr. Nukui was that the greatest cost bottleneck in corporate research and development is “personnel expenses” rather than reagents or equipment costs. With costs of approximately 100,000 USD per researcher annually, approaches that repeatedly cycle through wet experiments—even if ultimately highly accurate—directly increase costs through prolonged development periods. This fact constitutes a barrier for biotech ventures and iGEMers attempting protein engineering.

Following this dialogue, we clearly determined our project’s technical strategy. To address the greatest challenge of “temporal and financial costs” faced by biotech ventures and iGEMers, we formally adopted an “in silico-complete” approach that minimizes experimental cycles. Our discussion with Mr. Nukui also yielded the insight that “there is room for consideration even with a dataset of approximately 40 homologs for the protein to be improved,” leading us to change our approach to creating a model that can present optimized sequence candidates from the relatively small datasets that companies can prepare, without requiring wet experimental loops. Through this strategic shift, our project objective evolved from simply designing high-performance proteins to something more concrete and practical: “constructing a design platform that dramatically reduces development time and costs under realistic industrial constraints.”

Selection of Proteins for Functional Validation

Date: 28/Apr/2025

Who: Mr. Yuki Nishigaya (AgroDesign Studios)

Summary:

We faced challenges in selecting proteins for validating the efficacy of our developed protein engineering model. We consulted Mr. Nishigaya of AgroDesign Studios, whose business focuses on structural biology and protein design, for advice on appropriate proteins for validating our engineering method. Following his advice that “finding proteins optimal for the technology is crucial,” we added two proteins with distinct roles to our assay panel: β-lactamase, to simply demonstrate technical validity, and PETase, to demonstrate applicability to societal challenges.

Elaboration:

In validating the efficacy of our developed protein engineering model, we faced challenges in determining which proteins to target. To address this issue, we sought advice from Mr. Nishigaya of AgroDesign Studios, a company specializing in structural biology and protein design. Mr. Nishigaya provided extremely valuable insight: the key to success in industrial protein engineering is searching for the enzymes most suitable for one’s engineering method, and companies sometimes spend months on this search.

This advice from an expert perspective had a decisive impact on our experimental plan. Rather than vaguely improving a single protein, we selected two model proteins with different characteristics under the clear objective of simultaneously demonstrating the versatility and efficacy of our technology platform:

  1. β-lactamase as Proof of Principle: This enzyme’s activity can be evaluated simply and clearly by the presence or absence of antibiotic resistance. This avoids the challenges of validation cost and complexity in industrial applications highlighted through dialogue with Mr. Nishigaya and represents an optimal subject for purely demonstrating our model’s efficacy.

  2. PETase to demonstrate social impact and applicability: This enzyme is directly relevant to the global plastic pollution problem and has enormous social impact. Known improvement opportunities in areas such as thermostability make it an excellent target for demonstrating our technology’s applicability to practical problem-solving.

By targeting two enzymes with completely different origins and functions, we were able to establish a clear and convincing experimental plan that effectively demonstrates our technology as a versatile platform not limited to specific proteins.

Acceleration of Wet Experiments Using Cell-Free Systems

Date: 08/Sep/2025

Who: Mr. Takashi Ebihara (GeneFrontier Inc.)

Summary:

Given the time constraint of Wiki Freeze, we determined that protein expression using E. coli was impractical for our project, which requires evaluation of numerous sequences. Based on expert advice, we completely transitioned our experimental system to a cell-free protein synthesis system enabling rapid evaluation.

Elaboration:

Our project requires functional evaluation (assays) of as many as 48 protein candidates designed by machine learning. However, we faced a critical problem: with the Wiki Freeze deadline for iGEM’s final evaluation approaching, the conventional E. coli expression system, which requires time-consuming processes such as transformation and culture, would not allow evaluation of all sequences. To resolve this problem, we sought advice from experts at GeneFrontier Inc., developers of the cell-free protein synthesis system “PUREfrex.” Through consultation, we confirmed that cell-free systems excel in work efficiency and speed and are highly compatible with machine learning approaches like ours that handle numerous candidates. Based on this expert perspective, we fundamentally reconsidered our experimental plan and completely discarded the initial approach of using the E. coli expression system. We decided to transition the core of our project’s wet experiments to a cell-free synthesis system. This decision was an essential strategic change to complete our project goal of evaluating numerous proteins within the absolute deadline of Wiki Freeze.

Software

We aimed to publicly release LEAPS, the model obtained through CERES, as software, enabling protein engineering with machine learning methods even for users without specialized machine learning knowledge. In developing and publicly releasing this software, we conducted Human Practices to resolve challenges ranging from computational resources to UI design and post-release safety.

Human Practices related to software allowed us to receive insights from experts and hints about new challenges in a cascading manner. In particular, after being alerted to the dangers of toxic proteins, we interviewed medical experts and a pharmaceutical company about research practices and filtering scope, where we were further informed of export-related issues, leading to subsequent Human Practices—interviews we consider to have been extremely valuable. Additionally, by receiving different opinions on the same topic, we witnessed firsthand how the beneficial scope differs greatly depending on one’s position, enabling us to better consider the scope of toxic protein filtering in our implementation.

Since we target a wide range of users, being able to consult experts in various positions was extremely valuable, and based on all of this, we were able to work out and implement our own answers.

Software IHP Details

Securing Required Computational Resources

Date: 12-25/Jun/2025

Who: Professor Osamu Tatebe, Professor Sangtae Kim, Associate Professor Ryuhei Harada, Associate Professor Kei Wakabayashi, Associate Professor Makoto Fujisawa, Assistant Professor Kazuto Fukuchi, Ms. Hiroko Abe

Summary:

Software implementation of the LEAPS model requires computational resources alongside model improvements. Therefore, we contacted researchers at the University of Tsukuba and asked whether they could cooperate by lending us GPUs as computational resources. As a result, we concluded that GPU lending was difficult due to research ethics and security issues. To avoid these problems, we decided to use computational resources provided by a company.

Elaboration:

LEAPS, our machine learning model for protein engineering, requires enormous time during the training phase. For software implementation of this model, we needed to shorten the processing time per request and enable handling of numerous requests. To solve these problems, we considered GPUs, which enable parallel processing, to be optimal. However, purchasing GPUs as students was difficult, and the number of GPUs we possessed was limited. Therefore, we requested GPU lending from multiple laboratories within the University of Tsukuba. However, lending laboratory GPUs poses research ethics and security problems. We concluded that, to circumvent these issues, corporate computational resources—with independent security arrangements and flexible use according to user purposes—would be optimal. Therefore, we requested sponsorship from SAKURA internet Inc. and sought to accelerate software processing through this support.

Investigation of Waiting Time and UI

Date: 31/Jul/2025 - 12/Aug/2025

Who: Associate Professor Yusaku Miyamae, Associate Professor Ryuhei Harada, iGEM Japan Community

Summary:

Although the computational resource issue was alleviated through sponsorship, there are limits to how much software running the full-spec model can be accelerated. We conducted a survey of iGEMers regarding waiting time per request and model performance. For researchers, we also investigated input/output and parameter-setting formats in addition to the survey. As a result, we received responses that users could wait 5 days regardless of accuracy. Additionally, we obtained feedback that output in .fasta file format would be desirable. Based on these responses, we reduced the model scale and improved the UI to make it easier to handle for research purposes.

Elaboration:

With GPU provision from SAKURA internet, computational speed increased. However, there was a challenge that waiting time would exceed 10 days to publicly release the full-spec LEAPS model as software. Long waiting times significantly affect software usability and needed to be shortened, but it was unclear how much waiting time users could tolerate. Additionally, regarding UI elements such as how many parameter settings would be ideal for actual protein research and commonly used file formats, there were many factors that students could not fully assess. Therefore, we conducted surveys on waiting time and UI with researchers involved in protein engineering and creation, who are considered the main users of LEAPS-Software, as well as iGEM Japan Community members.

Regarding waiting time, we investigated how many days users could wait when sequences with higher function than those used in the training data would be output at rates of 30-80%. For UI, we investigated the number of configurable parameters, their setting format, the presence or absence of save functions, and so on. We also surveyed desirable output formats and the presence or absence of reliability scores. As a result, we received many responses from the iGEM Japan Community that 5 days was acceptable regardless of accuracy, and from researchers that they could wait 7 days or more if accuracy was 50% or higher. Regarding UI, we received feedback that the number of configurable parameters should be approximately 3-5, that two input formats—dropdown menus and arbitrary numerical values—should be provided, and that save functions should be included. Regarding output format, responses indicated that download in .fasta format was desirable for compatibility with sequence editing software, and that reliability scores should also be provided.

Based on these survey results, we decided to shorten computation time by eliminating model hyperparameter settings and modularizing the model. We also improved the UI for parameters and output to align with user needs.

Selection of Blacklist for Toxic Proteins

Date: 20/Aug/2025

Who: Professor Daisuke Kiga, Associate Professor Yohei Kurosaki

Summary:

In implementing a blacklist of proteins with misuse risk into LEAPS, we received expert advice to reference Japan’s Specified Pathogens list and the United States’ Select Agent and Toxin (SAT) list.

Elaboration:

Our developed protein modification tool LEAPS can theoretically improve the function of any protein and thus carries dual-use risks: it could be misused to enhance harmful proteins such as toxins and allergens. To mitigate this risk, we planned to implement a blacklist function in the software to prevent modification of specific proteins. However, the criteria for which proteins should be included in the list were unclear. Therefore, we consulted Professor Kiga, an expert in synthetic biology, and Dr. Kurosaki, an expert in biorisk management. Professor Kiga proposed using Japan’s legally regulated “Specified Pathogens” as the foundation for the initial blacklist, while Dr. Kurosaki suggested the “Select Agent and Toxin (SAT)” list established by the U.S. Centers for Disease Control and Prevention (CDC). Based on this expert advice, we began work to comprehensively investigate proteins derived from organisms on these official lists, compile them into a database, and integrate them directly into LEAPS safety protocols.

UI Improvements and Risks of Handling Unpublished Information

Date: 28/Aug/2025

Who: Associate Professor Yusaku Miyamae, Han Lu (D1) (Laboratory of Bioorganic Chemistry, University of Tsukuba)

Summary:

To achieve the goal of having the software we develop widely used by people conducting protein research, we needed to implement functions users require with an easy-to-use UI. Since potential users are researchers conducting protein function research, we spoke with Associate Professor Miyamae from the Laboratory of Bioorganic Chemistry at the University of Tsukuba and Han, a D1 student in the same laboratory. As a result, we learned that output formats allowing comparison between input and output sequences would be beneficial, and that chat-based formats are advantageous for ease of use. Additionally, we newly discovered resistance to inputting unpublished data from research processes and having it stored by third parties.

Elaboration:

Based on survey results and several email exchanges, the functions to be implemented in the software and target service delivery speed had become somewhat solidified. Therefore, we needed to interview people who could potentially become actual users to investigate more detailed functions and UI requirements. We spoke with Associate Professor Miyamae, who researches proteins, and Han, a doctoral student in his laboratory.

First, regarding UI, we explored how final sequences should be returned to users. We learned that having output sequences available in both .fasta and .csv formats, file formats commonly used for representing gene and amino acid sequences, would be user-friendly. As a display innovation, we received feedback that implementing a display format allowing comparison between input sequences and improved output sequences would make correspondences between sequence changes and functional changes clearer, making it easier to connect to further research and obtain meaningful information. We also received positive feedback that chat-based UI is familiar and intuitively easy to operate.
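These two suggestions—standard file formats and an input/output comparison view—can be sketched with small helpers. The names are hypothetical and the comparison assumes substitution-only edits of equal length, not our actual implementation:

```python
def to_fasta(records):
    """Format {id: sequence} pairs as a FASTA-formatted string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())

def diff_track(original, improved):
    """Mark substituted positions with '*' so users can see what changed."""
    return "".join("*" if a != b else " " for a, b in zip(original, improved))
```

For example, `diff_track("MKV", "MRV")` returns `" * "`, flagging the single substitution at the second position.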

Next, we sought opinions on the misuse through enhancement of toxic protein toxicity that we had been discussing. While this risk was acknowledged, since it was outside Professor Miyamae’s area of expertise, we received feedback that hearing from experts in medical and health sciences would be necessary to obtain more detailed information. Additionally, strong resistance to inputting researchers’ unpublished data into web applications was newly revealed. While retaining sequences input into software for a certain period is necessary as a precaution against misuse, for researchers, unpublished information forming the core of their research falling into third-party hands and being retained is one of the matters requiring greatest caution. From this, we recognized the need to design clear policies regarding information security assurance and data retention, assuming we would not use input data ourselves, and display these to users. Since concerns about such data retention and risks of misuse cannot be completely eliminated technically, we received guidance that we should formulate a disclaimer to clarify our responsibilities as service providers while avoiding excessive liability.

Additionally, through the interview we were able to confirm the utility of this model. To provide a more user-friendly and trusted service, we decided to proceed with UI design enabling clearer display of output sequences and creation of disclaimers based on newly obtained information.

Blacklists and Disclaimers at a DNA Synthesis Company

Date: 29/Aug/2025

Who: Integrated DNA Technologies (IDT)

Summary:

Protein engineering carries risks of improving toxic proteins and pathogen-related proteins. Therefore, we asked IDT, which faces similar considerations in gene synthesis, which proteins should be included in blacklists. We also received instruction on disclaimer content and the type of UI they use. As a result, we obtained the suggestion that external disclosure of blacklists carries risks and should be approached cautiously. Regarding disclaimers, we decided to implement a format requiring consent from multiple perspectives, including toxicity and infectivity, with users’ personal signatures. As a result of this Human Practice, we learned that we would need to construct the blacklist independently, and we conducted additional investigations for this purpose.

Elaboration:

Software that can theoretically improve any protein requires filtering through a blacklist-based system, which should include toxic proteins and proteins involved in pathogen infection. However, as students, undertaking blacklist creation from scratch—including protein selection—was difficult. Additionally, while risks that cannot be addressed by blacklists alone can be handled through pledges via disclaimers, our knowledge as students was insufficient in many respects here as well. Therefore, we contacted IDT, a DNA synthesis company, to inquire about blacklist content and disclaimers.

As a result of the interview, we learned that many companies performing gene synthesis join the International Gene Synthesis Consortium (IGSC) and screen orders according to its guidelines. We also received the response that disclosing internal information to students was difficult from a safety perspective. From this, we recognized that careful judgment is required regarding whether to disclose information after blacklist creation. While iGEM often requires publication of deliverables as a contribution to the community, from a safety perspective we felt there is room to consider measures such as conditional disclosure for blacklists.

Regarding disclaimers, IDT shared documents they actually use. The disclaimer consists of items such as “Does it code for any toxin?”, “Is it derived from plant or animal pathogens?”, and “Is it an infectious virus or capable of replicating within a host?”, with ordering becoming possible by agreeing to other terms as well.

Based on this information, we decided to restrict disclosure of the blacklist; when other teams require it, delivery under strict security, with pledges, is desirable. Regarding disclaimers, following the practice of gene synthesis companies, we decided to introduce a UI confirming the use or non-use of toxic proteins and proteins related to infectivity and pathogenicity.

Filtering of Viral Proteins

Date: 01/Sep/2025

Who: Professor Atsushi Kawaguchi (Laboratory of Molecular Virology, University of Tsukuba)

Summary:

Technology capable of improving proteins carries dual-use risks, with virus enhancement being a prime example. Therefore, we interviewed Professor Kawaguchi about what dual-use means and what level of filtering should be applied to viruses and the toxic proteins involved. As a result, we received feedback that restrictions should be placed on the improvement of viral spike proteins. Based on this information, we reconsidered the blacklist while conducting further investigations into related laws, ethical issues, and social responsibility.

Elaboration:

Our developed software can improve any protein. Therefore, we consulted experts to select dangerous proteins from all perspectives and create a blacklist compiling them. In that process, we considered viral proteins strong blacklist candidates, recognizing their dual-use potential. We therefore asked Professor Atsushi Kawaguchi of the Laboratory of Molecular Virology at the University of Tsukuba whether viral protein improvement should be regulated and, if so, what degree of regulation would be appropriate.

We received advice that viruses pose significant risks when they gain infectivity toward species that previously showed no infection, and that therefore, even within the same virus lineage, genes of viruses infecting different species within the same dataset should be regulated. While filtering these genes by sequence homology is conceivable, this alone was considered insufficient for safety, as infection specificity can change with mutations of just a few amino acids. Since viruses mutate rapidly, even a virus with low proliferative ability is, if infectious, highly likely to acquire mutations that strengthen its proliferation in the host. Therefore, we received feedback that initially blocking viral spike proteins and gradually introducing flexibility into those regulations would be an appropriate, safety-first step. Additionally, since spike proteins may share structural similarities despite differing amino acid sequences, filtering by three-dimensional structure was considered potentially more effective. We also received feedback that, regardless of how flexible the regulations are, presentation of disclaimers regarding modification of proteins related to pathogenicity and infectivity is necessary.
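As Professor Kawaguchi noted, sequence-homology screening alone is insufficient, but it is the baseline mechanism a blacklist filter starts from. A minimal k-mer-overlap screen, with illustrative helper names and an arbitrary threshold, might look like:

```python
def kmers(seq, k=5):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_fraction(query, reference, k=5):
    """Fraction of the query's k-mers that also occur in the reference."""
    q, r = kmers(query, k), kmers(reference, k)
    return len(q & r) / len(q) if q else 0.0

def is_blocked(query, blacklist, k=5, threshold=0.5):
    """Reject a query sharing too many k-mers with any blacklisted sequence."""
    return any(shared_kmer_fraction(query, ref, k) >= threshold
               for ref in blacklist)
```

Catching spike-like proteins with divergent sequences would additionally require structure-based comparison, as discussed above.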

We also asked on what basis the degree of danger to be filtered should be determined. We were advised that it would be safe to base filtering and blacklist decisions on the regulations governing Class I pathogens and on genes subject to ministerial confirmation experiments. We also received suggestions for countermeasures, such as establishing criteria for toxins, including whether they fall under large-scale culture rules.

Based on this advice, we reconsidered the genes to include in the blacklist and decided to block viral spike proteins. We were also introduced to Associate Professor Okabayashi, an academic research officer, to gather more accurate information on filtering from the perspectives of Cartagena Protocol-related laws, ethical issues, and social responsibility.

Toxic Protein Research and Export Controls


Date: 02/Sep/2025

Who: Mr. Takao Yoshida, Mr. Koh Ida (Ono Pharmaceutical Co., Ltd.)

Summary:

Research on toxic proteins and pathogens carries a risk of misuse but also holds potential for the development of new drugs and vaccines. In developing safety measures, we needed knowledge not only of misuse risks but also of these development benefits. We therefore sought opinions from Mr. Yoshida and Mr. Ida of Discovery & Research, Ono Pharmaceutical Co., Ltd. on establishing a blacklist for toxic proteins and pathogens, the handling of toxic proteins, and other risks. They explained that in drug discovery, substances with high physiological activity can be either toxic or medicinal depending on the quantity and conditions of use, which makes it difficult to draw a clear line with simple criteria. We thus understood that, from a drug discovery perspective, it is sometimes undesirable to exclude toxic proteins with misuse risks from research entirely. They also highlighted the need to consider export regulations in addition to our existing concerns. Taking these points into account, we decided to reexamine the criteria for constructing the blacklist, establish mechanisms that do not hinder legitimate research use of the software, and further investigate regulatory measures.

Elaboration:

In the preceding Human Practices, we selected toxic proteins and pathogens as candidates for the blacklist and discussed their dual-use potential. To deepen our understanding of the actual research situation in drug discovery, approaches to the blacklist, the handling of toxic substances, and other risks, we sought opinions from Mr. Yoshida and Mr. Ida of Discovery & Research, Ono Pharmaceutical Co., Ltd. Please note that the following content represents the personal opinions of both individuals and does not necessarily reflect the official stance of the company.

First, they emphasized that striking a balance between promoting proper research on toxic proteins and mitigating the risk of misuse is challenging. A poison is fundamentally a substance with very high physiological activity that acts beyond a tolerable limit; delivered in the appropriate amount to the appropriate location, it can become a medicine. Regulations based on the number of toxic proteins held could be considered, but factors such as the expression host and the experimental facility environment must also be taken into account. In addition, pathogens are not regulated at the level of their individual constituent proteins, and toxicity assessments vary with the experimental system and definitions used, making it difficult to definitively evaluate the toxicity of a specific protein. On the other hand, toxic proteins, pathogens, and their constituent proteins, when studied appropriately, can become pharmaceuticals themselves or serve as a stepping stone toward new drugs and vaccines, holding the key to saving patients. From this drug discovery perspective, they pointed out that blanket blacklist-based regulation risks excessively restricting legitimate research.

These opinions reaffirmed that creating a blacklist of regulated proteins is no easy task. However, when considering restrictions against dual use, they recommended referring to Japan's existing list of Class I pathogens and introducing a pledge form or disclaimer screen at the time of use, clarifying the user's responsibilities. This aligns with suggestions from previous Human Practices and reinforces that direction. They also highlighted that export and import regulations may apply to our research and outputs: under export controls targeting specific countries, for instance, the export of pathogens or related systems may be restricted. We were therefore advised to be mindful that, when releasing this program, genetic sequence information and software functionalities may fall under such restrictions.
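The pledge form and disclaimer screen recommended here could be enforced in software along the following lines. This is a minimal sketch with assumed field names, not our actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    user_name: str
    agreed_disclaimer: bool   # user read and accepted the disclaimer screen
    signed_pledge: bool       # user pledged not to misuse outputs

def accept(sub: Submission) -> bool:
    """Refuse to process any sequence unless both the disclaimer and the
    non-misuse pledge have been acknowledged. Field names here are
    illustrative assumptions, not the deployed schema."""
    return sub.agreed_disclaimer and sub.signed_pledge
```

Gating every submission on explicit acknowledgment keeps the responsibility clarification in front of the user each time, rather than burying it in one-time terms of use.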

Based on this advice, we will continue reviewing the blacklist, disclaimers, and pledge forms, incorporating expert opinions along the way.

Confidentiality Management for Unpublished Data


Date: 03/Sep/2025

Who: Associate Professor Yoshiteru Hashimoto (Laboratory of Microbial Breeding Engineering, University of Tsukuba)

Summary:

In previous Human Practices, we had discussed filtering dangerous sequences with blacklists as a safety countermeasure against dual use when publicly releasing the software. However, concerns about passing sequence information to an AI, and about our access to unpublished improved sequence information, remained unresolved. We therefore consulted an expert on effective measures to eliminate these rights-related uncertainties and decided to implement measures such as confidentiality pledges.

Elaboration:

During the Web UI survey conducted early in software development, Associate Professor Hashimoto cautioned that misuse through improvement of toxin proteins was conceivable. In subsequent Human Practices with Professor Kawaguchi and a pharmaceutical company, we solidified our safety measures, including filtering of toxins, pathogenic viruses, and their spike proteins, to eliminate biological risks. However, we had not yet implemented effective measures for the concern about inputting unpublished data into the software, raised in a concurrent Human Practice with Associate Professor Miyamae. We therefore sought advice from Associate Professor Hashimoto, whose research also centers on proteins, regarding both biological risk mitigation and rights-related issues surrounding data retention.

Based on knowledge obtained from Human Practices, the concerns we should consider fall into three categories:

  • (a) Concerns about inputting unpublished data (information leakage risk)
  • (b) Concerns about handling AI-improved information (intellectual property issues)
  • (c) Concerns about software misuse (biological risk)

First, dialogue with Associate Professor Miyamae clarified concern (a): researchers perceive an information leakage risk in inputting their unpublished data into an AI that we manage. Associate Professor Hashimoto then pointed out concern (b), on intellectual property: improved information generated by the AI could hinder future patenting. Both issues must be resolved for researchers to use the software with confidence. Associate Professor Hashimoto advised that the challenge is how much reliability we can guarantee, and concluded that we should address these issues by displaying disclaimers and terms of use concerning the retention of input data and obtaining consent to confidentiality. He also proposed that requiring consent to non-misuse, together with users' personal signatures, might serve as a deterrent against misuse.

Based on these results, we decided to state clearly in the confidentiality agreement that iGEM TSUKUBA will not use input or improved sequence information and will not disclose data to third parties except where research ethics issues arise. As discussed with Associate Professor Hashimoto, we also aim to implement a function that retains data for 30 days and thereafter irreversibly deletes all functional values other than the sequence information.
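The 30-day retention policy above could be sketched as a simple purge routine. The record layout and field names (`sequence`, `submitted_at`, and any functional-value fields) are assumptions for illustration, not our production schema:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window discussed with Prof. Hashimoto

def purge_functional_values(record: dict, now: datetime) -> dict:
    """After the 30-day retention window, keep only the sequence and its
    timestamp and drop every other field (e.g. predicted functional
    values) so the deletion is irreversible. Field names are
    illustrative assumptions."""
    if now - record["submitted_at"] > RETENTION:
        return {"sequence": record["sequence"],
                "submitted_at": record["submitted_at"]}
    return record
```

Returning a fresh dictionary containing only the retained keys, rather than mutating the old one, makes it harder for dropped values to survive by accident.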

Cartagena Protocol and Regulation of Toxic Proteins


Date: 16/Sep/2025

Who: Associate Professor Koji Okabayashi (Center for Research in Isotopes and Environmental Dynamics, University of Tsukuba)

Summary:

Regarding the policy suggested in previous IHP activities, namely acknowledging the limitations of the blacklist approach and supplementing it with disclaimers, we obtained strong support from Dr. Okabayashi. This enabled us to form a clear consensus as a team and make the final decision to implement it as a concrete safety design.

Elaboration:

Through our dialogues with experts so far, two important points had emerged regarding the safety measures our project should implement: usage restrictions based on blacklists of known dangerous sequences have inherent limits, and, as a supplementary measure, presenting users with disclaimers that clarify where responsibility lies is effective.

The purpose of this dialogue with Dr. Okabayashi was to verify the validity of these two countermeasures from an expert perspective on security and legal regulation. In the discussion, Dr. Okabayashi affirmed our views, noting that constructing a technically complete blacklist is indeed extremely difficult. He also clearly supported systems that make consent to a disclaimer a prerequisite for use and warn users of potential risks, as a more realistic and effective way of demonstrating developer responsibility.

Additionally, within the scope of the service our software provides, no proteins are actually synthesized, and we cannot verify any enhancement of harmful protein properties. From this standpoint, even if sequences designed with our service were misused as biological weapons, the researchers who synthesized the proteins would bear responsibility, and the service itself would likely not be found illegal. As one proposal, he suggested that restricting public release to countries with thorough regulations on wet experiments, such as the Cartagena Protocol, could prevent misuse. In any case, however, it is necessary to protect the service by clearly stating the developers' intentions and responsibilities through disclaimers.

Dr. Okabayashi's support from an objective standpoint played a decisive role in settling our internal discussions. Based on it, we decided to implement a safety design combining blacklist-based measures with supplementary disclaimers.

Education

Human Practices related to education were conducted with two objectives: enhancing the quality of our education activities and implementing them safely. Each Human Practice accompanied a specific education project, but the knowledge obtained was carried into subsequent activities, and we consulted experts again to fill remaining gaps. For example, we asked about the conditions, constraints, and rules to observe when conducting standard genetic recombination experiments for educational purposes, which enabled us to run such experiments as education. Next, we asked about safety measures for exhibiting recombinant organisms as art in rooms that many unspecified visitors enter and leave, which enabled diverse audiences to view them in an educational exhibition. Finally, by asking about the handling of personal sequence information, we learned that, beyond legal and safety issues, there are ethical issues to be mindful of when conducting genetic recombination experiments in education. In this way, to make genetic recombination and related technologies implementable as education projects, we accumulated knowledge within the team by consulting experts at each step.

Education IHP Details

Effective Survey Methods

Date: 13/Feb/2024

Who: Associate Professor X (National University A)

Summary:

We had been creating surveys to reflect on each education activity but remained uncertain whether they were sufficiently accurate. We therefore consulted Associate Professor X at University A, who specializes in science education and conducts survey research as part of his work. We learned to keep questions concise and clear, ask only one thing per question, use wording that enables objective judgment in scaled evaluations, and design question items only after clarifying the project's objectives. Based on these insights, we endeavored to create more accurate surveys going forward.

Elaboration:

We had been creating a survey each time we implemented an education activity, collecting responses from students to evaluate it. However, these surveys were created without proper knowledge of survey design, and uncertainty remained over whether we were collecting valid responses. We therefore decided to consult Associate Professor X, who specializes in science education and also conducts survey research.

Associate professor X reviewed the survey questions we had created for an upcoming education activity and provided feedback and advice based on that review.

First, we received feedback that some of our survey questions were unclear about what they were asking. For example, "Do you have knowledge of biology?" is an extremely broad question and risks not eliciting useful responses. We were taught to narrow the scope of each question to the education content, for example "regarding genetics" or "about XX in genetics." On the other hand, making questions too detailed risks conveying knowledge through the question itself and leading respondents' thinking, so it is important to calibrate each question to the survey's purpose.

We were also advised to avoid double-barreled questions that contain multiple elements. For example, "Do you have knowledge or experience with XX?" should be avoided, since one part might be yes while the other is no. We were further instructed to avoid response choices with words like "somewhat" or "very much," which people interpret differently, and to use phrases like "if anything," which create more balanced choices.
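The wording pitfalls above lend themselves to a simple automated check. The sketch below is a toy heuristic we might run over draft questionnaires, flagging possibly double-barreled questions and vague quantifiers in choices; the word lists are assumptions, and a human reviewer remains essential:

```python
# Vague quantifiers flagged per the advice above (illustrative list only).
VAGUE_CHOICES = {"somewhat", "very much"}

def lint_question(question: str, choices: list[str]) -> list[str]:
    """Return a list of potential issues with a draft survey question.
    A crude heuristic: 'or'/'and' often signal a double-barreled
    question, and vague quantifier choices invite inconsistent reading."""
    issues = []
    lowered = question.lower()
    if " or " in lowered or " and " in lowered:
        issues.append("possibly double-barreled")
    for choice in choices:
        if choice.lower() in VAGUE_CHOICES:
            issues.append(f"vague choice: {choice}")
    return issues
```

Such a check cannot judge whether a question matches the education's purpose, which the advice identifies as the most crucial point, but it catches the mechanical mistakes cheaply.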

Regarding the processing of survey results, we were advised that even experts disagree on the significance of statistical tests such as t-tests, and that for events at the scale an iGEM team can run, simply visualizing results with graphs, without statistical analysis, may suffice.

Finally, we received guidance that while question wording, choices, and result processing are important, the most crucial things are that the education’s purpose is clear and that the survey can determine whether that purpose is being achieved.

Based on these opinions, we decided to create surveys for subsequent education activities with conscious effort to use concise wording without content duplication and terminology whose meaning is relatively clearly understood. We also decided to make survey-based evaluation more meaningful by establishing final goals.

Date: 10/Jul/2025

Who: Assistant Professor Yoshinori Kanemori (University of Tsukuba)

Summary:

In implementing a PCR workshop for high school students, we sought expert advice on how to obtain informed consent, since we would be handling participants' own genetic information. Through this dialogue, we reaffirmed the importance of obtaining appropriate consent based on ethical considerations and gained knowledge of specific consent form templates and precautions.

Elaboration:

Initially, our team planned an experiential PCR workshop where high school students would analyze their own DNA, focusing on ALDH2 gene polymorphisms related to alcohol metabolism. We considered this an excellent opportunity to learn about genetic influences on physical constitution through a familiar theme.

However, concerns arose within the team regarding whether ethical considerations were sufficient when handling the extremely sensitive personal information of participants’ (particularly minors’) genetic information. Therefore, we sought advice from Assistant Professor Kanemori at the University of Tsukuba, who has extensive experience obtaining informed consent in similar student experiments.

In the meeting, it became clear that simply preparing consent forms is insufficient. We received the following critical guidance from the professor:

  1. Need for dual consent: For minors, both the individual's own understanding and consent (assent) and the parents' legal consent are essential.

  2. Consideration of psychological impact: Care is required for potential psychological impacts test results may have on individuals and information that may be incidentally revealed.

  3. Institutional barriers: Most critically, unlike universities, many high schools lack ethics review committees (IRBs). This is a fundamental institutional barrier: there is no mechanism to objectively review and approve the ethical validity of a plan.

We confronted the harsh reality that even well-intentioned educational activities cannot be implemented without appropriate ethical and institutional processes.

As a result of this dialogue, we completely scrapped our plans for the ALDH2 gene analysis workshop. To integrate this learning into our project, we explored alternatives free of ethical issues. We changed to an entirely new workshop: extracting DNA from meat (beef, pork, chicken, etc.) available at supermarkets and identifying the species by PCR. This new plan became a more refined educational program that avoids informed-consent issues while conveying the fascination of PCR technology and its applications to social issues such as food fraud.


© 2025 - Content on this site is licensed under a Creative Commons Attribution 4.0 International license

The repository used to create this website is available at gitlab.igem.org/2025/tsukuba.