To realize our project, we conducted two types of modeling: Viral Transmission and Protein Optimization. The former is a mathematical model of the infection dynamics of avian influenza virus, designed to set key strategic targets by identifying the indicators we should focus on for virus suppression. The latter was devised to assist in the sequence design of our device, COCCO, by introducing new mutations to ensure that the novel fusion protein maintains its expected folding and undergoes the desired structural change upon ligand administration.

Both modeling efforts were indispensable to the progress of our project, and several of their components could also be useful to other iGEM teams. Besides explaining how each model benefited our own project, we highlight the modules that we believe other teams can reuse.

In conducting our modeling, we paid special attention to the following points:

To ensure that the modeling analysis would help us understand the conditions required for COCCO and support our decisions while designing it.
To thoroughly incorporate a literature review of existing models and the insights gained from Human Practices.
To integrate Wet Lab evaluation wherever possible.
To prioritize validation through modeling for aspects that cannot be demonstrated by proof-of-concept experiments in the Wet Lab, such as the suppression of virus spread.

As a result, we developed several methods that future iGEM teams can reference. To share them, this document describes the procedures in as much detail as possible; this overview outlines each one.

Viral Transmission

Fig1. Transition diagram of viral transmission and epidemiological states

We constructed the model by assuming the dynamics of virus particles and epidemiological cell states shown in Fig. 1. The parameters p and δ depend on the time elapsed since infection (the infection age a).

To understand the dynamics of intercellular virus infection in chicken cells with the introduction of COCCO, we constructed and analyzed its mathematical model. We built a model using a system of differential equations, considering COCCO's characteristic of inducing cell death more rapidly than viral proliferation. By mathematically analyzing this model, we found that the system exhibits threshold-like behavior. Furthermore, we ran simulations using data collected from wet lab experiments or scientific papers and showed that COCCO attenuates the spread of infection. We also confirmed that inducing cell death has an advantage over other methods in halting the spread of infection.

This modeling is likely a method that future iGEM teams should reference when utilizing modeling in their projects for the following reasons:

Building a minimal model to assess whether a project can succeed achieves that objective without sacrificing reliability or interpretability.
Constructing a model that is not overly complex and analyzing it mathematically reveals the essential properties of the system.
Analyzing the threshold-like behavior of a system provides a basis for judging whether the system will function effectively.
Limiting the number of parameters estimated at once and estimating them in stages from multiple experimental datasets makes parameter estimation more reliable.

For example, in a model of the glycolytic pathway that considers rate regulation by phosphofructokinase (PFK), the points above could be applied as follows:

For the connection between the glycolytic system and the external environment, consider only the inflow and outflow of glucose, pyruvate, and other energy carriers at very simple, constant rates. Additionally, for reactions not controlled by PFK, describe the reaction rates with simple equations, such as a proportional formula or the Michaelis-Menten equation.
Because PFK regulation gives the reaction a threshold-like property, fitting an appropriate functional form to the reaction rate makes it possible to identify the threshold that controls the reaction's progression.
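A minimal sketch of the second point, assuming a Hill-type rate law as the threshold-like functional form. The Hill form, the exponent n = 4, and the synthetic, noise-free data points are illustrative assumptions, not a model of PFK. The half-saturation constant k plays the role of the threshold and is recovered by a least-squares grid search:

```python
from typing import List, Sequence, Tuple


def hill_rate(s: float, vmax: float, k: float, n: float) -> float:
    """Hill-type rate law; for n > 1 the rate switches from low to high around s = k."""
    return vmax * s**n / (k**n + s**n)


def fit_threshold(data: Sequence[Tuple[float, float]], vmax: float, n: float,
                  k_grid: Sequence[float]) -> float:
    """Return the k in k_grid that minimizes the sum of squared residuals."""
    def sse(k: float) -> float:
        return sum((v - hill_rate(s, vmax, k, n)) ** 2 for s, v in data)
    return min(k_grid, key=sse)


# Synthetic observations generated with an assumed threshold k = 2.0 (illustration only).
substrate: List[float] = [i / 4 for i in range(1, 25)]
data = [(s, hill_rate(s, 1.0, 2.0, 4.0)) for s in substrate]

k_grid = [i / 100 for i in range(50, 501)]  # candidate thresholds 0.50 ... 5.00
k_hat = fit_threshold(data, vmax=1.0, n=4.0, k_grid=k_grid)
print(f"estimated threshold k = {k_hat:.2f}")
```

With real data, vmax and n would normally be fitted together with k using a nonlinear least-squares routine; the grid search only keeps the sketch dependency-free.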

Protein Optimization

Fig2. Schematic diagram of EVOLVE.

EVOLVE, an amino acid sequence optimization workflow that enhances binding ability, consists of five steps.

To realize COCCO, we designed a fusion protein of RIG-I and APAF1 (see Design). However, if the wild-type domains are simply linked, the binding affinity between RIG-I and the CARD of APAF1 is lost, making it highly likely that COCCO would not possess the complex functions required by the project. Therefore, using in silico protein simulations, we attempted to design a protein with two functions, (1) detecting dsRNA and (2) switching a signal ON/OFF using CARDs of different origins, by introducing appropriate amino acid mutations that improve binding affinity. For this purpose, we constructed EVOLVE, a workflow composed of multiple software programs. We then tested the protein designed in silico in the Wet Lab and verified that the binding function of the mutated fusion protein was improved.

EVOLVE can help other iGEM teams design and evaluate, in silico, proteins with improved binding affinity for their interaction partners. To this end, EVOLVE is designed to be user-friendly in the following ways:

Name	Definition	Unit	Description
$t$	Time	h	Time for the entire system.
$a$	Infection age	h	Time that has elapsed since a virus particle entered a target cell.
$T(t)$	Number of target cells	cells / mL	A target cell is a cell that is completely uninfected by a virus.
$i(t,a)$	Number of infected cells per unit infection age	cells / (mL·h)	An infected cell is a cell that has been invaded by a virus.
$V(t)$	Number of virus particles	HA / mL	Infectious virus particles outside of the cells are counted.

Name	Definition	Unit	Description
λ	Production rate of target cells	cells / (mL·h)	Target cells are produced at a constant rate λ.
d	Death rate of target cells	1 / h	Target cells die at a constant rate d.
β	Infection rate	mL / (HA·h)	Target cells are infected by the virus at a constant rate β per virus particle.
δ(a)	Death rate of infected cells	1 / h	Infected cells die at a rate δ(a) that depends on the infection age a.
p(a)	Virus production rate	HA / cells	Infected cells produce virus particles at a rate p(a) that depends on the infection age a.
$c$	Virus clearance rate	1 / h	Virus particles are cleared at a constant rate c.
$T_{0}$	Initial number of target cells	cells / mL	Number of target cells at t = 0.
$i_{0}(a)$	Initial number of infected cells per unit infection age	cells / (mL·h)	Distribution of infected cells with respect to infection age a at t = 0.
$V_{0}$	Initial number of virus particles	HA / mL	Number of infectious extracellular virus particles at t = 0.

Name	Definition	Unit	Value
λ	Production rate of target cells	cells / (mL·h)	$5.64\times 10^{4}$
d	Death rate of target cells	1 / h	$5.97\times 10^{-3}$
β	Infection rate	mL / (HA·h)	$6.61\times 10^{-3}$
δ(a)	Death rate of infected cells	1 / h	$6.18\times 10^{-2}$
p(a)	Virus production rate	HA / cells	$p_{\max}=2.93\times 10^{2}$, $a_{1}=5.00$, $b_{1}=1.10\times 10^{-3}$
$c$	Virus clearance rate	1 / h	$5.73\times 10^{3}$

No.	Affinity to the CARD of APAF1	Affinity to the dsRNA	Sequences
WT	32.59	53.01036	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACMVFQMPDKDEESRICKALFLYTSHLRKYNDALIISEHARMKDALDYLKDFFSNVRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN
95	46.87	53.61502	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACRVFQMPDEEEEKRILKALELYTSHLRKYNDALIISEHARMKDALDYLKDVFSNPRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN
68	43.82	56.22829	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACMVFQMPDKSEESRIRKALFLYTSHLRKYNDALIISEHARMKDALDYLKKFFSNVRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN
178	43.38	53.10431	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACNVFQMPDDDEESRIVKALRHYTSHLRKYNDALIISEHARMKDALDYLKTEFSNIRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN
162	37.71	54.20295	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACMVFQMPDKDEESRICLALFLYTSHLRKYNDALIISEHARMKDALDYLKDQFSNVRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN
141	37.36	54.48599	FKYIIAQLMRDTESLAKRICKDLENLSQIQNREFGTQKYEQWIVTVQKACEVFQMPDKIEEERISKALHFYTSHLRKYNDALIISEHARMKDALDYLKSLFSNVRAAGFDEIEQDLTQRFEEKLQELESVSRDPSNEN

No.	Affinity to the Hel2i	Sequences
WT	32.59	MDAKARNCLLQHREALEKDIKTSYIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS
134	51.90	MDAAARNTLLLHRELNLDDIIVESIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS
84	51.52	MDELARNALLFHREHQIPDIAWSSIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS
46	47.69	MDKIARNILLMHRELYRTDIEVLYIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS
108	46.10	MDEKARNWLLAHREDIDEDIIAESIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS
0	44.54	MDQLARNVLLTHREININDIQVEGIMDHMISDGFLTISEEEKVRNEPTQQQRAAMLIKMILKKDNDSYVSFYNALLHEGYKDLAALLHDGIPVVSSS

Step	Specific Criteria Setting	Remaining Proteins
Mutable Site Selection		$20^{138}$→$20^{12}$
Mutation Introduction	Generated 200 sequences	$20^{12}$→200
Structure Prediction	Among $C_{\alpha}$ carbons, those with pLDDT $\lt$ 50 are less than 30%, and those with pLDDT $\gt$ 70 are 80% or more	200→200
Visual Inspection		200→200
Docking & Stability Analysis	Interface score of Rigid Docking complex with CARD of APAF1 is less than Interface score of wild-type Hel2i (-14.1)	200→73
Docking & Stability Analysis	Top 5 in descending order of Interface score of Relax Docking complex with CARD of APAF1	73→5
Docking & Stability Analysis	Interface score of Relax Docking complex with dsRNA is less than Interface score of wild-type Hel2i (-53.0) + 2	5→5
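The Structure Prediction criterion in these tables can be expressed as a small filter. This sketch assumes that per-residue pLDDT values for the Cα atoms have already been extracted (AlphaFold-style predictors store them in the B-factor column of the output PDB file); the function name and the example profiles are hypothetical and not part of EVOLVE.

```python
from typing import Sequence


def passes_plddt_filter(ca_plddt: Sequence[float],
                        low_cut: float = 50.0, high_cut: float = 70.0,
                        max_low_frac: float = 0.30,
                        min_high_frac: float = 0.80) -> bool:
    """Keep a predicted structure only if
    - fewer than 30% of C-alpha atoms have pLDDT < 50, and
    - at least 80% of C-alpha atoms have pLDDT > 70.
    """
    n = len(ca_plddt)
    if n == 0:
        raise ValueError("no C-alpha pLDDT values given")
    low_frac = sum(x < low_cut for x in ca_plddt) / n
    high_frac = sum(x > high_cut for x in ca_plddt) / n
    return low_frac < max_low_frac and high_frac >= min_high_frac


# Hypothetical pLDDT profiles for two 10-residue designs.
confident = [92.0] * 8 + [45.0] * 2   # 20% low, 80% high -> kept
disordered = [92.0] * 7 + [45.0] * 3  # 30% low -> rejected
print(passes_plddt_filter(confident), passes_plddt_filter(disordered))
```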

Step	Specific Criteria Setting	Remaining Proteins
Mutable Site Selection		$20^{97}$→$20^{12}$
Mutation Introduction	Generated 300 sequences	$20^{12}$→300
Structure Prediction	Among $C_{\alpha}$ carbons, those with pLDDT $\lt$ 50 are less than 30%, and those with pLDDT $\gt$ 70 are 80% or more	300→208
Visual Inspection		208→202
Docking & Stability Analysis	Interface score of Rigid Docking complex with Hel2i is less than Interface score of wild-type Hel2i (-14.1) - 5	202→57
Docking & Stability Analysis	Top 5 in descending order of Interface score of Relax Docking complex with Hel2i	57→5
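Both tables apply a two-stage funnel: a cheap rigid-docking cutoff relative to the wild-type interface score, then relaxed docking for the survivors and selection of the best five. A minimal sketch, assuming (as is usual for Rosetta energies) that a lower interface score means a more favorable interface; the scoring callables are hypothetical stand-ins for the actual PyRosetta docking runs, and all scores below are invented for illustration.

```python
from typing import Callable, Iterable, List


def docking_funnel(candidates: Iterable[str],
                   rigid_score: Callable[[str], float],
                   relaxed_score: Callable[[str], float],
                   rigid_cutoff: float,
                   top_k: int = 5) -> List[str]:
    """Two-stage screen: rigid-docking cutoff, then rank survivors by relaxed docking."""
    # Stage 1: keep only designs whose rigid-docking interface score beats the cutoff.
    survivors = [c for c in candidates if rigid_score(c) < rigid_cutoff]
    # Stage 2: run the expensive relaxed docking only on the survivors.
    relaxed = {c: relaxed_score(c) for c in survivors}
    # Lower (more negative) interface score = more favorable binding.
    return sorted(relaxed, key=lambda c: relaxed[c])[:top_k]


# Hypothetical interface scores for five designs.
rigid = {"m1": -25.0, "m2": -10.0, "m3": -30.0, "m4": -19.5, "m5": -40.0}
relax = {"m1": -60.0, "m2": -65.0, "m3": -55.0, "m4": -70.0, "m5": -50.0}
wild_type_rigid = -14.1
best = docking_funnel(rigid, rigid.__getitem__, relax.__getitem__,
                      rigid_cutoff=wild_type_rigid - 5, top_k=2)
print(best)
```

Design m2 has a good relaxed score but never reaches relaxed docking because it fails the rigid cutoff; this is the trade-off that makes the rough rigid screen cheap.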

For example, when designing an antibody that detects the conserved region of a viral hemagglutinin, an amino acid sequence that sufficiently increases binding affinity must be introduced. We strongly recommend EVOLVE for such cases.

Abstract

Through this viral transmission modeling, we achieved the following:

Introduction

Model

Model Structure

We model the propagation of a virus between cells as a sequence of transitions between epidemiological states (Fig. 1) and describe it with the following age-structured TIV model [3].

Boundary Conditions:

Initial conditions:
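Under the standard mass-action assumption implied by the parameter definitions in Tables 1 and 2 (target cells are produced at rate λ, die at rate d, and are infected at rate β per virus particle; virions lost on cell entry are neglected), the age-structured TIV model with its boundary and initial conditions can be sketched as:

```latex
\begin{align}
\frac{dT}{dt} &= \lambda - d\,T(t) - \beta\,T(t)\,V(t), \\
\frac{\partial i}{\partial t} + \frac{\partial i}{\partial a} &= -\delta(a)\, i(t,a), \\
\frac{dV}{dt} &= \int_{0}^{\infty} p(a)\, i(t,a)\, da - c\,V(t), \\
i(t,0) &= \beta\,T(t)\,V(t), \\
T(0) &= T_{0}, \quad i(0,a) = i_{0}(a), \quad V(0) = V_{0}.
\end{align}
```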

The variables describing the numbers of cells and virus particles, and the other parameters of the differential equation system above, are defined as follows.

Table 1. The variables in the differential equation system.

Table 2. The other parameters in the differential equation system.

In our model, the following detailed assumptions are made:

Analysis

In this section, we discuss a method to analytically determine whether an epidemic will be eradicated, using the age-structured TIV model mentioned above.

Here, N, the burst size of an infected cell, is defined as follows (see Appendix B) [3].
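In age-structured models of this type, the burst size is usually written as the virus output of one infected cell integrated over infection age and weighted by the probability of surviving to that age, with the basic reproduction number following as below. This is an assumed sketch; it is consistent with the sensitivity results later in this section ($E_{\beta}=E_{T_{0}}=E_{p}=1$, $E_{c}=-1$), and with the values in Table 3, $T_{0}=1.00\times 10^{6}$ and $N=37.4$ it gives $R_{0}\approx 43.1$, matching the reported 43.2 up to rounding.

```latex
\begin{align}
N &= \int_{0}^{\infty} p(a)\,\exp\left(-\int_{0}^{a}\delta(s)\,ds\right) da, \\
R_{0} &= \frac{\beta\,T_{0}\,N}{c}.
\end{align}
```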

Results

Parameter Estimation

$\delta(a)$ was set to a constant $\delta$, and $p(a)$ was defined as the following function [3]. The terms $p_{\max}$, $a_{1}$, and $b_{1}$ are positive constants and are the parameters to be estimated.

For the detailed method of parameter estimation, see Appendix C. The results were as follows.

Table 3. Estimated parameters

From these parameters, with $T_{0}=1.00\times 10^{6}$ cells/mL, we obtain the following:

Calculating with the parameter values above gives $\delta '=7.60\times 10^{-2}$, $N = 37.4$, and $R_{0}=43.2$.
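The reported numbers can be checked with a few lines of arithmetic, assuming $R_{0}=\beta T_{0}N/c$ (the form implied by $E_{\beta}=E_{T_{0}}=E_{p}=1$ and $E_{c}=-1$ in the sensitivity analysis). With the rounded values it gives $R_{0}\approx 43.1$, matching the reported 43.2 up to rounding. Because this $R_{0}$ is linear in $N$, the same expression also gives the critical burst size below which the threshold condition $R_{0}<1$ holds, one rough way to express the threshold in terms of virus output per infected cell.

```python
def basic_reproduction_number(beta: float, T0: float, N: float, c: float) -> float:
    """Assumed closed form R0 = beta * T0 * N / c."""
    return beta * T0 * N / c


beta = 6.61e-3   # infection rate, mL / (HA·h), Table 3
c = 5.73e3       # virus clearance rate, 1 / h, Table 3
T0 = 1.00e6      # initial target cells, cells / mL
N = 37.4         # burst size reported above

R0 = basic_reproduction_number(beta, T0, N, c)
# R0 < 1 (infection dies out) once N drops below c / (beta * T0).
N_critical = c / (beta * T0)
print(f"R0 = {R0:.1f}, critical burst size = {N_critical:.3f}")
```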

Sensitivity Analysis

We calculated the local sensitivity of $R_{0}$ with respect to $\beta$, $c$, and $T_{0}$ using the following formula.
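A sketch of this calculation, assuming the usual normalized definition $E_{x}=\frac{x}{R_{0}}\frac{\partial R_{0}}{\partial x}$ and the closed form $R_{0}=\beta T_{0}N/c$. The derivative is taken by central finite differences so that the same helper (a hypothetical name) also works for models in which $R_{0}$ has no closed form.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def local_sensitivity(f: Callable[[Params], float], params: Params,
                      name: str, rel_step: float = 1e-6) -> float:
    """Normalized local sensitivity E_x = (x / f) * df/dx via a central difference."""
    x = params[name]
    h = rel_step * abs(x)
    up = {**params, name: x + h}
    down = {**params, name: x - h}
    derivative = (f(up) - f(down)) / (2.0 * h)
    return x / f(params) * derivative


def r0(p: Params) -> float:
    # Assumed closed form R0 = beta * T0 * N / c.
    return p["beta"] * p["T0"] * p["N"] / p["c"]


params = {"beta": 6.61e-3, "c": 5.73e3, "T0": 1.00e6, "N": 37.4}
for name in ("beta", "c", "T0"):
    print(name, round(local_sensitivity(r0, params, name), 6))
```

For this closed form the exact values are 1, -1, and 1, independent of the parameter values.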

Next, we calculated the functional local sensitivity of $R_{0}$ with respect to $p(a)$ using the following formula (see Appendix D).

Furthermore, we calculated the functional local sensitivity of $R_{0}$ with respect to $\delta(a)$ using the following formula (see Appendix D).

$e_{p}(a)$ gives the distribution of the contribution of each infection age $a$, and integrating it over all $a$ gives $E_{p}$, the local sensitivity of $R_{0}$ to $p(a)$.
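A numerical sketch of these integrals, assuming normalized functional sensitivities with $R_{0}=\beta T_{0}N/c$, so that $e_{p}(a)=p(a)\Pi(a)/N$ and $e_{\delta}(a)=-\delta(a)\int_{a}^{\infty}p(s)\Pi(s)\,ds/N$ with survival $\Pi(a)=\exp(-\int_{0}^{a}\delta(s)\,ds)$. The saturating $p(a)$ below is hypothetical and is not the function (8) estimated in this study; it was chosen because its integrals have a closed form to check against.

```python
import math
from typing import Callable, List, Tuple


def trapezoid(y: List[float], h: float) -> float:
    """Composite trapezoidal rule on a uniform grid with spacing h."""
    return h * (sum(y) - 0.5 * (y[0] + y[-1]))


def functional_sensitivities(p: Callable[[float], float],
                             delta: Callable[[float], float],
                             a_max: float = 300.0,
                             h: float = 0.01) -> Tuple[float, float]:
    """Return (E_p, E_delta) by quadrature on the infection-age grid [0, a_max]."""
    n = int(round(a_max / h)) + 1
    ages = [k * h for k in range(n)]
    # Cumulative hazard H(a) = int_0^a delta(s) ds, so survival Pi(a) = exp(-H(a)).
    hazard = [0.0] * n
    for k in range(1, n):
        hazard[k] = hazard[k - 1] + 0.5 * h * (delta(ages[k - 1]) + delta(ages[k]))
    f = [p(a) * math.exp(-H) for a, H in zip(ages, hazard)]  # p(a) * Pi(a)
    burst = trapezoid(f, h)  # N
    # tail[k] = int_{a_k}^{a_max} p(s) Pi(s) ds, accumulated from the right.
    tail = [0.0] * n
    for k in range(n - 2, -1, -1):
        tail[k] = tail[k + 1] + 0.5 * h * (f[k] + f[k + 1])
    e_p = [fk / burst for fk in f]
    e_delta = [-delta(a) * t / burst for a, t in zip(ages, tail)]
    return trapezoid(e_p, h), trapezoid(e_delta, h)


# Hypothetical saturating production profile and constant death rate.
delta_const = 6.18e-2   # Table 3 value, used here only as an example
b = 0.5                 # hypothetical rise rate of virus production, 1/h
E_p, E_delta = functional_sensitivities(lambda a: 100.0 * (1.0 - math.exp(-b * a)),
                                        lambda a: delta_const)
# Analytic values for this p(a): E_p = 1 and E_delta = -(1 + delta / (delta + b)).
print(round(E_p, 6), round(E_delta, 4))
```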

Similarly, the local sensitivity of $R_{0}$ with respect to $\delta(a)$ over all $a$, $E_{\delta}$, is defined as:
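Assuming normalized functional sensitivities (the relative change in $R_{0}$ per relative perturbation of $p$ or $\delta$ at infection age $a$), $R_{0}=\beta T_{0}N/c$, and $\Pi(a)=\exp\left(-\int_{0}^{a}\delta(s)\,ds\right)$, the quantities take the following form. For constant $\delta$, exchanging the order of integration reduces $E_{\delta}$ to $-\delta$ times the survival-weighted mean infection age at which virus is produced.

```latex
\begin{align}
e_{p}(a) &= \frac{p(a)\,\Pi(a)}{N}, &
E_{p} &= \int_{0}^{\infty} e_{p}(a)\,da = 1, \\
e_{\delta}(a) &= -\frac{\delta(a)}{N}\int_{a}^{\infty} p(s)\,\Pi(s)\,ds, &
E_{\delta} &= \int_{0}^{\infty} e_{\delta}(a)\,da, \\
E_{\delta} &= -\frac{\delta}{N}\int_{0}^{\infty} a\,p(a)\,e^{-\delta a}\,da
&& \text{(constant } \delta\text{)}.
\end{align}
```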

Analysis shows that, regardless of the parameter values, $E_{\beta}=E_{T_{0}}=E_{p}=1$ and $E_{c}=-1$. $E_{\delta}$ is negative but not equal to $-1$ in general. If we define $p(a)$ and $\delta(a)$ as in (8) and (9),

Since $\delta, ~a_{1}, ~b_{1}\gt 0$, we have $E_{\delta} \lt -1$, and regardless of what values the parameters take,

Discussion & Conclusion

To understand the effect of COCCO on intercellular virus infection, we analyzed a TIV model that considers the infection age. As a result, the following points were clarified:

This method of constructing and analyzing a mathematical model has potential applications for various projects, as follows:

Appendix

Appendix A: Stochastic derivation of average lifespan, burst size & basic reproduction number

Appendix B: Threshold principle of the basic reproduction number

Appendix C: Details of parameter estimation

Appendix D: Derivation of functional local sensitivity of $p(a)$ and $\delta(a)$

References


Abstract

We achieved the following:

EVOLVE is packaged so that environment setup and execution are easy for users. It will therefore likely be essential for future iGEM teams that want to design proteins with enhanced interaction capabilities in silico.

Introduction

We have constructed a workflow called EVOLVE to design mutant proteins whose interaction capabilities exceed those of the wild type. EVOLVE consists of five steps:

There are two versions of EVOLVE, which differ in the method used in Step 2: Mutation Introduction. A description of EVOLVE will be provided in the following chapters.

In this part, we discuss the design of the mutant protein via EVOLVE and the evaluation of the resulting fusion protein. As in the previous part, supplementary information that is not essential reading is compiled in the Appendices.

Workflow

In addition, the input and output for the software used in each step are either an amino acid sequence or a molecular 3D structure, and only the following two file formats are used:

We created a user manual so that anyone can use EVOLVE.

EVOLVE User Manual

Step1. Mutable Site Selection

Step2. Mutation Introduction

Version 1

Version 2

The likelihood score is calculated as -log(P), where P is the probability that the model assigns to a sequence; a smaller value indicates a better fit.
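A small illustration of this score, assuming the model assigns each residue a probability conditioned on its context (as autoregressive protein language models and inverse-folding models typically do), so that P is the product of these per-residue probabilities. The helper name and the probabilities are hypothetical.

```python
import math
from typing import Sequence


def neg_log_likelihood(residue_probs: Sequence[float]) -> float:
    """-log P(sequence) for a model that factorizes P into per-residue probabilities.

    Summing -log p_i equals -log(prod p_i) but avoids floating-point underflow
    when many small probabilities are multiplied.
    """
    if any(not 0.0 < p <= 1.0 for p in residue_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in residue_probs)


# Hypothetical per-residue probabilities for two candidate sequences.
seq_a = [0.9, 0.8, 0.7]
seq_b = [0.6, 0.5, 0.4]
score_a, score_b = neg_log_likelihood(seq_a), neg_log_likelihood(seq_b)
print(f"{score_a:.3f} {score_b:.3f}")  # seq_a has the smaller (better) score
```

Some tools report the per-residue average of -log p instead of the sum, which makes sequences of different lengths comparable.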

Step3. Structure Prediction

Step4. Visual Inspection

We visualized the predicted 3D structures with PyMOL and excluded sequences that deviated significantly from the original structure, as they could potentially inhibit the complex movements of the fusion protein.

Step5. Docking & Stability Analysis

PyRosetta [12] is software for performing molecular modeling in Python. Using PyRosetta, we docked domains together, predicted the structure of the complex, and scored the stability of the interaction. PyRosetta offers the following two types of docking:

In this procedure, to reduce computational cost, we first performed rigid docking for rough screening, followed by relaxed docking.

Version 1

Version 2

Results
