The CS model of Pst

Aims

We've built an epidemiological model of the spread of Pseudomonas syringae pv. tomato (Pst) in the greenhouse to support rapid and accurate detection. The model consists of three parts: the natural infection process, the infection process with early intervention, and the resulting economic benefits. Together, they highlight the significance of our detection tool. Now let's move on to our CA model and SEIR model!

Research on the Transmission of Pst

The SEIR model is one of the most common models in infectious disease dynamics. Its core idea is to describe the spread of an infectious disease more accurately by dividing the host population into compartments according to disease stage. The following figure shows the basic structure of the SEIR model, where $S$ stands for Susceptible, $E$ for Exposed (exposed or latent individuals), $I$ for Infectious, and $R$ for Removed (deceased or removed individuals).

Fig.1 The Overall Structure of our SEIR Model

Compared with the traditional SEIR model, we have added two new compartments: $W$ and $S_o$ . The Pst model differs from traditional epidemiological models in its transmission medium. For this kind of bacterium, precipitation and irrigation water are an overlooked inoculum source for disease epidemics. Both large-scale water body testing and genome testing, which shows high similarity between environmental strains and pathogen core genomes, provide detailed evidence of water-borne transmission. ¹ Therefore, we focus on modeling this aspect.

Based on this, we have added two compartments, $W$ and $S_o$ , representing the pathogen concentration in water and soil, respectively, to simulate the conditions of planting tomatoes in a greenhouse.

Figure 2 fully demonstrates our improvements to the SEIR model for simulating this infection process.

Fig.2 The specific structure of our SEIR model. The red line shows the transmission of pathogens between plants and soil-water components.

The following arguments have helped us develop this model:

a. Asymptomatic carriage arises when environmental strains come into contact with plants through rainwater, irrigation water, etc., and survive latently on plant surfaces or within plant tissues. We therefore divide compartment E into two parts: surface colonization and internal latency. Pst-related strains in the environment (such as rainwater isolates) can be transmitted through irrigation water during the latent period, which creates the need for early intervention.

b. Asymptomatic carriers or latent strains of Pst can be divided into infectious and non-infectious categories. Some lineages persist in plant vascular tissues without symptoms and require specific triggers to activate virulence.

Vascular bundle transmission is, in essence, the systemic spread of pathogens within a single plant rather than direct transmission from one plant to another. Therefore, we divide compartment I into two parts: local infection and systemic infection. When Pst invades through stomata or wounds on leaves and successfully breaks through local defenses, it can enter the plant's vascular system and use it for movement. The pathogen reproduces and moves with the fluid flow in the xylem or phloem of the vascular bundles, spreading from the initial infection site to other parts of the plant, including stems, leaves, and roots. The result is a systemic infection of the entire plant rather than just localized lesions.

Fig.3 The infection process of Pst on tomatoes.

Model Structure of SEIR Model

We assume that tomatoes are planted with uniform spacing across a large square field.

We list the required notations and their explanations as follows.

Table 1. Explanation of the added compartments

Compartment	Description
$E_s$	Plants colonized by bacteria on their surface
$E_i$	Plants whose interior has been invaded and harbors bacteria
$I_{local}$	Infected plants whose vascular bundles have not yet been invaded
$I_{systemic}$	Infected plants whose vascular bundles have been invaded
$W$	Pathogen concentration in water bodies
$S_o$	Pathogen concentration in soil

Table 2. Meanings and explanations of other variables

Parameters	Name	Description
$β_{surf}$	Surface Colonization Rate	The rate at which environmental pathogens (`W`) at unit concentration successfully attach to the surface of susceptible plants (`S`)
$β_{int}$	Direct Invasion Rate	The rate of direct successful invasion of environmental pathogens (`W`) into plant interiors (bypassing surface colonization)
$φ$	Surface-to-interior Penetration Rate	The rate at which surface-colonizing bacteria (`E_s`) successfully penetrate plant physical barriers and enter internal tissues
$σ$	Incidence Rate	The rate at which internal pathogen carriers (`E_i`) develop into local lesion infectors (`I_local`). 1/σ represents the incubation period.
$κ$	Infection Systematization Rate	The rate at which pathogens in local infections (`I_local`) successfully invade the vascular bundle and develop into systemic infections (`I_systemic`)
$γ_s$	Surface Clearance Rate	The rate at which plants remove surface-colonizing bacteria (`E_s`) through rainfall, their own secretions, or other mechanisms
$γ_i$	Local Infection Removal Rate	The rate at which locally infected individuals (`I_local`) are removed by farmers or die due to local diseases
$γ_{sys}$	System Infection Removal Rate	The rate at which systemically infected individuals (`I_systemic`) are lost through whole-plant wilting, death, or removal
$η_s$	Surface Microbial Release Rate Into the Environment	The rate at which surface colonizers (`E_s`) release pathogens into the environmental pathogen pool (`W`)
$η_i$	Disease Spot Environmental Release Rate	The rate at which local infected individuals (`I_local`) release pathogens into the environment (`W`) through lesion exudation per unit quantity.
$η_{sys}$	System Infection Environment Release Rate	The rate at which systemic infected units (`I_systemic`) release pathogens into the environment (`W`) through vascular exudation and root secretion
$δ$	Environmental Pathogen Decay Rate	Natural mortality rate of pathogenic bacteria in the environmental pathogen pool (`W`)
$δ_o$	Soil Reservoir Decay Rate	Natural decay rate of pathogens in the soil/diseased residue reservoir (`S_o`)
$μ$	Rate of Diseased and Residual Plant Material Returning to Soil	The rate at which infected individuals (I) deposit pathogen-carrying remains into the soil reservoir ( $S_o$ )
$p$	System Infection Rate	The probability that an internal latent carrier ( $E_i$ ) manifests symptoms directly as a systemic infection ( $I_{systemic}$ )
$V(t)$	Reseeding Function	Replenishment of the susceptible compartment $S$ through replanting
$ξ(t)$	Soil Function	Rate of pathogen release from soil to water source

We obtain the equation as follows:
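As an illustration, one system of equations consistent with the compartments and rates in Tables 1 and 2 is sketched below. The coupling terms are assumptions made for this sketch: mass-action contact between $W$ and $S$, the return of cleared surface carriers to $S$, and the transfer $\xi(t) S_o$ from soil to water. It should be read as one plausible form, not as the definitive model.

```latex
\begin{aligned}
\frac{dS}{dt} &= V(t) + \gamma_s E_s - (\beta_{surf} + \beta_{int})\, W S \\
\frac{dE_s}{dt} &= \beta_{surf}\, W S - (\varphi + \gamma_s)\, E_s \\
\frac{dE_i}{dt} &= \beta_{int}\, W S + \varphi E_s - \sigma E_i \\
\frac{dI_{local}}{dt} &= (1 - p)\, \sigma E_i - (\kappa + \gamma_i)\, I_{local} \\
\frac{dI_{systemic}}{dt} &= p\, \sigma E_i + \kappa I_{local} - \gamma_{sys} I_{systemic} \\
\frac{dR}{dt} &= \gamma_i I_{local} + \gamma_{sys} I_{systemic} \\
\frac{dW}{dt} &= \eta_s E_s + \eta_i I_{local} + \eta_{sys} I_{systemic} + \xi(t)\, S_o - \delta W \\
\frac{dS_o}{dt} &= \mu\,(I_{local} + I_{systemic}) - \big(\delta_o + \xi(t)\big)\, S_o
\end{aligned}
```

In this form the six plant compartments sum to a constant whenever $V(t) = 0$, which is a convenient sanity check for any implementation.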

Model Structure of CA Model

We used this system of equations to simulate the infection of tomato plants by Pst across an entire field. However, unlike typical compartment models, simulating an entire field means that $W$ and $S_o$ are not uniform: they vary from place to place and follow spatial distributions.

So we incorporated a cellular automaton (CA model) for simulation. A CA model consists of the following basic components:

a. Grid: A regular lattice of cells. In our model, each tomato plant occupies one cell.

b. States: Each cell can be in one of a finite number of states. In our model, it refers to the above six plant states.

c. Neighborhood: A definition of which cells are considered neighbors of a given cell. In our model, it specifically refers to the tomato plants grown in the surrounding area.

d. Rules: A set of rules that determines the next state of a cell based on its current state and the states of neighboring cells. In our model, the rules are given by the pathogen concentration in the environment and the SEIR model.
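The four components above can be sketched in a few lines of Python. This is a simplified illustration, not the project's code: it implements only the transition from `S` to `E_s`, uses the summed shedding of the eight surrounding plants as a stand-in for the local pathogen concentration, and converts that pressure into an infection probability with an exponential hazard. The function names and the Moore neighborhood are assumptions; the shedding rates are the η values from Table 3.

```python
import math
import random
from enum import Enum


class State(Enum):
    """The six plant states used by the model."""
    S = 0
    E_S = 1
    E_I = 2
    I_LOCAL = 3
    I_SYSTEMIC = 4
    R = 5


# Release rates η_s, η_i, η_sys (Table 3); other states shed nothing.
SHEDDING = {State.E_S: 0.5, State.I_LOCAL: 0.6, State.I_SYSTEMIC: 0.9}


def moore_neighbors(r, c, rows, cols):
    """Coordinates of the up-to-eight cells surrounding (r, c)."""
    return [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def local_pressure(grid, r, c):
    """Assumed proxy for local W: total shedding of neighboring plants."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        SHEDDING.get(grid[nr][nc], 0.0)
        for nr, nc in moore_neighbors(r, c, rows, cols)
    )


def step(grid, beta, dt, rng):
    """One synchronous update: susceptible cells may become surface-colonized."""
    new_grid = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, state in enumerate(row):
            if state is State.S:
                hazard = beta * local_pressure(grid, r, c) * dt
                if rng.random() < 1.0 - math.exp(-hazard):
                    new_grid[r][c] = State.E_S
    return new_grid


# Usage: a 5 x 5 field with one systemically infected plant in the middle.
rng = random.Random(42)
field = [[State.S] * 5 for _ in range(5)]
field[2][2] = State.I_SYSTEMIC
field = step(field, beta=0.286, dt=1.0, rng=rng)
```

The update reads from the old grid and writes to a copy, so every cell sees the same neighborhood within one time step. The remaining transitions would be added to `step` in the same way, each with its own rate.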

Then, to better reflect the impact of the environment on infection, we obtained weather data for Beijing and made some parameters weather-dependent. Since there are few models of Pst transmission in the literature, our analysis here is qualitative.

According to the literature, the optimum temperature range for Pst infection and disease development is about 15 °C to 22 °C. When the temperature exceeds 30 °C, disease development is significantly inhibited or even stops. ² Taking β as an example, we design it as a maximum rate scaled by two key environmental factor functions.
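Written out, such a product plausibly takes the following form. The name $f_{\text{hum}}$ for the humidity factor is an illustrative label rather than one taken from the model, and $β$ stands for either $β_{surf}$ or $β_{int}$.

```latex
\beta(t) = \beta_{max} \cdot f_{\text{temp}}\big(T(t)\big) \cdot f_{\text{hum}}\big(RH(t)\big),
\qquad 0 \le f_{\text{temp}},\ f_{\text{hum}} \le 1
```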

Here, $β_{max}$ is the maximum potential infection rate under ideal environmental conditions. $f_{\text{temp}}(T(t))$ is a temperature effect function that takes the current temperature and returns a tuning factor between 0 and 1. The humidity effect function takes the current relative humidity $RH(t)$ and likewise returns a tuning factor between 0 and 1.

For the temperature effect function, we introduce the beta function:

We set the cardinal temperatures for Pst to $T_{min}$ = 10 °C, $T_{opt}$ = 20 °C, and $T_{max}$ = 30 °C.

This function outputs a value between 0 and 1 when $T$ is between $T_{min}$ and $T_{max}$ . When $T = T_{opt}$ , the function reaches its maximum value of 1; when the temperature falls outside this range, the function value is 0. This shape matches the qualitative temperature response of Pst described above.
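A minimal sketch of such a temperature response is given below. It uses a widely used beta-type response curve with three cardinal temperatures. Whether the model uses exactly this parameterisation is an assumption, and the function name is hypothetical. With the cardinal temperatures given above, the exponent equals 1 and the curve reduces to a parabola through 10 °C and 30 °C.

```python
def temperature_factor(t, t_min=10.0, t_opt=20.0, t_max=30.0):
    """Beta-type temperature response: 0 outside (t_min, t_max), 1 at t_opt.

    Assumed form:
        f(T) = (T_max - T)/(T_max - T_opt)
               * ((T - T_min)/(T_opt - T_min)) ** ((T_opt - T_min)/(T_max - T_opt))
    """
    if t <= t_min or t >= t_max:
        return 0.0
    exponent = (t_opt - t_min) / (t_max - t_opt)
    rising = (t - t_min) / (t_opt - t_min)
    falling = (t_max - t) / (t_max - t_opt)
    return falling * rising ** exponent


# Usage: scale the maximum surface colonization rate from Table 3
# on a day with a mean temperature of 15 degrees Celsius.
beta_surf_today = 0.286 * temperature_factor(15.0)
```

Because the exponent is tied to the cardinal temperatures, the peak always sits at `t_opt` without any extra normalisation.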

For humidity, we use a humidity gating function to qualitatively simulate how humidity affects the pathogen's ability to spread.
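One common way to build a gating function of this kind is a logistic curve in relative humidity. The sketch below is only an illustration: the half-activation point of 80 % and the steepness of 0.2 are placeholder values, not fitted parameters, and both function names are hypothetical.

```python
import math


def humidity_gate(rh, rh_half=80.0, steepness=0.2):
    """Logistic gate in relative humidity (percent), returning a value in (0, 1).

    rh_half is the humidity at which the gate is half open; steepness
    controls how sharply it opens. Both defaults are placeholders.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (rh - rh_half)))


def effective_beta(beta_max, temp_factor, rh):
    """Maximum rate scaled by a temperature factor and the humidity gate."""
    return beta_max * temp_factor * humidity_gate(rh)


# Usage: surface colonization rate at the optimum temperature and 90 % humidity.
beta_surf_now = effective_beta(0.286, 1.0, 90.0)
```

A smooth gate avoids the abrupt jumps that a hard threshold would introduce into the simulated infection rate.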

We then substitute parameter values into the model. Some of the values are taken by analogy from data on another bacterial disease of tomato, tomato bacterial canker.

Table 3. Parameter values (Ref ³)

Parameters	Name	Value	Units
$β_{surf}$	Surface Colonization Rate	$β_{surf,max}$ = 0.286	$cell / (pathogen \enspace units * day)$
$β_{int}$	Direct Invasion Rate	$β_{int,max}$ = 0.706	$cell / (pathogen \enspace units * day)$
$φ$	Surface-to-interior Penetration Rate	0.1	$1/day$
$σ$	Incidence Rate	0.1	$1/day$
$κ$	Infection Systematization Rate	0.05	$1/day$
$γ_s$	Surface Clearance Rate	0.5	$1/day$
$γ_i$	Local Infection Removal Rate	0.083	$1/day$
$γ_{sys}$	System Infection Removal Rate	0.05	$1/day$
$η_s$	Surface Microbial Release Rate Into the Environment	0.5	$pathogen\enspace units / (cell*day)$
$η_i$	Disease Spot Environmental Release Rate	0.6	$pathogen\enspace units / (cell*day)$
$η_{sys}$	System Infection Environment Release Rate	0.9	$pathogen\enspace units / (cell*day)$
$δ$	Environmental Pathogen Decay Rate	0.2	$1/day$
$δ_o$	Soil Reservoir Decay Rate	0.1	$1/day$
$μ$	Rate of Diseased and Residual Plant Material Returning to Soil	0.7	$pathogen\enspace units / (cell*day)$
$p$	System Infection Rate	0.1	$dimensionless$
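To show how these values drive the dynamics, the sketch below integrates a well-mixed (non-spatial) version of the model with a forward Euler scheme. Several choices are assumptions made for illustration: plant compartments are expressed as fractions of the field, infection uses mass-action contact between `W` and `S`, cleared surface carriers return to `S`, the infection rates are held at their maximum values, there is no reseeding, and `xi` is a hypothetical constant standing in for $ξ(t)$.

```python
# Rates from Table 3 (per day); xi is a hypothetical placeholder for ξ(t).
PARAMS = dict(
    beta_surf=0.286, beta_int=0.706, phi=0.1, sigma=0.1, kappa=0.05,
    gamma_s=0.5, gamma_i=0.083, gamma_sys=0.05,
    eta_s=0.5, eta_i=0.6, eta_sys=0.9,
    delta=0.2, delta_o=0.1, mu=0.7, p=0.1, xi=0.05,
)


def derivatives(y, k, reseed=0.0):
    """Right-hand side for (S, E_s, E_i, I_local, I_systemic, R, W, S_o)."""
    s, e_s, e_i, i_loc, i_sys, _removed, w, s_o = y
    new_surface = k["beta_surf"] * w * s    # S -> E_s
    new_internal = k["beta_int"] * w * s    # S -> E_i
    onset = k["sigma"] * e_i                # E_i -> I_local or I_systemic
    return (
        reseed + k["gamma_s"] * e_s - new_surface - new_internal,
        new_surface - (k["phi"] + k["gamma_s"]) * e_s,
        new_internal + k["phi"] * e_s - onset,
        (1.0 - k["p"]) * onset - (k["kappa"] + k["gamma_i"]) * i_loc,
        k["p"] * onset + k["kappa"] * i_loc - k["gamma_sys"] * i_sys,
        k["gamma_i"] * i_loc + k["gamma_sys"] * i_sys,
        k["eta_s"] * e_s + k["eta_i"] * i_loc + k["eta_sys"] * i_sys
        + k["xi"] * s_o - k["delta"] * w,
        k["mu"] * (i_loc + i_sys) - (k["delta_o"] + k["xi"]) * s_o,
    )


def simulate(y0, k, days=90, dt=0.01):
    """Forward Euler integration; returns the state after `days` days."""
    y = tuple(y0)
    for _ in range(int(round(days / dt))):
        dy = derivatives(y, k)
        y = tuple(value + dt * rate for value, rate in zip(y, dy))
    return y


# Usage: 1 % of plants start surface-colonized, no pathogen in water or soil.
initial = (0.99, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
final = simulate(initial, PARAMS)
susceptible_left = final[0]
```

Because every flow out of one plant compartment enters another, the six plant fractions always sum to one, which gives a quick check that the bookkeeping is right.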

The image below shows the result.

Fig.4 The dynamic infection process simulated by the code

Fig.5 Infection results

The images above show the infection process of Pst across a large area of farmland in Beijing from May to July 2025, in the absence of a suitable detection method.

In the common tomato growing season from May to July, Pst, if left uncontrolled, infects almost the entire tomato field within about 90 days, resulting in very large economic losses. Therefore, the hardware group developed a detection device to determine whether a plant is diseased. What we need to consider is how to use these detection devices sensibly, so as to reduce Pst infection and maximize economic benefits.

View our code on the iGEM GitLab.

Economic Benefit Model

In the previous CS model, we simulated and visualized the infection process of the pathogen on plants in a field. To further enhance the practical applicability of our project and product, we use a Q-learning model to simulate the economic benefits over a complete growth cycle. Now let's move on to our economic benefit model!

Background

The yield potential of facility-grown tomatoes is tremendous, starting from about 5,000 kg per mu in conventional greenhouses. Meanwhile, tomato market prices exhibit a typical "U-shaped" annual fluctuation pattern. In January 2022, affected by supply-side factors, the national wholesale average price soared to 7.29 yuan/kg, a year-on-year increase of 73.5%.

The profitability of tomato cultivation, both whether it makes a profit and how much, is closely related to market conditions and weather in a given year. Moreover, if farmers encounter pests and diseases and do not take timely control measures, the resulting economic losses can be significant.

The greenhouse is planted with fresh tomatoes that have better taste but poorer disease resistance, requiring more labor input. What farmers need most are tools and strategies for rapid detection of pests and diseases. Our project is fundamentally based on early detection of Pst before symptom appearance, and the hardware team has developed a corresponding Kit. Our economic benefit model theoretically demonstrates the Kit's application, highlighting the economic benefits our device can bring. Below are the detection tools and relevant parameters we have developed for Pst:

We have conducted research on the parameters related to tomato cultivation and sales, which are listed below.

Table 4. Related economic parameters

Parameters	Value	Source
Tomato price per unit	0.44 - 0.56 $\$/Lb$	Ref ⁴
Planting density	4500 $plants/acre$	assumption
Individual plant weight	13.62 $Lb/plant$	Ref ⁵
Planting cost	5000 $\$/acre$	Ref ⁶ and assumption
Cost per single plant transplant	0.42 $\$/plant$	Ref ⁷
The cost of Protato kit	1.6 $\$/kit$	From Drylab
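As a quick worked example of how these parameters combine, the sketch below computes the gross revenue and a simple net profit per acre. It reads the individual plant weight as pounds per plant and treats profit as revenue minus planting and kit costs. Both are simplifying assumptions, and the function names are hypothetical.

```python
PLANTS_PER_ACRE = 4500          # planting density
YIELD_LB_PER_PLANT = 13.62      # individual plant weight, read as lb per plant
PRICE_RANGE_USD_PER_LB = (0.44, 0.56)
PLANTING_COST_USD_PER_ACRE = 5000.0
KIT_COST_USD = 1.6


def gross_revenue(price_per_lb, healthy_fraction=1.0):
    """Revenue per acre when only the healthy fraction of plants is harvested."""
    harvest_lb = PLANTS_PER_ACRE * YIELD_LB_PER_PLANT * healthy_fraction
    return harvest_lb * price_per_lb


def net_profit(price_per_lb, healthy_fraction=1.0, kits_used=0):
    """Revenue minus planting cost and the cost of the kits that were used."""
    costs = PLANTING_COST_USD_PER_ACRE + kits_used * KIT_COST_USD
    return gross_revenue(price_per_lb, healthy_fraction) - costs


# Usage: revenue range for a fully healthy acre at the low and high price.
low, high = (gross_revenue(price) for price in PRICE_RANGE_USD_PER_LB)
```

For a fully healthy acre this gives 61,290 lb of fruit, or roughly 26,968 to 34,322 dollars of gross revenue, so even a modest loss of healthy plants outweighs the cost of many kits.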

Training Principles and Process

This model employs the Deep Q-Network (DQN) algorithm from Deep Reinforcement Learning (DRL), applying Q-learning to train an artificial intelligence "agent" capable of making optimal disease detection decisions based on daily crop and environmental conditions.

Q-learning is a reinforcement learning algorithm that trains agents to assign values to possible actions based on their current state. The foundation of this learning model is the Markov decision process (MDP): an agent interacts with an environment by taking actions that affect the environment. After each action, the environment transitions to a new state with certain probabilities and provides feedback to the agent in the form of a reward, based on an underlying reward function. For any finite Markov decision process, and given sufficient exploration, Q-learning finds an optimal policy, that is, an action selection strategy that maximizes the expected total reward over all subsequent steps starting from the current state.
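The core of Q-learning is a single update rule, illustrated below in its tabular form. This is a deliberate simplification: a DQN replaces the table with a neural network, and the state labels, action names, learning rate, and discount factor here are hypothetical values chosen for illustration.

```python
from collections import defaultdict

ACTIONS = ("no_action", "visual_inspection", "kit")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: one update after using the kit on a rainy day and receiving reward 1.0.
q_table = defaultdict(float)   # unseen state-action pairs start at 0
new_value = q_update(q_table, "rainy_day", "kit", 1.0, "next_day",
                     alpha=0.5, gamma=0.9)
```

The term in parentheses is the temporal-difference error: the gap between the current estimate and the observed reward plus the best value the agent believes it can obtain afterwards.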

Fig.6 The Principle of Q-Learning

In this code, we still use the SEIR model mentioned earlier to simulate the infection of Pst, with parameters that reflect the pathogen's life cycle and spread dynamics. Crucially, every action has a cost, and every outcome has an economic consequence. This forces the agent to learn the complex trade-offs between the cost of advanced detection methods and the potential long-term loss from inaction, mirroring the real-world cost-benefit analysis a farmer must perform.

Besides, the final output of the training process is not a fixed set of rules, but a highly adaptive and intelligent policy. In other words, it will continuously adjust its strategies based on the training outcomes of multiple simulated growth cycles, optimizing economic benefits through direct feedback. This is precisely the method and data that farmers genuinely need.

Moreover, to simulate farmers' decision-making during crop growth more accurately, we introduced field weather data from Beijing for May to July 2025. With the weather data, the AI gains two abilities. First, it can learn to associate plant health conditions with specific weather conditions. Second, it can make more forward-looking decisions. For example, from the previous Pst infection analysis we know that Pst spreads much more efficiently on cold, rainy days than in dry, sunny weather. Having learned from thousands of simulations that rainfall greatly promotes disease spread, the AI may choose Kit sampling when heavy rainfall is expected in the next period.

After training, we designed a decision log that records the optimal detection strategy for each day, which allows us to analyze the behavioral patterns learned by the agent and to obtain long-term reward data.

Fig.7 At the start of training

Fig.8 One of the training processes. The neural network updates every ten iterations. Figure 8 shows the training process, which involves continuous trial and error to obtain the final economic benefit value and adjust accordingly. Epsilon is the "exploration rate": the probability that the agent tries a random detection method in order to discover new and better strategies. This value gradually decreases during training.
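The role of epsilon can be illustrated with a short epsilon-greedy sketch. The multiplicative decay schedule, its constants, and the function names are assumptions for illustration; the training code may use a different schedule.

```python
import random


def select_action(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise pick the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Shrink epsilon after each episode, never going below epsilon_min."""
    return max(epsilon_min, epsilon * decay)


# Usage: Q-values for (no action, visual inspection, kit) at one time point.
rng = random.Random(0)
epsilon = 1.0
for _episode in range(1000):
    action = select_action([0.1, 0.9, 0.3], epsilon, rng)
    epsilon = decay_epsilon(epsilon)
```

Early in training almost every choice is random, which lets the agent sample all three detection methods. As epsilon approaches its floor, the agent relies mostly on what it has learned.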

In the code, we have designed three detection methods: doing nothing, visual inspection, and using our kit. Each method is assigned an estimated detection success rate and an economic cost.

To improve accuracy, we set 8 AM, 12 PM, 4 PM, and 8 PM as the daily testing times. At each of these times, the AI observes the input weather and environmental conditions and decides whether to conduct testing.

Training results

Fig.9 Training outcomes.

The revenue generated in each planting season gradually increases over training. The horizontal axis represents the number of training iterations, and the vertical axis represents the total revenue divided by 100. The economic benefit increased from approximately 27,500 RMB to a final convergence value of approximately 36,700 RMB. Through a continuous trial-and-error process, the AI attempts different detection plans each day based on factors such as weather and environment. After sufficient iterations, it selects plans with higher Q-values and greater efficiency from its accumulated experience. This diagram demonstrates its self-learning capability and the feasibility of the simulation.

Finally, the training results are summarized in a table, with entries such as "Day16, 20:00 | Soil(H:25%, T:15.8℃) Air(T:18.0℃, H:57%, R:2.6mm) | Healthy:0.27 | Action: 1: No Action", listing the simulated plant infection status and the corresponding action chosen by the AI for each time period of the day. In principle, this provides direct guidance for growers' detection work.

Fig.10 Form for guiding farmers in inspection work.

The primary achievement of this codebase is a fully functional prototype of an agricultural Decision Support System (DSS). Unlike systems that simply classify disease, this project tackles the far more complex problem of sequential decision-making under uncertainty. Traditional Pst research and control strategies have typically focused on the biological aspect. For example, copper-based sprays are applied under specific conditions, following "reactive" rules usually based on experience or fixed thresholds. In contrast, our model centers on an economic optimization problem.

The core objective of the code is to maximize profits throughout the entire growing season, evaluating control measures from a novel bio-economic perspective to better implement practices and generate greater benefits for farmers. This is accomplished by training an AI agent within a custom-built, high-fidelity simulation of a tomato farm. This virtual environment serves as a risk-free "digital twin" where the agent can conduct thousands of trial-and-error experiments—representing thousands of growing seasons—without any real-world cost or crop loss.

View our code on the iGEM GitLab.

Future Work

In the future, our work will mainly focus on improving our parameter database. For example, we will use COMSOL to simulate the diffusion of Pst in plant tissues such as mesophyll cells and vascular bundles, in order to obtain the specific values of the parameters within.

Furthermore, our economic model should be made more rigorous. For example, the costs of components, reagents, disposables, and labour, as well as throughput, must be added to improve the accuracy of this model.

References

1 Vinatzer, B. A., Monteil, C. L., & Clarke, C. (2015). Population genomics of Pseudomonas syringae pv. tomato to unravel emergence and modes and routes of transmission. Acta Horticulturae, 1069, 289–292. https://doi.org/10.17660/ActaHortic.2015.1069.41

2 Gullino, M. L., Gilardi, G., Sanna, M., & Garibaldi, A. (2009). Epidemiology of Pseudomonas syringae pv. syringae on tomato. Phytoparasitica, 37(5), 461–466. https://doi.org/10.1007/s12600-009-0055-2

3 Kawaguchi, A., Kitabayashi, S., Inoue, K., & Tanina, K. (2023). A PHLID model for tomato bacterial canker predicting on epidemics of the pathogen. Plants, 12(11), 2099. https://doi.org/10.3390/plants12112099

4 United States Department of Agriculture. (2025, October 6). Tomato Fax Report. Agricultural Marketing Service Specialty Crops Program Market News Division. Retrieved from https://www.ams.usda.gov/mnreports/fvdtomf.pdf

5 University of California, United States Department of Agriculture, & other cooperating institutions. (2000, May). Crop profile for tomatos (fresh market) in California. National IPM Database. https://ipmdata.ipmcenters.org/documents/cropprofiles/CAfreshmarktomatos.pdf

6 German, B. (2023, August 3). New study highlights increased production costs for processing tomatoes. AgNet West. https://www.agnetwest.com/example-url-here/

7 United States Department of Agriculture, National Agricultural Statistics Service. (2024, August 29). 2024 California processing tomato report. https://www.nass.usda.gov/ca