Software

Overview

Our modeling and software approach, developed under the Stingwatch umbrella, went through several iterations. As the Wet Lab faced various limitations and complications, the Dry Lab expanded its role to include social work and proactive engagement with stakeholders. Encouraged by their feedback, we extended our work into epidemiological modeling, which we refined over several rounds. We used an approach inspired by systems and synthetic biology to analyze human populations as living systems, and over time we realized that this perspective aligns closely with the frameworks of systems medicine and epidemiology.

Stingwatch 1

Design As a proof of concept, we decided to design two neural network architectures for time series prediction: a simple one-layer neural network and a Long Short-Term Memory (LSTM) deep neural network.
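To make the first architecture concrete, here is a minimal Python sketch of a one-layer network for time series prediction, assuming a single linear output neuron that maps a window of the previous `lags` observations to the next value and is trained by batch gradient descent on mean squared error. Our actual models were written in MATLAB; the function names, learning rate, and epoch count below are illustrative assumptions, not our project code.

```python
# Hypothetical sketch: a one-layer (single linear neuron) network for
# one-step-ahead time series prediction, trained with batch gradient descent.


def make_windows(series, lags):
    """Turn a series into (window of `lags` past values, next value) pairs."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append(series[t - lags:t])
        y.append(series[t])
    return X, y


def predict(w, b, window):
    """Linear output: weighted sum of the window plus a bias."""
    return sum(wj * xj for wj, xj in zip(w, window)) + b


def train_one_layer(X, y, lr=0.01, epochs=2000):
    """Fit weights and bias by minimizing mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # residual for this sample
            for j in range(d):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        # One gradient step using the full batch
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

Windowing the series this way turns forecasting into ordinary supervised regression; an LSTM instead reads the window one step at a time and carries an internal state between steps.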

Build We built both within a single comparative pipeline so that they were trained and evaluated on the same dataset, enabling a fair comparison of results. We chose MATLAB as our programming language because it was already being used for other models in the project.
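The idea of a single comparative pipeline can be sketched as follows: every model receives the same lagged windows and the same chronological train/test split, and is scored with the same metric. This is a hypothetical Python sketch; the two placeholder models (a training-mean baseline and a seasonal-naive forecaster) stand in for the one-layer network and the LSTM, and RMSE is an assumed metric.

```python
import math

# Hypothetical comparison harness: identical windows, split, and metric
# for every model, so differences in score come from the models alone.


def make_windows(series, lags):
    X = [series[t - lags:t] for t in range(lags, len(series))]
    y = [series[t] for t in range(lags, len(series))]
    return X, y


def chronological_split(X, y, test_fraction=0.25):
    """Train on the earliest samples, test on the most recent ones."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_mean(X, y):
    """Placeholder model: always predict the mean training target."""
    mean = sum(y) / len(y)
    return lambda window: mean


def fit_seasonal_naive(X, y):
    """Placeholder model: repeat the value one window (one season) earlier."""
    return lambda window: window[0]


def compare(models, series, lags, test_fraction=0.25):
    """Fit every model on the same training split and score it on the same test split."""
    X, y = make_windows(series, lags)
    X_tr, y_tr, X_te, y_te = chronological_split(X, y, test_fraction)
    scores = {}
    for name, fit in models.items():
        predictor = fit(X_tr, y_tr)
        scores[name] = rmse(y_te, [predictor(x) for x in X_te])
    return scores
```

Splitting chronologically rather than at random keeps future observations out of the training data, which matters for time series.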

Test Both architectures were tested using an epidemiological time series dataset. The LSTM significantly outperformed the one-layer neural network, leading us to select it as our main tool for epidemiological prediction.

Learn We learned that LSTM models are particularly effective for epidemiological prediction involving seasonal variation. The next step was to apply this methodology to real-world data.
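For readers new to LSTMs, the sketch below shows the forward pass of a single LSTM cell in plain Python, with the standard input, forget, and output gates plus a candidate cell state. It is illustrative only: the weights are random, the forget-gate bias of 1.0 is a common heuristic rather than a setting from our models, and training by backpropagation through time is left to a deep learning framework.

```python
import math
import random

# Illustrative forward pass of one LSTM cell (no training).


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def init_lstm(input_size, hidden_size, seed=0):
    """Small random weights for the four gates: input (i), forget (f),
    output (o) and candidate (g)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "ifog":
        params["W_" + gate] = mat(hidden_size, input_size)   # input weights
        params["U_" + gate] = mat(hidden_size, hidden_size)  # recurrent weights
        params["b_" + gate] = [0.0] * hidden_size
    params["b_f"] = [1.0] * hidden_size  # heuristic: start by remembering
    return params


def lstm_step(params, x, h, c):
    """One time step: returns the new hidden state h and cell state c."""
    def pre(gate):
        wx = matvec(params["W_" + gate], x)
        uh = matvec(params["U_" + gate], h)
        return [a + b + bias for a, b, bias in zip(wx, uh, params["b_" + gate])]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm_forward(params, sequence, hidden_size):
    """Run the cell over a sequence of input vectors from a zero state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(params, x_t, h, c)
    return h, c
```

The cell state `c` is what lets the network carry information across many time steps, such as where the series currently sits within a yearly cycle.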

Stingwatch 2

Design With a functional model in place, we needed to secure real-world data. We decided to build two datasets, one at the municipal level and another at the state level, at monthly resolution and covering at least 2019 to 2024. Our hypothesis was that envenomation rates would be primarily influenced by seasonal variation, while mortality would be determined by social factors. We selected four states: Morelos, Jalisco, Guanajuato, and Guerrero.

Build Constructing these datasets proved challenging. Information on scorpion envenomation is scattered across multiple agencies, and we lacked access to the surveillance systems that directly track sting cases. Environmental and social data were also fragmented: CONAGUA provided detailed climatological records, while INEGI offered social data that are updated only periodically. The greatest obstacle was obtaining sting data. Reliable seasonal data were unavailable: only yearly death tolls were accessible, and the only consistent weekly and monthly records existed at the state level.
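Because the data came from different agencies, every modeling row had to be assembled by joining sources on a common key. A minimal sketch, assuming hypothetical record fields (`state`, `year`, `month`, `cases`) and a climate table keyed by `(state, year, month)`: it sums case counts per state and month, keeps only the months that also have climate data, and reports the rest.

```python
from collections import defaultdict

# Hypothetical sketch: field names and table layout are assumptions.


def aggregate_monthly(records):
    """Sum case counts per (state, year, month)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["state"], r["year"], r["month"])] += r["cases"]
    return dict(totals)


def join_with_climate(case_totals, climate):
    """Inner join on (state, year, month); also return keys lacking climate data."""
    rows, missing = [], []
    for key in sorted(case_totals):
        if key in climate:
            state, year, month = key
            rows.append({"state": state, "year": year, "month": month,
                         "cases": case_totals[key], **climate[key]})
        else:
            missing.append(key)  # keep track of gaps instead of silently dropping them
    return rows, missing
```

Reporting the missing keys, rather than dropping them silently, makes gaps between agencies visible before any model is trained.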

Test We tested the state-level dataset with the LSTM model. Prediction accuracy varied across states: Guerrero performed markedly worse than the others, and no model achieved an error rate below 20%.
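The error metric behind the 20% figure is not specified here; one common choice for epidemiological counts is the mean absolute percentage error (MAPE), sketched below under that assumption. Months with zero observed cases are skipped because the percentage is undefined for them.

```python
# Hypothetical sketch: assumes the reported error rate is a MAPE.


def mape(actual, predicted):
    """Mean absolute percentage error, in percent, ignoring zero actuals."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)
```

Under this metric, a model "below 20%" would be one whose monthly predictions are off by less than a fifth of the observed case count on average.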

Learn We learned that, at the state level, environmental factors such as temperature and rainfall are the main determinants of scorpion envenomation, in line with the well-documented seasonality of scorpionism. However, the model failed to capture the social dimension: how marginalized communities are disproportionately affected. We attributed this limitation to the scale of the data, since state-level modeling masks local heterogeneity. To build better models, we needed more detailed, fine-grained datasets.
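A small worked example shows why state-level rates can hide local hotspots. Incidence is usually expressed per 100,000 inhabitants, and a state rate is the population-weighted average of its municipal rates. The municipalities and numbers used below are hypothetical.

```python
# Hypothetical worked example: aggregating municipal counts into a state rate.


def incidence_per_100k(cases, population):
    return 100_000 * cases / population


def aggregate_rate(municipalities):
    """State-level rate = total cases / total population (a population-weighted mean)."""
    total_cases = sum(m["cases"] for m in municipalities)
    total_pop = sum(m["population"] for m in municipalities)
    return incidence_per_100k(total_cases, total_pop)


def local_rates(municipalities):
    """Rate per 100,000 for each municipality separately."""
    return {m["name"]: incidence_per_100k(m["cases"], m["population"]) for m in municipalities}
```

With a hypothetical municipality of 900,000 people and 900 stings (100 per 100,000) and another of 100,000 people and 1,000 stings (1,000 per 100,000), the combined rate is 190 per 100,000, which hides a tenfold difference between the two.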

Stingwatch 3

Design We contacted Mexico’s highest epidemiological authority, the Directorate-General for Epidemiology (DGE), through the Directorate for Surveillance of Non-Transmissible Epidemiological Diseases. We presented our model as a proof of concept, showing that the limitation lay in the scale of the data rather than in the methodology. The DGE offered to collaborate and provided access to a new surveillance dataset for 2025. Although its limited historical range raised concerns about predictive performance, the dataset contained rich information about individuals affected by scorpion envenomation. We therefore designed two additional models to analyze it: a Multilayer Perceptron (MLP) and an autoencoder combined with a clustering model.

Build We adapted the dataset for use with the LSTM model and built new architectures capable of handling its many categorical variables. The new models were tested at two scales, state and municipal, while the LSTM was run on municipal-level data.
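Neural networks such as the MLP need numeric inputs, so categorical surveillance variables are typically one-hot encoded. The sketch below assumes hypothetical column names (`sex`, `setting`); it learns the observed levels from the data and encodes unseen levels as all zeros.

```python
# Hypothetical sketch: column names are illustrative, not the DGE schema.


def fit_categories(rows, columns):
    """Collect the sorted set of observed levels for each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(row, categories):
    """Encode one record as a flat 0/1 vector; unseen levels map to all zeros."""
    vec = []
    for col, levels in categories.items():
        vec.extend(1.0 if row[col] == level else 0.0 for level in levels)
    return vec
```

Fitting the levels once and reusing them keeps the vector layout identical across training and prediction.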

Test With the LSTM, the new dataset did not yield better predictions than the previous one. However, the MLP and autoencoder-clustering models revealed valuable insights into the conditions and contexts of scorpion sting victims, highlighting variations across states and municipalities.

Learn We confirmed that longer historical datasets are necessary for LSTM models to be effective. Nonetheless, we identified distinct clusters of individuals with consistent characteristics who are more likely to suffer scorpion stings. We shared these findings with both the DGE and Redtox, hoping they could inform public health interventions.
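The clustering step can be illustrated with k-means (Lloyd's algorithm) applied to low-dimensional vectors such as autoencoder latent codes. The autoencoder itself is omitted, k-means is an assumed choice of clustering method, and the function names are hypothetical.

```python
import random

# Hypothetical sketch: k-means on vectors standing in for autoencoder codes.


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return labels, centroids
```

Each resulting cluster groups individuals whose encoded characteristics are similar, which is the kind of profile that can then be described to public health partners.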

Stingwatch 4

Design Having developed predictive and preventive models, we wanted to address another critical issue in scorpion envenomation and Mexican healthcare more broadly: logistics. Using the DGE dataset, we designed a multi-objective optimization problem to improve medical supply chains.

Build Simulating medical logistics at the national scale, especially for antivenoms, would have been too complex for this project. Given the centralization of medicine distribution in Mexico, we focused on a single municipality, Yautepec, which was also central to our human practices work.

Test The model was used to simulate medicine delivery logistics and assess whether local optimization could improve accessibility. It represented two related phenomena: the distribution of medicine itself and the movement of patients seeking treatment.
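The two competing objectives can be illustrated with a toy version of the problem: choose which clinics to stock with antivenom, then score each choice by total supply cost and by population-weighted travel distance to the nearest stocked clinic. This hypothetical sketch enumerates every subset and keeps the Pareto front; clinic names, costs, distances, and populations are assumptions, and exhaustive enumeration only scales to a handful of clinics.

```python
from itertools import combinations

# Hypothetical toy model: not the Yautepec formulation.


def evaluate(stocked, stock_cost, distance, population):
    """Return (supply cost, population-weighted travel) for a set of stocked clinics."""
    cost = sum(stock_cost[c] for c in stocked)
    travel = sum(pop * min(distance[com][c] for c in stocked)
                 for com, pop in population.items())
    return cost, travel


def dominates(a, b):
    """Minimization: a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(stock_cost, distance, population):
    """Enumerate all non-empty subsets of clinics and keep the non-dominated ones."""
    clinics = sorted(stock_cost)
    candidates = [(subset, evaluate(subset, stock_cost, distance, population))
                  for r in range(1, len(clinics) + 1)
                  for subset in combinations(clinics, r)]
    return [(s, f) for s, f in candidates
            if not any(dominates(g, f) for _, g in candidates)]
```

No single solution on the front is best: cheaper choices leave communities travelling farther, and shorter travel requires stocking more clinics, which is the trade-off between coverage and cost.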

Learn The model showed that expanding medical coverage increases costs, and that these costs are unevenly distributed. Under resource scarcity, marginalized communities face disproportionately high travel costs to reach better-supplied medical centers. Modeling community-level medical logistics effectively would require a more complex system capable of capturing human decision-making in emergencies. While top-down optimization can improve logistics at the governmental level, individuals cannot optimize their own access to healthcare. Improving local infrastructure is therefore key, so that people do not have to travel long distances for medical care. Governments can optimize supply chains; people cannot optimize their health.

Contributions

Stingwatch provides the following contributions to iGEM teams:

  • A primer on the use of deep learning and multi-objective optimization methods for biological computation. The provided code can be easily adapted for use with other datasets, even by teams with limited coding experience.
  • A framework for integrating systems or traditional epidemiology into health-related iGEM projects, allowing teams to extend their impact beyond the traditional Wet Lab–Dry Lab paradigm.
  • Multi-objective optimization with evolutionary algorithms. Real biological problems routinely involve competing objectives. Many approaches fix some of these objectives so that single-objective optimization methods can be used. Normalizing and expanding the use of platforms like PlatEMO lets teams represent a wider range of solutions and explore the competing nature of biological objectives.
  • Long short-term memory models. Natural phenomena in many areas, such as the environment and health, vary over time. Using LSTM models for time series prediction allows teams to develop proactive plans for dealing with these variations.
  • Treating your project as part of a larger system. Early on, our stakeholders made it clear that antivenoms are one component of the larger issue of scorpion envenomation, so we decided to tackle the problem as a whole. There are many problems we cannot solve by ourselves, but we can connect with the people who can.
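Exhaustive search quickly becomes infeasible as the number of decisions grows, which is where evolutionary multi-objective algorithms help. PlatEMO itself is a MATLAB platform; as a language-neutral illustration, here is a Python sketch of GSEMO (Global Simple Evolutionary Multi-objective Optimizer), one of the simplest such algorithms. It keeps an archive of mutually non-dominated bit strings, mutates a random member by flipping each bit with probability 1/n, and accepts the child only if no archive member weakly dominates it. The `evaluate` function and the problem it encodes are hypothetical and up to the user; nothing here reproduces our Yautepec model.

```python
import random

# Hypothetical sketch of GSEMO for minimizing a vector of objectives over bit strings.


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def weakly_dominates(a, b):
    return all(x <= y for x, y in zip(a, b))


def gsemo(evaluate, n_bits, generations=3000, seed=0):
    """Return an archive {bit string: objective vector} of non-dominated solutions."""
    rng = random.Random(seed)
    start = tuple(rng.randint(0, 1) for _ in range(n_bits))
    archive = {start: evaluate(start)}
    for _ in range(generations):
        parent = rng.choice(list(archive))
        # Standard bit mutation: flip each bit independently with probability 1/n
        child = tuple(1 - b if rng.random() < 1 / n_bits else b for b in parent)
        f_child = evaluate(child)
        if any(weakly_dominates(f, f_child) for f in archive.values()):
            continue  # child is no improvement over something already archived
        # Drop archive members the child dominates, then add the child
        archive = {s: f for s, f in archive.items() if not dominates(f_child, f)}
        archive[child] = f_child
    return archive
```

The archive never needs objectives to be weighted or fixed in advance, so the returned set is an approximation of the whole trade-off front.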

For Primer

The code referenced at the end of the primer on Computational Intelligence (CI) is available for download below. These files contain the scripts and resources used to perform the analyses described, allowing for reproducibility and further exploration of the methods applied in this work.

For Models

Explore our computational and epidemiological models that power the Stingwatch platform. Use the buttons below to access detailed model descriptions and downloadable resources.