INTRODUCTION

Identifying conserved regions across multiple sequences can be laborious: aligning and analyzing many sequences takes extensive behind-the-scenes work, and few user-friendly tools are available. Our dry lab team faced this challenge while analyzing hundreds of H5N1 avian influenza sequences in search of conserved regions that could help us create a reliable target strand.

To tackle this issue, we developed a user-friendly sequence alignment tool called RumiSeq, which is capable of detecting conserved regions and calculating conservation rates with precision and ease.

PLANNING

During the planning phase of RumiSeq, we set three main goals to ensure the software would be impactful, accessible, and adaptable.

User-friendliness: While trying to analyze the different H5N1 avian influenza strains, we came across many tools that were extremely difficult to use and understand. That is why we utilized a frontend Python framework to ensure that people with varying levels of experience would be able to successfully use this tool.
Applicability: Ensuring that our tool was not too project-specific and could be applied in a wide range of uses was very important to us. Incorporating SBOL export was a key decision because it is the synthetic biology standard, allowing users to integrate their findings with other tools.
Modularity and Reusability: Designing the code in a way that allows other teams to build upon our foundation was a priority. Each component was created as a function, promoting modularity and reusability. This enables users to utilize independent functions and tailor them to their specific needs.

A key design choice we made was to use Biopython libraries for lightweight alignment-based comparison, which helped keep our software fast and intuitive. However, this meant we had to limit the number of sequences and nucleotides we could accept. Shifting mutations to align with the detected conserved region was another intentional choice, made to enhance alignment clarity even if it meant representing sequences slightly differently than the raw data. As previously mentioned, we integrated SBOL into RumiSeq to increase its flexibility and accessibility.

ARCHITECTURE

The architecture of RumiSeq combines several key technologies that work together to deliver a robust and efficient software experience.

Python: The core logic of our software is written in Python, which we used to build all the basic functions and components.
Biopython: Modules that handle FASTA file parsing, sequence analysis, and alignment were imported from Biopython[2] and used primarily in the backend. These modules enabled the biological computation required for detecting conserved regions.
SBOL libraries: To export data in a standardized SBOL[3] format, we integrated the pysbol3 library to create components and annotations that highlight key findings such as conserved regions and mutations. This format is compatible with other synthetic biology tools.
Dash by Plotly: For the frontend and client-side interface, we used Dash by Plotly[4], a Python framework that incorporates HTML and CSS into a Python-based environment. This allows users to visualize their inputted sequences in the browser without our having to write a separate web server; Dash runs on Flask internally.

Overall, this architecture allows for a user-friendly, intuitive, fast, and accurate representation of sequence alignment.

KINETIC OPTIMIZATION WITH NUCLEOTIDE SWITCHING

We implemented an “interval nucleotide switch” approach in NUPACK to assess how single-base substitutions alter substrate-incumbent (S-I) stability and influence strand displacement kinetics. The goal was to accelerate forward displacement while ensuring that the substrate (S) and incumbent (I) strands remained bound and that substrate-target (S-T) pairing remained dominant throughout the reaction.

Method: Substrate strands of 10-300 nt were tested with varying toehold:substrate ratios (5-50%). At defined intervals (3-10 nt), single-base switches were introduced on the incumbent strand (C→A, A→C, G→T, T→G) to reduce complementarity and lower S-I stability. Each variant was re-evaluated in NUPACK Test-Tube Analysis under assay conditions (25 °C, 10 nM S, 9 nM I, 1 nM T, dna04.2 energy model, max 3 complexes) [1].
Acceptance Criteria: Designs were considered viable when S-T yields exceeded 95% with less than 5% leak from S-I persistence or other unwanted outcomes. Switching intervals that achieved this balance were marked as acceptance regions (✓), while conditions that failed either threshold were rejected (X).
Output: The switching analysis revealed clear relationships between strand length, toehold ratio, and switching interval. Short substrates (10 nt) tolerated almost no switching, while intermediate lengths (20-50 nt) required 4-6 nt switching to maintain stability. Longer strands (80-300 nt) consistently allowed broader switching intervals, indicating greater robustness.
Integration: This tool provides a systematic way to identify the minimum switching interval needed for stable S-T binding across designs. For Rumino, it ensures that optimized kinetic acceleration via incumbent destabilization does not compromise full binding. Results of the experiment are shown in the link below:

Nucleotide Switch Experiment Results
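The switching step and the acceptance check described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the experiment: the function names, the `start` parameter, and the choice to begin switching at the first base are assumptions, and the acceptance criteria are read as S-T yield above 95% with leak below 5%. Yield and leak values would come from NUPACK Test-Tube Analysis; they are not computed here.

```python
# Substitution map taken from the Method text: C->A, A->C, G->T, T->G.
SWITCH_MAP = {"C": "A", "A": "C", "G": "T", "T": "G"}


def switch_at_intervals(incumbent: str, interval: int, start: int = 0) -> str:
    """Return a copy of `incumbent` with one base switched every `interval` nt.

    `start` is a hypothetical parameter: the 0-based index of the first switch.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    bases = list(incumbent.upper())
    for i in range(start, len(bases), interval):
        # Characters outside A/C/G/T (for example N) are left unchanged.
        bases[i] = SWITCH_MAP.get(bases[i], bases[i])
    return "".join(bases)


def is_viable(st_yield: float, leak: float,
              min_yield: float = 0.95, max_leak: float = 0.05) -> bool:
    """Assumed acceptance rule: S-T yield above 95% and leak below 5%.

    Both inputs are fractions between 0 and 1, e.g. taken from NUPACK output.
    """
    return st_yield > min_yield and leak < max_leak


if __name__ == "__main__":
    print(switch_at_intervals("ACGTACGTAC", 3))  # CCGGACTTAA
    print(is_viable(0.97, 0.02))                 # True
```

Scanning `interval` over 3-10 for each strand length and toehold ratio, and recording `is_viable` for each variant, would reproduce the ✓/X layout of the results.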

IMPLEMENTATION

The user first lands on the homepage, which gives a short description of the software along with more detailed instructions on how the entire system works, so that users with different levels of experience can get started. The upload page is accessible from the sidebar on the left.

Then, on the upload page, the user has two choices: they can either upload a FASTA file or input their sequences manually. It is important to note that there are set limits on the file size and the number of sequences allowed to be parsed at once. Error messages are available to alert the user of any problems.
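The validation on the upload page can be pictured with a small parser like the one below. RumiSeq itself relies on Biopython for FASTA parsing; this sketch uses only the standard library so it stays self-contained, and the limit values, the function name, and the error messages are all hypothetical.

```python
MAX_SEQUENCES = 50   # hypothetical limit, not RumiSeq's actual value
MAX_LENGTH = 2000    # hypothetical per-sequence nucleotide limit


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, enforcing the limits above."""
    records: dict[str, str] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("FASTA record has an empty header")
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            if len(records) >= MAX_SEQUENCES:
                raise ValueError(f"more than {MAX_SEQUENCES} sequences")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            # Sequence lines of one record are concatenated and upper-cased.
            records[header] += line.upper()
            if len(records[header]) > MAX_LENGTH:
                raise ValueError(f"{header} exceeds {MAX_LENGTH} nucleotides")
    if not records:
        raise ValueError("no FASTA records found")
    return records
```

Each `ValueError` message corresponds to the kind of alert the interface could show to the user when an upload is rejected.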

After a successful upload, the user is redirected to the visualization page, where their imported sequences along with conserved region highlights and mutation shifts can be visualized. At the bottom right of the sequence visualization, the user has the option to export the findings in SBOL file format.

Below, on the same page, the user can view the conserved features description, which includes the number of conserved regions found and, for each region, the conserved sequence, its length, and its position, all displayed in a box beneath the visualization.
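One way to derive these values from sequences that are already aligned is sketched below. It is an illustration under stated assumptions rather than RumiSeq's actual code: the function names, the `min_length` parameter, and the handling of gap characters are hypothetical, and the 90% threshold is borrowed from the conservation rate quoted in the conclusion.

```python
from collections import Counter


def column_conservation(aligned: list[str]) -> list[tuple[str, float]]:
    """For each alignment column, return (most common base, its frequency)."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    result = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        result.append((base, count / len(aligned)))
    return result


def conserved_regions(aligned: list[str], threshold: float = 0.9,
                      min_length: int = 3) -> list[dict]:
    """Return maximal runs of columns whose conservation meets `threshold`.

    Each region reports a 1-based start position, a length, and the
    consensus sequence. Columns dominated by gaps ('-') break a run.
    """
    cols = column_conservation(aligned)
    regions = []
    run_start = None
    # A sentinel column with zero conservation closes a run at the end.
    for i, (base, freq) in enumerate(cols + [("-", 0.0)]):
        ok = freq >= threshold and base != "-"
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start >= min_length:
                seq = "".join(b for b, _ in cols[run_start:i])
                regions.append({"start": run_start + 1,
                                "length": i - run_start,
                                "sequence": seq})
            run_start = None
    return regions


if __name__ == "__main__":
    demo = ["ACGTACGT", "ACGTTCGT", "ACGTGCGT"]
    for region in conserved_regions(demo):
        print(region)
```

For the three demo sequences, the function reports two regions: `ACGT` starting at position 1 and `CGT` starting at position 6, separated by the variable fifth column.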

The user can also calculate the most conserved sequence based on a custom input length. A box will appear below showing the number of sequences, the percentage of conservation, the position, and the conserved sequence.
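A sliding-window search is one plausible way to implement this feature. The sketch below assumes the sequences are already aligned to equal length; the function name and the returned keys are hypothetical, and ties are resolved in favor of the earliest window.

```python
from collections import Counter


def most_conserved_window(aligned: list[str], length: int) -> dict:
    """Find the window of `length` columns shared by the most sequences.

    Returns the 1-based position, the most common subsequence in that
    window, how many sequences contain it, and that count as a percentage.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    width = len(aligned[0])
    if not 1 <= length <= width:
        raise ValueError("length must be between 1 and the alignment width")
    best = None
    for start in range(width - length + 1):
        windows = Counter(s[start:start + length] for s in aligned)
        kmer, count = windows.most_common(1)[0]
        # Strict comparison keeps the earliest window when counts tie.
        if best is None or count > best["count"]:
            best = {"position": start + 1,
                    "sequence": kmer,
                    "count": count,
                    "percent": 100.0 * count / len(aligned)}
    return best


if __name__ == "__main__":
    demo = ["ACGTACGT", "ACGTTCGT", "ACGTGCGT", "TCGTACGT"]
    print(most_conserved_window(demo, 3))
```

With the four demo sequences and a length of 3, the best window is `CGT` at position 2, present in all four sequences (100%).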

Finally, at the bottom of the page is the design strands feature. RumiSeq incorporates strand design into its functionality, specifically substrate and incumbent strand design.
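The page does not spell out how the strands are derived, so the following is only a generic toehold-mediated strand displacement sketch, not RumiSeq's design routine. It assumes the substrate is the full reverse complement of the target region and that the incumbent copies the target minus a toehold at the target's 5' end; `design_strands` and `toehold_fraction` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_strands(target: str, toehold_fraction: float = 0.2) -> dict:
    """Derive substrate and incumbent strands from a 5'->3' target region.

    Assumptions (not taken from the source): the substrate pairs with the
    whole target, and the incumbent leaves the first `toehold_fraction` of
    the target's bases uncovered so the target can bind there first.
    """
    if not 0 < toehold_fraction < 1:
        raise ValueError("toehold_fraction must be between 0 and 1")
    target = target.upper()
    toehold_len = max(1, round(len(target) * toehold_fraction))
    if toehold_len >= len(target):
        raise ValueError("target is too short for the requested toehold")
    substrate = reverse_complement(target)
    return {
        "substrate": substrate,
        # Incumbent matches the target except for the toehold bases.
        "incumbent": target[toehold_len:],
        # Exposed toehold sits at the 3' end of the substrate.
        "toehold": substrate[-toehold_len:],
    }


if __name__ == "__main__":
    print(design_strands("ACGTACGTAC", toehold_fraction=0.2))
```

The `toehold_fraction` argument corresponds to the toehold:substrate ratio explored in the nucleotide switching experiment (5-50%).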

USAGE

Detailed steps on how to install and run our software can be found on our GitLab.

Steps:

Step 1: Clone the repository

        git clone https://gitlab.igem.org/2025/software-tools/ucalgary.git

        cd ucalgary

Step 2: Install dependencies

        pip install -r requirements.txt
      

Step 3: Navigate to appropriate directory

        cd rumiSeq
      

Step 4: Launch the app

        python app.py
      

Visit the app in your browser at the address printed in the terminal (Dash serves on http://127.0.0.1:8050 by default).

ADDITIONAL SOFTWARE TOOLS

Safety Bowtie Software

Our team decided to develop a safety-focused tool that allows the user to create a bowtie diagram for assessing risks and enhancing safety. Safety is the foundation of any synthetic biology project because it protects not only ourselves but also the environment and our community. Mapping out potential hazards in an interactive and accessible way is paramount in enabling teams to treat safety as a priority. One of our main safety goals this year is to make safety more accessible. Part of how we are doing this is by creating simple, visually appealing diagrams and infographics that make safety feel less like a burden, so others are more willing to engage with the information. This safety software helps us achieve that goal by making hazard analysis simple and efficient to create.

This software allows for easy creation of bowtie diagrams without losing any crucial information. It was designed to follow the methodology of identifying a hazard and a top event first, then branching out to the potential threats and consequences of the top event. Every threat and consequence has a barrier that either prevents it from occurring or reduces the damage it may cause. Finally, escalation factors are added to cover any potential reasons that a barrier could fail.

Showing screen for bowtie safety software

Build Instructions:

Step 1: Clone the repository

        git clone https://gitlab.igem.org/2025/software-tools/ucalgary.git

        cd ucalgary

Step 2: Install dependencies

        pip install -r requirements.txt
      

Step 3: Navigate to appropriate directory

        cd rumiMap
      

Step 4: Launch the app

        python app.py
      

Visit the app in your browser

RumiMap

The final additional tool we created is RumiMap, a prototype interface for our biosensor that we hope to deploy in the future. It was built using the frontend Python framework Dash by Plotly to simulate how the user will interact with the data collected by the biosensor. This web app allows users to view the collected detection results and experience how our system will function in a real-life setting. It is still in early development, but it demonstrates how our project could be used by clients.

CONCLUSION

We were able to quickly and accurately detect regions with a conservation rate of 90% or higher, allowing the user to see which regions are shared across multiple sequences. Integrating the SBOL export streamlined integration with other widely used synthetic biology tools. Furthermore, incorporating the mutation shift made RumiSeq's visualization easier to read.

One challenge our team faced when integrating the frontend of RumiSeq was handling large datasets without causing the UI to crash. Because of this, we were limited in the number of sequences and nucleotides we could parse at once.

In the future, our team wants to incorporate BLAST and ClustalW into the alignment process while increasing the number of sequences we can parse at once. We would also like to expand the available features, including optimization of the toehold switches.

REFERENCES

[1] Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, Dirks RM, Pierce NA. 2011. NUPACK: Analysis and design of nucleic acid systems. Journal of Computational Chemistry. 32(1):170–173. doi: 10.1002/jcc.21596. PMID: 20645303.

[2] Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25(11):1422–1423.

[3] Galdzicki M, Clancy KP, Oberortner E, Pocock M, Quinn JY, Rodriguez CA, Roehner N, Wilson ML, Adam L, Anderson JC, et al. 2014. The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology. Nature Biotechnology. 32(6):545–550.

[4] Plotly Technologies Inc. 2015. Dash: A Python framework for building analytical web applications. [accessed 2025 Oct 7]. https://dash.plotly.com/