Introduction

While the goal of our project is to detect the upregulation of miR399f in Arabidopsis thaliana, the technology behind our diagnostic test is applicable to detecting the upregulation of any specific miRNA. To take full advantage of this capability, we need to identify miRNAs that are upregulated in specific stress responses. For the test to perform well, each of these miRNAs should have a sequence that is distinct from those of other miRNAs present that are not associated with the response, as this makes the test more specific and reduces the likelihood of a false positive.

To this end, we developed a software tool for miRNA sequence and function comparison. Once an organism is selected, all of its miRNAs are plotted on a graph based on their sequence similarity and then clustered by their annotations, obtained from gene ontology analysis. This allows potential miRNA candidates for diagnostic testing to be identified: miRNAs associated with the desired target response that also have low sequence similarity to unrelated miRNAs. The tool was initially developed for A. thaliana, but was extended to all the species present in miRBase (a miRNA database). We are aware that miRNAs are an area of growing interest, both in general and in the iGEM community, where several teams worked with them last year, including Wageningen and Patras Medical School. Hence we hope this tool can be used by researchers and iGEM teams to identify miRNA candidates for other, more general, diagnostic or therapeutic use cases.

Development Process

To make rapid and effective progress, we followed an iterative development process. The first prototype was limited to the miRNAs present in Arabidopsis thaliana and performed only the sequence comparison; it was written in Python for high development velocity. The second prototype incorporated the GO annotations assigned to each miRNA, using the GOLR Gene Ontology Database API, allowing us to filter miRNAs by function. The final prototype was implemented in Rust using Tauri. This maximises ease of use for future researchers, since the result is a native cross-platform app that can easily be distributed. To further facilitate use in research, we added the ability to export the data in various formats.

Usage Workflow

Example of typical workflow using the software
  1. First, the organisms of interest are selected. The miRNAs present in these organisms are displayed in a table showing their name, the organism they are found in and their mature sequence.
  2. For these miRNAs, a distance matrix is then calculated. This involves performing a sequence similarity calculation, using Needleman-Wunsch, for every pair of miRNAs.
  3. From these scores, the pairwise distance is calculated by subtracting the similarity score from the length of the miRNA, so that a distance of 0 means the miRNAs are identical. Together, these distances make up an N x N distance matrix, where N is the number of miRNAs.
  4. Using PCA, we then reduce the dimensionality to 2 so that the miRNAs can be plotted on a 2D graph, the distance plot. Two points far apart on the graph correspond to miRNAs with a large difference in sequence.
  5. With the distance graph plotted, the Gene Ontology GOLR API is used to query all the annotations on the selected miRNAs.
  6. These annotations can then be selected, which highlights them on the graph, allowing clusters of similar annotations to be visualised.
  7. Data can then be exported in a number of ways: the data on the miRNAs can be exported as a JSON file, the sequences as a FASTA file, the graph as an SVG and the raw distance matrix as a JSON file.
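
Steps 2 and 3 of the workflow can be sketched as follows. This is a minimal illustration rather than the tool's actual implementation: the scoring values (match +1, mismatch -1, gap -1), the use of the longer sequence's length in the distance, and the function names are all assumptions made for the example. With this scoring, identical sequences have a distance of 0 and every other pair has a positive distance.

```rust
// Scoring scheme assumed for illustration; the tool's real values are not stated.
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -1;

/// Global alignment score of two sequences (Needleman-Wunsch), keeping only
/// two rows of the dynamic programming table in memory.
fn needleman_wunsch(a: &str, b: &str) -> i32 {
    let a = a.as_bytes();
    let b = b.as_bytes();
    // Row 0: aligning an empty prefix of `a` with the first j characters of `b`.
    let mut prev: Vec<i32> = (0..=b.len() as i32).map(|j| j * GAP).collect();
    for i in 1..=a.len() {
        let mut curr = vec![0; b.len() + 1];
        curr[0] = i as i32 * GAP;
        for j in 1..=b.len() {
            let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
            let diagonal = prev[j - 1] + sub; // align a[i-1] with b[j-1]
            let up = prev[j] + GAP; // gap in `b`
            let left = curr[j - 1] + GAP; // gap in `a`
            curr[j] = diagonal.max(up).max(left);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Distance = length of the longer sequence minus the similarity score
/// (assumption: the source does not say which length is used).
fn distance(a: &str, b: &str) -> i32 {
    let longer = a.len().max(b.len()) as i32;
    longer - needleman_wunsch(a, b)
}

/// Symmetric N x N matrix of pairwise distances with a zero diagonal.
fn distance_matrix(seqs: &[&str]) -> Vec<Vec<i32>> {
    let n = seqs.len();
    let mut matrix = vec![vec![0; n]; n];
    for i in 0..n {
        for j in (i + 1)..n {
            let d = distance(seqs[i], seqs[j]);
            matrix[i][j] = d;
            matrix[j][i] = d;
        }
    }
    matrix
}

fn main() {
    // Short made-up sequences, not real miRNAs.
    let seqs = ["UGCCAAAGGA", "UGCCAAAGGA", "UGCCUAAGGA", "ACGU"];
    for row in distance_matrix(&seqs) {
        println!("{:?}", row);
    }
}
```

Only the upper triangle is computed, since the distance is symmetric; for N miRNAs this is N(N-1)/2 alignments.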

Architecture

Following Tauri conventions, the software uses a client-server architecture despite being a single desktop app. The client is written with web technologies (JavaScript, HTML and CSS) and is rendered in a WebView. The server is written in Rust and has access to the operating system and its resources.

State Management

To keep state synchronized across all threads, and between the frontend and backend, there is a single source of truth for the state, managed in the Rust code. It sits behind a read/write lock, which allows multiple threads to hold read-only access at the same time but, like a mutex, blocks other threads while one of them holds write access. This prevents data races but not deadlocks, so careful design is needed to ensure locks are dropped as early as possible. The JavaScript code receives a copy of the state whenever it changes, so it only ever has read-only access. This simplifies the software architecture: because the JavaScript cannot mutate the state, only the Rust code has to be carefully designed.
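
The locking pattern described above can be sketched with the standard library's RwLock. The AppState struct, its field and the function names are hypothetical and chosen only for illustration; the real state in the tool is likely richer. The point of the sketch is the inner scope in select_organism, which drops the write guard before any slower follow-up work (such as sending the copy to the frontend) takes place.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical application state; the field name is illustrative only.
#[derive(Debug, Default, Clone)]
struct AppState {
    selected_organisms: Vec<String>,
}

/// Adds an organism and returns a copy of the new state.
/// The write lock is held only for the mutation and the clone.
fn select_organism(state: &RwLock<AppState>, name: &str) -> AppState {
    let snapshot = {
        let mut guard = state.write().expect("state lock poisoned");
        guard.selected_organisms.push(name.to_string());
        guard.clone()
    }; // write guard dropped here, before the snapshot is used any further
    snapshot
}

/// Reads the state; many threads can hold the read lock simultaneously.
fn count_organisms(state: &RwLock<AppState>) -> usize {
    let guard = state.read().expect("state lock poisoned");
    guard.selected_organisms.len()
}

fn main() {
    let state = Arc::new(RwLock::new(AppState::default()));

    // The snapshot stands in for the read-only copy handed to the JavaScript side.
    let snapshot = select_organism(&state, "Arabidopsis thaliana");
    println!("snapshot for the frontend: {:?}", snapshot);

    // Several reader threads access the state concurrently.
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || count_organisms(&state))
        })
        .collect();
    for reader in readers {
        println!("reader saw {} organism(s)", reader.join().unwrap());
    }
}
```

Returning a clone rather than a reference means the caller never holds the lock while doing its own work, which is one simple way to keep lock lifetimes short.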

Asynchronous Execution

To keep the UI responsive, all the JavaScript client code runs on a separate thread from the Rust server code. This ensures that blocking code run in Rust does not block the main UI thread. Because the two sides run on separate threads, they execute independently of each other, so asynchronous handling is necessary. The main mechanism for communication between the client and server is events: user interactions with the JavaScript code trigger events that call specific Rust code. When that work is finished, the Rust code can emit events of its own, which trigger updates in the JavaScript code.

Example of events which may be emitted between client and server
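
The event flow can be illustrated with a small, self-contained analogy. The sketch below does not use Tauri's event API; it stands in two standard-library channels for the client-to-server and server-to-client directions, and the event names (OrganismSelected, DistanceMatrixReady, Shutdown) are hypothetical. It shows only the shape of the interaction: the client sends an event, the server works on its own thread, and the client reacts when the completion event arrives.

```rust
use std::sync::mpsc;
use std::thread;

/// Events sent from the client to the server (hypothetical names).
#[derive(Debug, PartialEq)]
enum ClientEvent {
    OrganismSelected(String),
    Shutdown,
}

/// Events emitted by the server once work has finished (hypothetical names).
#[derive(Debug, PartialEq)]
enum ServerEvent {
    DistanceMatrixReady { organism: String },
}

/// Starts the server loop on its own thread and returns the channel ends
/// the client needs, plus a handle for joining the thread.
fn spawn_server() -> (
    mpsc::Sender<ClientEvent>,
    mpsc::Receiver<ServerEvent>,
    thread::JoinHandle<()>,
) {
    let (to_server, from_client) = mpsc::channel::<ClientEvent>();
    let (to_client, from_server) = mpsc::channel::<ServerEvent>();
    let handle = thread::spawn(move || {
        // Blocks this thread only; the client thread keeps running.
        for event in from_client {
            match event {
                ClientEvent::OrganismSelected(organism) => {
                    // Long-running work would happen here.
                    let _ = to_client.send(ServerEvent::DistanceMatrixReady { organism });
                }
                ClientEvent::Shutdown => break,
            }
        }
    });
    (to_server, from_server, handle)
}

fn main() {
    let (to_server, from_server, handle) = spawn_server();
    to_server
        .send(ClientEvent::OrganismSelected("Arabidopsis thaliana".to_string()))
        .unwrap();
    // The client could do other work here before handling the reply.
    let ServerEvent::DistanceMatrixReady { organism } = from_server.recv().unwrap();
    println!("distance matrix ready for {}", organism);
    to_server.send(ClientEvent::Shutdown).unwrap();
    handle.join().unwrap();
}
```

Modelling events as enums lets the compiler check that every event kind is handled, which is one reason this style is common in Rust.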

Installation

To make the software as accessible as possible, we made it easy to build and install. We can generate installers for both Windows and macOS, so the software can be distributed easily, even to those less experienced with the command line.

The requirements for building the app are minimal:

  • Deno is necessary for managing and executing JavaScript dependencies.
  • Rust is required for compiling and running the Rust code.

Steps

  1. Clone the repository and enter the app directory (on macOS, write the path with a forward slash).
      git clone https://gitlab.igem.org/2025/software-tools/cambridge.git
      cd "cambridge\GO Explorer"
  2. Install dependencies.
      deno install
  3. Build the installer and run it.
    • On Windows
      deno task tauri build
      .\src-tauri\target\release\bundle\msi\gene-ontology_1.0.0_x64_en-US.msi
    • On macOS
      deno task tauri build --bundles dmg

Documentation

Documentation is essential to making the codebase accessible and easy to extend. All the code in the software is documented with comments, and this wiki page serves as further documentation of the design principles and the reasoning behind the architecture.

To view generated Rust documentation:

cd ./src-tauri
cargo doc --open --document-private-items

Evaluation & Future Steps

The success of this software lies in supplying a foundational tool to accelerate miRNA research. It allows a quick and easy search of the miRNAs in an organism of interest, paving the way for further research, whether by exploring the influence of sequence on miRNA function or by narrowing candidate targets down to a short list.

Despite its utility, the tool has a few limitations that we would seek to address had we more time. The main one is the number of GO annotations available for miRNAs. This limits the scale of the model, since fewer annotated points mean fewer observable trends in the clustering of functionally similar miRNAs, and it also reduces the number of targets for diagnostics. To combat this, we could potentially employ LLMs to process papers and extract the relevant annotations from them. This would improve the coverage of miRNA annotations, since annotation could be automated instead of relying on manual insertion into the Gene Ontology Database. Overall, however, the tool remains usable for well-annotated organisms, such as Arabidopsis thaliana, allowing us to make inferences about common organisms.