Software

Generates toehold switches automatically from raw data, splitting transcripts into k-mers and optimizing structure.

Accepts raw Nanopore sequencing data and selects top differentially expressed transcripts.

Intuitive interface automates the entire workflow, enabling quick, personalized riboswitch design.
What's the problem?
Designing reliable toehold switches for eukaryotic contexts is still bottlenecked by choosing the right target region. In polyclonal tumors there is no single, universally up-regulated transcript; expression shifts between subclones, time points and anatomical regions, so a hand-picked site ages quickly.
Our pipeline removes that guesswork: starting from patient reads, it quantifies expression across tumor/normal, selects robust tumor-only regions, and assembles testable switches automatically. The result is fewer subjective decisions, less trial-and-error, and a reproducible path from raw Nanopore data to ranked designs that a wet lab can immediately validate.
In practice, this enables hospital teams to move from biopsy to patient-specific constructs within a single workflow — keeping data provenance intact, minimizing context switches between tools, and standardizing outputs for downstream cloning and transfection.
Implementation
The workflow is scripted and reproducible: options parsing → environment bootstrap → reference preparation → alignment/quantification → differential expression → export of tumor-only transcripts → riboswitch design.

On the first run the pipeline bootstraps its environment (packages, tools) and prepares the transcriptome reference (FASTA + indices). All heavy assets are cached in a stable folder layout and automatically reused on subsequent runs. Each stage is idempotent — you can re-invoke a later step without re-downloading or re-indexing, and partial outputs are detected safely.
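One way to make a stage idempotent is to pair each cached output with a completion marker and to write through a temporary file. The sketch below is illustrative only: `run_stage`, the `.done` marker and the `.partial` suffix are hypothetical names, not the pipeline's actual layout.

```python
from pathlib import Path
from typing import Callable


def run_stage(output: Path, action: Callable[[Path], None]) -> bool:
    """Run `action` unless `output` was already completed by an earlier run.

    Returns True if the stage ran, False if the cached result was reused.
    """
    marker = output.with_name(output.name + ".done")  # hypothetical marker convention
    if marker.exists() and output.exists():
        return False  # finished earlier: reuse the cached asset

    # Write to a temporary path first, so an interrupted run never leaves
    # a half-written file that looks complete.
    tmp = output.with_name(output.name + ".partial")
    output.parent.mkdir(parents=True, exist_ok=True)
    action(tmp)
    tmp.replace(output)  # rename into place once the action succeeded
    marker.touch()
    return True
```

Because the marker is only created after the rename, a later step can be re-invoked safely: a crashed stage leaves no marker and is simply run again.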
Operational details include consistent run IDs for naming, deterministic seeds for steps using stochastic optimizers, and guardrails that fail fast on malformed inputs (e.g., mixed strandedness or mismatched read types).
Transcript selection
Input is long-read cDNA from tumor and matched normal (.fastq.gz). The module prepares the transcriptome (plus an .mmi index) on first run, aligns and quantifies expression, then computes differential expression across tumor regions to capture intratumoral heterogeneity.
Candidate transcripts are then filtered with guardrails that reduce false positives before design:
- Tumor enrichment: retain targets with strong fold-change and adequate base coverage across tumor regions.
- Paralog proximity: drop sequences with close paralogs or repeated segments likely to cause off-target triggering.
- Sequence sanity: ensure clean ORF context when a reporter will be appended downstream.
The output is a compact FASTA containing top tumor-specific transcripts and their metadata (IDs, coordinates, basic QC), which becomes the direct input to the designer.
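The filters above can be pictured as a simple predicate over per-transcript statistics. This is a minimal sketch, assuming hypothetical field names and illustrative thresholds (`min_log2fc=2.0`, `min_coverage=10.0`); the pipeline's real cutoffs are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    transcript_id: str
    log2_fold_change: float     # tumor vs. matched normal
    min_region_coverage: float  # lowest mean coverage across tumor regions
    has_close_paralog: bool


def filter_candidates(cands, min_log2fc=2.0, min_coverage=10.0):
    """Keep tumor-enriched, well-covered transcripts without close paralogs."""
    kept = [
        c for c in cands
        if c.log2_fold_change >= min_log2fc
        and c.min_region_coverage >= min_coverage
        and not c.has_close_paralog
    ]
    # Strongest tumor enrichment first.
    return sorted(kept, key=lambda c: c.log2_fold_change, reverse=True)
```

Using the minimum coverage across regions, rather than the mean, is one way to favor transcripts that are present in every sampled tumor region.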
Candidate generation
For each target, the designer splits the sequence into overlapping k-mers (sliding windows) and proposes complementary trigger regions. It then assembles full eukaryotic toeholds by adding the stem/loop, the Kozak motif (GCCACCAUGG), start codon (AUG) and linkers, enforcing the reading frame and excluding premature stops.
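The building blocks of this step (sliding windows, reverse complement, in-frame stop check) are small enough to sketch. The helper names below are hypothetical and are not taken from hepaswitch.py; the sketch assumes sequences are already RNA (A/C/G/U).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
STOP_CODONS = {"UAA", "UAG", "UGA"}


def sliding_kmers(seq, k, step=1):
    """Yield (start, kmer) for every overlapping window of length k."""
    for start in range(0, len(seq) - k + 1, step):
        yield start, seq[start:start + k]


def reverse_complement(rna):
    """Return the sequence that base-pairs with `rna`, written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def has_premature_stop(seq, aug_pos):
    """Scan codons downstream of the AUG at `aug_pos` for an in-frame stop."""
    for i in range(aug_pos + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False
```

For example, `has_premature_stop("AUGGCUAAG", 0)` is false even though `UAA` occurs in the sequence, because that `UAA` is out of frame relative to the start codon.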
Structural and interaction scoring proceeds in two passes: (1) iterative fitting toward the intended secondary structure (e.g., RNAinverse-style optimization) and (2) evaluation of RNA:RNA binding using interaction models (RNAup/RNAcofold). Heuristics include GC-content windows, hairpin stability limits, trigger accessibility, and penalties for sequence motifs known to hamper translation or splicing in mammalian cells.
The ranking function combines these terms into a single score, so you see a clear ordering of candidates. For robustness, multiple triggers can be combined into an OR gate within one construct; when enabled, the builder re-checks frame integrity and linker boundaries after concatenation to keep everything in-frame.
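A single ranking score can be as simple as a weighted sum of penalty-style terms. The sketch below assumes lower is better and uses made-up term names and weights; the actual terms and weighting used by the designer may differ.

```python
def composite_score(terms, weights):
    """Weighted sum of penalty-style terms; lower is better."""
    return sum(weights[name] * value for name, value in terms.items())


def rank(candidates, weights):
    """Return candidates ordered from best (lowest score) to worst."""
    return sorted(candidates, key=lambda c: composite_score(c["terms"], weights))


# Illustrative weights only (assumptions, not the tool's defaults).
weights = {"structure_distance": 1.0, "binding_energy": 0.5, "gc_penalty": 2.0}
candidates = [
    {"id": "b", "terms": {"structure_distance": 0, "binding_energy": -12.0, "gc_penalty": 1}},
    {"id": "a", "terms": {"structure_distance": 2, "binding_energy": -20.0, "gc_penalty": 0}},
]
ranked = rank(candidates, weights)
```

Here a more negative binding energy lowers the score, so candidate `a` (score -8.0) ranks ahead of `b` (score -4.0) despite its larger structure distance.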
Output and user interaction
The designer exports ranked, fully assembled switches with per-candidate annotations (trigger coordinates, predicted structures, energy terms, constraint flags). Typical formats include FASTA (sequence-only), CSV (tabular scores) and JSON (machine-readable bundles for programmatic use).
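To show how the three formats relate, here is a minimal export sketch using only the Python standard library. The file names and the column set are assumptions for illustration, not the designer's actual output schema.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set for the tabular export.
CSV_FIELDS = ["id", "score", "trigger_start", "trigger_end"]


def export_designs(designs, out_dir):
    """Write the same ranked designs as FASTA, CSV and JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # FASTA: sequence only, one unwrapped line per record.
    with open(out / "switches.fasta", "w") as fasta:
        for d in designs:
            fasta.write(f">{d['id']}\n{d['sequence']}\n")

    # CSV: one row of scores and coordinates per candidate.
    with open(out / "switches.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=CSV_FIELDS)
        writer.writeheader()
        for d in designs:
            writer.writerow({key: d[key] for key in CSV_FIELDS})

    # JSON: the full record, for programmatic use.
    (out / "switches.json").write_text(json.dumps(designs, indent=2))
```

Sharing one `id` across all three files is what lets a row in the table be traced back to its sequence and its full annotation bundle.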
Optional visualizations show predicted secondary structures and ensemble defects for quick QC; these artifacts align with the tabular rows so you can trace any candidate back to its inputs. Operator input stays minimal: point to the reads (plus an optional reporter FASTA), select threads and an output folder, and the pipeline handles the rest — reusing cached references for faster iteration.
Downstream, the top-ranked design per target can be merged into a single plasmid-ready construct. The builder validates junctions, Kozak placement and ORF continuity to avoid silent frame shifts.
Conducting Differential Expression
This module takes Nanopore sequencing reads (.fastq.gz) from tumor and matched normal samples, runs alignment and quantification, performs differential expression, and outputs a FASTA file with the most tumor-specific transcripts for the next part of the pipeline (riboswitch design).
TL;DR — What does this section do, and how do I run it?
- Parse options & set up folders
- Environment bootstrap
- Reference preparation
- Run differential expression
- Select “only in tumor” transcripts
bash scripts/transcript_ranking.sh \
  --tumor data/TUMOR.fastq.gz \
  --normal data/NORMAL.fastq.gz \
  --threads 16 \
  --job-id MYRUN \
  --pc-lnc-only 1
System requirements
- Linux or WSL2
- ~20 GB of disk space; 16 GB RAM recommended
- Internet access on first run (to install the environment and fetch the reference genome)
- Nothing else: Micromamba automatically configures the startup files it needs (shell init hooks, environment root, etc.).
Inputs & core command
- --tumor: tumor reads (.fastq.gz)
- --normal: matched normal reads
- --threads: CPU thread count
- --job-id: run name (used to label outputs)
- --pc-lnc-only 1: restrict to protein-coding + lncRNA (speeds up the run)
What happens under the hood
- Create/verify folders and bootstrap the Micromamba environment
- Prepare the reference (download/index on first run)
- Align reads and quantify expression
- Compute differential expression and rank transcripts
- Filter to “only in tumor” candidates
Outputs
- Alignment files (.bam) against the reference genome
- Salmon results used for differential expression
- Tabular files listing significantly differentially expressed transcripts
- FASTA file with sequences of top tumor-specific transcripts (input for the riboswitch design block)
First run can be long
The very first execution on a given machine/sample set is slower because the reference genome must be downloaded and indexed. By default, the software detects existing reference assets and reuses them on subsequent runs.
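Reuse of reference assets is typically decided by comparing a stored checksum against the current FASTA. The sketch below shows that idea under stated assumptions: the function names are hypothetical, and the checksum file is assumed to hold a SHA-256 hex digest as its first whitespace-separated token. How the pipeline itself performs this check is not documented here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large FASTA files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reference_is_current(fasta, index, checksum_file):
    """True when the index exists and the stored checksum matches the FASTA."""
    fasta, index, checksum_file = Path(fasta), Path(index), Path(checksum_file)
    if not (fasta.exists() and index.exists() and checksum_file.exists()):
        return False
    stored = checksum_file.read_text().split()
    return bool(stored) and stored[0] == sha256_of(fasta)
```

When the function returns False, the caller would download and index again; deleting the index or the checksum file has the same effect as forcing a rebuild.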
Force a clean reference re-download/re-index
If you want to reset the reference and rebuild from scratch, remove the following files from the ref directory:
rm -f ref/transcripts.mmi \
  ref/transcripts.ref.sha256 \
  ref/transcripts.fasta.path
Designing Riboswitches
This module takes tumor-specific transcripts from the previous step and automatically builds, scores, and exports eukaryotic toehold switches using hepaswitch.py and hepaswitch_script.py.
Candidates are generated from overlapping k-mers, assembled with Kozak/AUG/linkers, structurally optimized, ranked, and exported for downstream testing.
TL;DR — What does this section do, and how do I run it?
- Read tumor-specific targets from FASTA
- Generate k-mer-based trigger regions
- Assemble full toehold constructs (Kozak, AUG, linkers)
- Optimize & score structures/interactions
- Rank and export best switches
python hepaswitch.py \
  --targets targets.fasta \
  --reporter optional_reporter.fasta \
  --out out/
Inputs & core arguments
- targets.fasta — FASTA with top tumor-specific transcripts (output of the differential expression stage).
- optional_reporter.fasta — optional reporter ORF to append for assay readout.
- out/ — output directory for designed switches and scores.
What happens under the hood
- Partition each target transcript into overlapping k-mers to propose trigger regions.
- Build full toehold constructs by combining the trigger with structural elements: stem/loop, Kozak GCCACCAUGG, start codon (AUG), and linkers; enforce in-frame coding and absence of premature stop codons.
- Iterative structure fitting (e.g., RNAinverse) and interaction scoring (RNAup/RNAcofold) to evaluate RNA:RNA binding energies.
- Optional multi-trigger OR logic within one construct for robust detection.
Outputs
- Fully assembled toehold sequences (FASTA/CSV) with Kozak/AUG/linkers.
- Per-candidate structural annotations and energy scores for ranking.
- Optional visualizations of predicted secondary structures and ensemble defects for QC.
The best-scoring design for each target can be combined into a single, test-ready construct; the pipeline ensures all coding regions remain in-frame.
Key screens
1) Input — choose files & how many transcripts to use

- Transcripts are read in order; sequences are converted to RNA (T→U).
- Reporter FASTA (first record) is optional and passed through to the designer.
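The input handling described for this screen (read records in order, keep the first N, convert T→U) can be sketched with a few lines of plain Python. The function names are hypothetical and the parser is deliberately minimal; it is not the GUI's actual loader.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) in file order."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_rna(dna):
    """Upper-case the sequence and replace T with U."""
    return dna.upper().replace("T", "U")


def load_targets(text, limit):
    """Return the first `limit` records, converted to RNA."""
    return [(name, to_rna(seq)) for name, seq in read_fasta(text)[:limit]]
```

For the reporter file, taking `read_fasta(text)[0]` would mirror the "first record is used" behavior.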
2) Design run — progress & per-transcript best

- Worker thread computes the best complex for each transcript via find_best_complex.
- Best rows show the toehold (with linker) and AUG position; GUI updates progress as each finishes.
3) Final construct — OR fusion & export

- Multiple toeholds are concatenated with frame-safe trimming using align_toeholds_for_OR.
- Save produces continuous-line FASTA (no wrapping).
- RNAplot preview is generated on demand (RNAfold → RNAplot → convert).
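To illustrate what frame-safe trimming means for an OR fusion, the sketch below trims 0 to 2 nt from the 3' end of each unit so that every AUG stays in frame with the 3' end of the fused construct. It is an assumption-laden illustration with hypothetical names (`fuse_in_frame`, `to_continuous_fasta`), not the implementation of align_toeholds_for_OR, and it assumes a reporter would be appended directly at the 3' end.

```python
def fuse_in_frame(units):
    """Concatenate (sequence, aug_pos) units given in 5'->3' order.

    Returns the fused sequence and the AUG offset of each unit within it.
    """
    trimmed = []
    downstream = 0  # total length already placed 3' of the current unit
    for seq, aug in reversed(units):
        if seq[aug:aug + 3] != "AUG":
            raise ValueError(f"no AUG at position {aug}")
        # Nucleotides from this AUG to the construct end must be a multiple of 3.
        excess = (len(seq) - aug + downstream) % 3
        seq = seq[:len(seq) - excess]
        if len(seq) < aug + 3:
            raise ValueError("trimming would remove the start codon")
        trimmed.append((seq, aug))
        downstream += len(seq)
    trimmed.reverse()

    fused, offsets = "", []
    for seq, aug in trimmed:
        offsets.append(len(fused) + aug)
        fused += seq
    return fused, offsets


def to_continuous_fasta(name, sequence):
    """Format one record with the sequence on a single unwrapped line."""
    return f">{name}\n{sequence}\n"
```

Working from the last unit backward means each trim only depends on what is already fixed downstream, so one pass is enough. A real builder would also re-check for in-frame stop codons after fusion.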
What happens under the hood
- k-mer triggers across 20–35 nt windows → reverse-complement → stem/loop + Kozak + AUG + linker.
- Structure fit (RNAinverse), then interaction scoring (RNAup + RNAcofold); scores are combined and ranked.
- GUI currently selects the single best candidate per transcript; the script API also exposes Top-N if needed.
Accepted inputs
- transcripts.fasta — 1..10 tumor-specific sequences (already chosen upstream)
- reporter.fasta — optional reporter ORF (first record is used)
Generated outputs
- Per-transcript best toehold (with linker) + AUG position (table view)
- Final OR construct (copy/save FASTA, continuous sequence)
- Optional RNAplot preview for the final construct
Dependencies
- PySide6 (GUI)
- ViennaRNA tools in $PATH: RNAfold, RNAcofold, RNAup, RNAinverse, RNAplot (checked at startup)
- ImageMagick convert for PS→PNG (structure preview)
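A startup check for the command-line dependencies listed above can be done with `shutil.which`. This is a minimal sketch; the `missing_tools` helper is hypothetical and only tests that each executable is found on `$PATH`, not that its version is suitable.

```python
import shutil

# Executables named in the dependency list above.
REQUIRED_TOOLS = ["RNAfold", "RNAcofold", "RNAup", "RNAinverse", "RNAplot", "convert"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on $PATH."""
    return [tool for tool in tools if which(tool) is None]
```

A caller could run `missing_tools()` before opening the main window and report the returned names in one message, instead of failing later in the middle of a design run.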
The GUI does not perform mapping/quantification or differential expression. Provide the already curated targets from the previous module (e.g., exported targets.fasta).
Demonstration
A short video demonstration shows how to clone the repository and perform riboswitch design using preselected sequences (FASTA), without Nanopore data processing. It covers:
- Cloning the repository and quick environment setup
- Running hepaswitch.py on an example targets.fasta file
- Reviewing results (candidate ranking, OR construct preview)
- Exporting designed sequences to FASTA for downstream testing
Note: this demonstration skips the mapping, quantification, and differential expression stages — it uses already selected transcripts.