Software — NOODL

NOODL


In synthetic biology, modular and structural design of DNA requires careful placement of non-functional spacer sequences to separate active regions. Poorly designed linkers can fold or hybridize with nearby sequences and compromise function—an issue for our design, where aptamer behavior is sensitive to unwanted domain interactions.

NOODL (Novel Optimization Of DNA Linkers) is a Julia-based genetic algorithm that generates single-stranded DNA (ssDNA) spacers that minimize self-folding and cross-hybridization with adjacent regions. Spacer sequences are non-functional DNA used to separate coding blocks; if a spacer folds or binds to flanks, it can alter downstream behavior.

We built NOODL to produce stable, non-interfering linkers by enforcing user parameters such as spacer length (e.g., 20 nt), target base composition (e.g., 40–60% AT), count of spacers to generate, and flanking/target sequences to avoid.

An Overview of Genetic Algorithms

What is a genetic algorithm? A genetic algorithm emulates “survival of the fittest.” It starts from a population of candidates, each given a score based on how well it meets a fitness criterion. Parent candidates are selected, then recombined and mutated to generate a pool of offspring. This cycle is repeated over several generations until a desired fitness score is achieved.

In NOODL, the candidates are DNA spacer sequences. They have a “successful” fitness score if they avoid reverse-complement matches and if they match a target AT%, meaning that they are less likely to fold or hybridize. Through the selection of parent sequences, recombination, and minor mutations of sequences, the algorithm evolves sequences that are increasingly unlikely to misfold or bind to flanking regions.
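
As a rough illustration of the select, recombine, mutate cycle described above, here is a minimal toy sketch in Julia. It is not NOODL's implementation: the score is only the distance from the target AT fraction, selection simply keeps the better half of the population (truncation selection, swapped in for brevity), and the names toy_score and toy_ga are hypothetical.

```julia
using Random

const BASES = ['A', 'T', 'C', 'G']

# Toy score (lower is better): distance of the AT fraction from the target.
# Assumption: NOODL's real score also counts reverse-complement matches.
toy_score(seq, target_at) = abs(count(b -> b in ('A', 'T'), seq) / length(seq) - target_at)

function toy_ga(; len=20, target_at=0.55, pop_size=50, generations=40,
                rng=MersenneTwister(0))
    population = [rand(rng, BASES, len) for _ in 1:pop_size]
    for _ in 1:generations
        # Selection: sort by score and keep the better half as parents.
        sort!(population; by = s -> toy_score(s, target_at))
        parents = population[1:div(pop_size, 2)]
        children = Vector{Char}[]
        while length(children) < pop_size - length(parents)
            p1, p2 = rand(rng, parents), rand(rng, parents)
            cut = rand(rng, 1:len-1)                    # recombination: single-point crossover
            child = vcat(p1[1:cut], p2[cut+1:end])
            child[rand(rng, 1:len)] = rand(rng, BASES)  # mutation: overwrite one random position
            push!(children, child)
        end
        population = vcat(parents, children)
    end
    # Return the lowest-scoring candidate (argmin(f, itr) needs Julia 1.7+).
    return argmin(s -> toy_score(s, target_at), population)
end

best = toy_ga()
println(String(best), "  score = ", toy_score(best, 0.55))
```

Because the parents are carried over unchanged, the best score in this sketch can never get worse from one generation to the next.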

Why use a genetic algorithm? When the number of possible solutions is too vast to explore exhaustively (a 20-nt spacer alone has 4^20, roughly 1.1 × 10^12, possible sequences) and the rules are too soft to encode directly, evolutionary algorithms offer a way forward. A genetic algorithm doesn't guarantee a perfect solution, but given time, variation, and selection it tends to improve on where it started. NOODL doesn't just generate sequences - it combines biological principles with algorithmic logic. In a sense it mimics how life solves problems: not by knowing the answer in advance but by trial and error.

We chose a genetic algorithm not just for its computational efficiency, but because it reflects design by iteration, constraints, and diversity. Nature doesn't create perfect solutions at once - it searches, mutates, recombines, and selects. NOODL follows the same idea: it uses an evolutionary process to discover linkers that meet multiple constraints, evolving its way toward near-optimal solutions. When writing a genetic algorithm we are creating the universe in microcosm. We establish a set of fundamental rules for the nature of inheritance and survival. Within this digital version of our world, we generate something that manifests in a successful lineage, honed by several generations of selection.


README

NOODL is a command-line interface (CLI) tool for spacer design and a small set of importable Julia modules that implement scoring, selection, and sequence variation. NOODL runs on Julia: make sure Julia 1.11.6 or later is installed, and download all NOODL files from our GitLab repository. From the directory containing the downloaded files, run julia noodl.jl in a terminal.

Requirements: Julia 1.11.6+.

Get the code:

git clone https://gitlab.igem.org/2025/software-tools/ucsc.git noodl
cd noodl

Run (CLI):

julia noodl.jl --help

Folder layout (key files):

.
├─ noodl.jl                 # CLI entry
├─ RCScore.jl               # scoring (RC & AT%)
├─ Flanks.jl                # flank RC checks
├─ BleedingFlanks.jl        # junction/target bleed checks
├─ crossover/               # crossover strategies
│  ├─ SinglePointCrossover.jl
│  ├─ UniformCrossover.jl
│  └─ MultiPointCrossover.jl
└─ bias/                    # selection/bias strategies
   ├─ StochasticUniversalSampling.jl
   ├─ RouletteWheelSelection.jl
   └─ TournamentBias.jl

Find the best 20-nt spacer at 55% AT while avoiding provided flanks; evolve 1000 candidates for 250 generations:

julia noodl.jl --spacer_len 20 --target_at 0.55 --population 1000 --generations 250 \
  --left_flank aptamer1.fa --right_flank aptamer2.fa --verbose true

All options:

noodl.jl [-l SPACER_LEN] [-a TARGET_AT] [-p POPULATION]
   [-g GENERATIONS] [-m MUTATION_RATE]
   [-u MUTATIONS_PER_OFFSPRING] [-x CROSSOVER] [-b BIAS]
   [--left_flank LEFT_FLANK] [--right_flank RIGHT_FLANK]
   [-t [TARGETS...]] [--kmers KMERS] [-s SEED]
   [-v VERBOSE] [-h]
-l, --spacer_len INT            length of each spacer (default: 50)
-a, --target_at FLOAT           target AT content (0–1) (default: 0.55)
-p, --population INT            population size (default: 200)
-g, --generations INT           number of generations (default: 300)
-m, --mutation_rate FLOAT       per-offspring mutation chance (default: 0.2)
-u, --mutations_per_offspring INT   number of base mutations when mutating (default: 2)
-x, --crossover {single|uniform|multi}  (default: multi)
-b, --bias {stochastic|tournament|roulette} (default: tournament)
--left_flank FILE               optional FASTA file for left flank
--right_flank FILE              optional FASTA file for right flank
-t, --targets [FILES...]        FASTA files treated as independent targets (added to flanks)
--kmers  "6,7,8"                comma-separated k values for internal + bleed scoring (default: "7,8")
-s, --seed INT                  random seed (optional)
-v, --verbose {true|false}      print progress logs (default: false)
-h, --help                      show help and exit

See the full NOODL Pipeline (Command Line Interface) section below for more information.

The NOODL team is open to receiving feedback, suggestions, and contributions from the community.

Contributors

NOODL was developed by the NOODL team, dry lab members of UCSC iGEM 2025 (safeTEA):

  • Anisha Jaiswal
  • Vaishnavi Venuturimilli
  • Srishta Hazra
  • Anavi Deshmukh

Special thanks:

David Bernick, our PI: an amazing mentor and collaborator who provided guidance on GA design and validation workflows.

NOODL Flow (at a glance)

[Flowchart. Node labels as extracted, original order not preserved: User logs into NOODL repo; CLI parameters; Final best spacer report; Program behind the scenes; FASTA flanks format; Install Julia]

NOODL Pipeline (Command Line Interface)

NOODL provides the following options:

noodl.jl [-l SPACER_LEN] [-a TARGET_AT] [-p POPULATION]
   [-g GENERATIONS] [-m MUTATION_RATE]
   [-u MUTATIONS_PER_OFFSPRING] [-x CROSSOVER] [-b BIAS]
   [--left_flank LEFT_FLANK] [--right_flank RIGHT_FLANK]
   [-t [TARGETS...]] [--kmers KMERS] [-s SEED]
   [-v VERBOSE] [-h]

optional arguments:
-l, --spacer_len SPACER_LEN
      length of each spacer (type: Int64, default: 50)
-a, --target_at TARGET_AT
      target AT content (0–1) (type: Float64, default: 0.55)
-p, --population POPULATION
      population size (type: Int64, default: 200)
-g, --generations GENERATIONS
      number of generations (type: Int64, default: 300)
-m, --mutation_rate MUTATION_RATE
      per-offspring mutation chance (0–1) (type: Float64, default: 0.2)
-u, --mutations_per_offspring MUTATIONS_PER_OFFSPRING
      number of base mutations when mutating (type: Int64, default: 2)
-x, --crossover CROSSOVER
      crossover method: single | uniform | multi (default: "multi")
-b, --bias BIAS
      bias method: stochastic | tournament | roulette (default: "tournament")
--left_flank LEFT_FLANK
      optional FASTA file for left flank (default: "")
--right_flank RIGHT_FLANK
      optional FASTA file for right flank (default: "")
-t, --targets [TARGETS...]
      FASTA files treated as independent targets (added to any flanks)
--kmers KMERS
      comma-separated k values used for internal + bleed scoring (e.g. 6,7,8) (default: "7,8")
-s, --seed SEED       random seed (optional) (type: Int64)
-v, --verbose VERBOSE print progress logs (type: Bool, default: false)
-h, --help            show this help message and exit

Usage Example

Find the best 20-nt spacer at 55% AT while avoiding provided flanks; evolve 1000 candidates for 250 generations:

julia noodl.jl --spacer_len 20 --target_at 0.55 --population 1000 --generations 250 \
  --left_flank aptamer1.fa --right_flank aptamer2.fa --verbose true

NOODL Modules Documentation

noodl.jl

noodl.jl runs NOODL, a genetic-algorithm tool that evolves a DNA spacer and picks the single best one that’s least likely to fold on itself or stick to your flanking/target sequences. It provides both a command-line app and library functions: you give it spacer length, target A/T, and optional flanks/targets, and it returns the optimized spacer with a short score report.

mutate_sequence

Randomly flips a few letters (A/T/C/G) in the DNA sequence you give it. It edits the original sequence directly.
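
A minimal sketch of in-place point mutation, assuming the sequence is stored as a vector of one-letter strings (as read_fasta_as_vec returns). The name mutate_sketch! is hypothetical, and forcing each replacement to differ from the current base is an assumption, not something the NOODL source is known to do.

```julia
using Random

const BASES = ["A", "T", "C", "G"]

# Hypothetical helper: overwrite `n_mutations` randomly chosen positions,
# editing `seq` in place (hence the `!` in the name).
function mutate_sketch!(seq::Vector{String}, n_mutations::Int; rng=Random.default_rng())
    for _ in 1:n_mutations
        pos = rand(rng, 1:length(seq))
        # Assumption: pick among the three bases that differ from the current one.
        seq[pos] = rand(rng, filter(b -> b != seq[pos], BASES))
    end
    return seq
end

spacer = ["A", "A", "A", "A", "A", "A"]
mutate_sketch!(spacer, 2; rng=MersenneTwister(1))
println(join(spacer))
```

Two mutations can land on the same position, so the number of changed bases is at most, not exactly, n_mutations.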

random_spacer

Makes a brand-new random DNA spacer of the length you want, with roughly the A+T percentage you asked for.
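
One simple way to get "roughly" the requested A+T percentage is to draw each position independently. The sketch below assumes that approach; random_spacer_sketch is a hypothetical name and NOODL's own sampling may differ.

```julia
using Random

# Each position is A or T with probability `target_at`, otherwise C or G.
function random_spacer_sketch(len::Int, target_at::Float64; rng=Random.default_rng())
    return String[rand(rng) < target_at ? rand(rng, ["A", "T"]) : rand(rng, ["C", "G"])
                  for _ in 1:len]
end

s = random_spacer_sketch(20, 0.55; rng=MersenneTwister(42))
println(join(s))
```

Because positions are drawn independently, the realized AT fraction only approximates the target; the scoring step is what pulls candidates toward it.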

read_fasta_as_vec

Opens a FASTA file and grabs the first DNA sequence in it, returning the letters as a list like ["A","T","G",…].
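
The behavior described above could look roughly like this. The function name is hypothetical, and upper-casing the input and skipping blank lines are assumptions about details the description does not specify. The demo sequence is made up.

```julia
# Read only the first record of a FASTA file and return its bases as
# a vector of one-letter strings, e.g. ["A", "T", "G", ...].
function read_first_fasta_sketch(path::AbstractString)
    bases = String[]
    seen_header = false
    for raw in eachline(path)
        line = strip(raw)
        isempty(line) && continue
        if startswith(line, ">")
            seen_header && break     # a second header starts the next record: stop
            seen_header = true
            continue
        end
        append!(bases, string.(collect(uppercase(line))))
    end
    return bases
end

# Demo with a temporary two-record file (placeholder sequences).
path, io = mktemp()
write(io, ">demo\nACGT\nttga\n>second\nGGGG\n")
close(io)
println(read_first_fasta_sketch(path))
```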

_parse_int_list

Turns a string like "7,8" into the numbers [7, 8]. If it’s empty, you get an empty list.
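
A compact sketch of this parsing step (hypothetical name; tolerating spaces around the commas is an added assumption):

```julia
# "7,8" -> [7, 8]; an empty or whitespace-only string -> Int[]
parse_int_list_sketch(s::AbstractString) =
    isempty(strip(s)) ? Int[] : [parse(Int, strip(tok)) for tok in split(s, ",")]

println(parse_int_list_sketch("7,8"))
println(parse_int_list_sketch(""))
```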

_flanks_vec

Packs the optional left and right flanking sequences into one list, skipping any that weren’t provided.

run_noodl

Runs the whole “evolution” process to search for a good spacer and returns the single best one it finds. If you gave flanking/target sequences, it also avoids spacers that could stick to those.

parse_args

Reads the command-line options a user typed (like length, A/T, files, etc.) and turns them into values the program can use.

main

The full command-line workflow: read options and input files, build the config, run NOODL, and print the final best spacer and a short score report.

RCScore.jl

RCScore.jl is the scoring component NOODL uses to judge DNA spacers. It checks how likely a sequence is to stick to itself by counting reverse-complement k-mer matches (a k-mer is a run of k consecutive bases) and by measuring how close its A/T% is to a target. It returns a single lower-is-better score and includes simple helpers (reverse complement, AT% calculator).

reverse_complement(seq)

Returns the DNA reverse complement of seq (A↔T, C↔G, reversed order). Handy for spotting potential self-pairing.

count_reverse_complement_kmers(sequence, k)

Counts how many k-length words in the sequence have their reverse complement also present somewhere in the sequence (palindromic kmers aren’t double-counted). More matches = more chances to form stems/hairpins.
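
The description leaves some room for interpretation, so the sketch below shows one reading: collect the distinct k-mers, then count each {k-mer, reverse complement} pair once, with a palindromic k-mer (its own reverse complement) also counted once. The names are hypothetical and the sequence is a plain String for brevity.

```julia
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C')

# Reverse the sequence, then complement each base.
revcomp_sketch(seq::AbstractString) = join(COMPLEMENT[b] for b in reverse(seq))

function count_rc_kmers_sketch(seq::String, k::Int)
    kmers = Set{String}(seq[i:i+k-1] for i in 1:length(seq)-k+1)
    seen = Set{String}()
    hits = 0
    for kmer in kmers
        rc = revcomp_sketch(kmer)
        if rc in kmers && !(kmer in seen)
            hits += 1
            push!(seen, kmer)   # mark both members of the pair so the
            push!(seen, rc)     # partner is not counted a second time
        end
    end
    return hits
end

# AAA/TTT and CCC/GGG are the only reverse-complement pairs among the 3-mers here.
println(count_rc_kmers_sketch("AAACCCTTTGGG", 3))
```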

at_content(sequence)

Computes the fraction of A and T bases in the sequence (a number between 0 and 1).

calculate_score(sequence, target_at_content; kmer_lengths=[7,8], verbose=false)

Gives the sequence a single score = (number of reverse-complement k-mer matches across the given k values) + 10×(difference between actual AT% and the target). Lower is better (fewer RC matches and AT% closer to target); verbose=true prints a short breakdown.
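
As a worked example of the formula, the sketch below takes the reverse-complement hit count as an input and applies the AT penalty. It assumes the "difference" is the absolute difference between AT fractions on the 0–1 scale (consistent with at_content returning a fraction); the function names are hypothetical.

```julia
# Fraction of A and T bases, between 0 and 1.
at_fraction(seq::AbstractString) = count(b -> b in ('A', 'T'), seq) / length(seq)

# score = RC hits + 10 × |AT fraction − target|   (lower is better)
score_sketch(rc_hits::Int, at::Float64, target_at::Float64) =
    rc_hits + 10 * abs(at - target_at)

at = at_fraction("ATATGCGCAT")      # 6 of 10 bases are A or T
println(score_sketch(3, at, 0.55))  # 3 hits plus 10 × 0.05, i.e. about 3.5
```

Under this reading, one reverse-complement hit costs as much as a 10-percentage-point deviation from the target AT content.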

Flanks.jl

Flanks.jl checks whether a candidate spacer could stick to its neighboring sequences (flanks) by looking for reverse-complement word matches, and provides a quick way to penalize such clashes in a score.

_kmer_multiset(seq, k)

Breaks a sequence into all overlapping k-length “words” (kmers) and counts how many times each one appears. Used internally to compare spacer words to flank words.

cross_rc_hits(spacer, flanks; kmer_lengths=[7,8]) → (total_hits, detail)

Counts how many kmers in the spacer have their reverse complement present in any flank (i.e., potential binding seeds). Returns the total number of hits and a per-k list of matching (kmer, rc) pairs.
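
A minimal sketch of this spacer-versus-flank check, using plain Strings and hypothetical names. It counts every spacer k-mer position whose reverse complement occurs in any flank; whether NOODL counts positions or distinct k-mers is not stated, so that choice is an assumption. The sequences are made up.

```julia
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C')
revcomp_sketch(seq::AbstractString) = join(COMPLEMENT[b] for b in reverse(seq))

function cross_hits_sketch(spacer::String, flanks::Vector{String}; kmer_lengths=[7, 8])
    total = 0
    detail = Dict{Int,Vector{Tuple{String,String}}}()
    for k in kmer_lengths
        # All k-mers present in any flank.
        flank_kmers = Set{String}()
        for f in flanks, i in 1:length(f)-k+1
            push!(flank_kmers, f[i:i+k-1])
        end
        # Spacer k-mers whose reverse complement appears in a flank.
        hits_k = Tuple{String,String}[]
        for i in 1:length(spacer)-k+1
            kmer = spacer[i:i+k-1]
            rc = revcomp_sketch(kmer)
            rc in flank_kmers && push!(hits_k, (kmer, rc))
        end
        detail[k] = hits_k
        total += length(hits_k)
    end
    return total, detail
end

# The spacer is the reverse complement of the middle of the flank,
# so every 7-mer and 8-mer in it is flagged.
total, detail = cross_hits_sketch("TTGACCAGT", ["AAACTGGTCAAA"])
println(total, "  ", detail[7])
```

A pass/fail gate like passes_flank_check then reduces to comparing the total against the allowed maximum.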

passes_flank_check(spacer, flanks; kmer_lengths=[7,8], max_hits_allowed=0)

Quick yes/no gate: returns true if the spacer has at most max_hits_allowed reverse-complement matches into the flanks (default 0), otherwise false.

flank_penalized_score(spacer, target_at, flanks; kmer_lengths=[7,8], flank_penalty=1e6, verbose=false)

Convenience scorer: computes the spacer’s base score (internal RC + AT penalty from RCScore) and then adds a big penalty if any spacer ⇔ flank reverse-complement hits are found. Returns (total score, base score, number of cross hits).

BleedingFlanks.jl

BleedingFlanks.jl checks whether a spacer might stick to other sequences (targets like aptamers/flanks) not only by direct matches, but also across the junctions where the spacer meets its left/right neighbors (“bleed”). It can count these risky matches, gate them with simple pass/fail rules, fold them into a score, and verify fixed start/end bases.

_kmer_multiset(seq, k)

Breaks a sequence into all overlapping kmers and counts how often each shows up.

_bleed_kmers(spacer, k; prefix_km1=nothing, suffix_km1=nothing)

Builds all kmers that straddle the junctions (left: end of left + start of spacer; right: end of spacer + start of right).
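
To make "straddling" concrete: a k-mer crosses a junction when it takes at least one base from each side, so only the last k-1 bases of one part and the first k-1 bases of the next matter. The sketch below assumes that is what prefix_km1 and suffix_km1 carry; the function names and keyword names here are hypothetical, and it assumes the spacer is at least k-1 bases long.

```julia
# All k-length windows of a string.
kmer_windows(s::AbstractString, k::Int) = String[s[i:i+k-1] for i in 1:length(s)-k+1]

function bleed_kmers_sketch(spacer::AbstractString, k::Int;
                            left::AbstractString="", right::AbstractString="")
    # Each side contributes at most k-1 bases, so every window of length k
    # over the joined string contains bases from both sides of the junction.
    left_junction  = last(left, k - 1) * first(spacer, k - 1)
    right_junction = last(spacer, k - 1) * first(right, k - 1)
    return vcat(kmer_windows(left_junction, k), kmer_windows(right_junction, k))
end

println(bleed_kmers_sketch("ATAT", 3; left="GGGG", right="CCCC"))
```

With no neighbors supplied, the joined strings are shorter than k and the result is empty, which matches the idea that bleed checks only apply where a junction exists.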

precompute_target_kmers(targets, kmer_lengths)

Pre-counts k-mers per target for speed.

independent_hits_with_bleed(spacer, targets; kmer_lengths=[7,8], prefix_km1=nothing, suffix_km1=nothing, precomputed_targets=nothing)

Counts spacer and bleed-junction k-mers whose RCs appear in each target; returns totals & per-target breakdown.

passes_target_set_check_with_bleed(spacer, targets; mode=:union, …, union_max_allowed=0, per_target_max_allowed=0)

True if hits are within your limits by total, per-target, or both.

independent_penalized_score_with_bleed(spacer, target_at, targets; …, mode=:union, hard_penalty=1e6, use_soft=false, …)

Combines internal score (RC/AT) with bleed penalties (hard or soft).

check_fixed_ends(spacer; fixed_prefix=nothing, fixed_suffix=nothing)

Verifies exact required start/end bases; reports mismatches.

Crossover Files

SinglePointCrossover.jl

Implements single-point crossover: one random cut, swap tails to create two children.

single_point_crossover(parent1::String, parent2::String) → (child1::String, child2::String)

Random cut 1..L-1; split and swap right halves.
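
A minimal sketch of single-point crossover as described (hypothetical function name; the rng keyword is added here only to make runs reproducible):

```julia
using Random

function single_point_sketch(p1::String, p2::String; rng=Random.default_rng())
    @assert length(p1) == length(p2) >= 2
    cut = rand(rng, 1:length(p1)-1)       # cut after position `cut`, 1 ≤ cut ≤ L-1
    child1 = p1[1:cut] * p2[cut+1:end]    # head of parent 1 + tail of parent 2
    child2 = p2[1:cut] * p1[cut+1:end]    # head of parent 2 + tail of parent 1
    return child1, child2
end

child1, child2 = single_point_sketch("AAAAAAAA", "CCCCCCCC"; rng=MersenneTwister(7))
println(child1, "  ", child2)
```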

UniformCrossover.jl

Uniform per-position coin flip (no fixed cut site).

uniform_crossover(parent1::String, parent2::String) → (child1::String, child2::String)

Per position, heads: child1←p1(…); tails: swap; repeat across length.
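
A minimal sketch of uniform crossover under the coin-flip description (hypothetical function name; a fair coin is assumed):

```julia
using Random

function uniform_sketch(p1::String, p2::String; rng=Random.default_rng())
    @assert length(p1) == length(p2)
    c1 = Char[]
    c2 = Char[]
    for (a, b) in zip(p1, p2)
        if rand(rng, Bool)          # heads: keep the parents' order
            push!(c1, a); push!(c2, b)
        else                        # tails: swap the two bases
            push!(c1, b); push!(c2, a)
        end
    end
    return String(c1), String(c2)
end

child1, child2 = uniform_sketch("AAAAAAAA", "CCCCCCCC"; rng=MersenneTwister(7))
println(child1, "  ", child2)
```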

MultiPointCrossover.jl

Multiple cuts, alternate segments to produce two children.

custom_multi_crossover(parent1::String, parent2::String)

Pick k cuts; alternate segment assignment between parents to form children.
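
A minimal sketch of multi-point crossover. How NOODL chooses the number of cuts is not stated, so the sketch takes it as a hypothetical n_cuts argument and draws distinct cut positions uniformly; the function name is also hypothetical.

```julia
using Random

function multi_point_sketch(p1::String, p2::String, n_cuts::Int; rng=Random.default_rng())
    L = length(p1)
    @assert length(p2) == L && 1 <= n_cuts <= L - 1
    cuts = sort(randperm(rng, L - 1)[1:n_cuts])   # distinct cut positions in 1..L-1
    c1, c2 = collect(p1), collect(p2)
    swap = false
    prev = 0
    for cut in vcat(cuts, L)          # walk the segments between consecutive cuts
        if swap                       # every other segment is exchanged
            c1[prev+1:cut], c2[prev+1:cut] = c2[prev+1:cut], c1[prev+1:cut]
        end
        swap = !swap
        prev = cut
    end
    return String(c1), String(c2)
end

child1, child2 = multi_point_sketch("AAAAAAAA", "CCCCCCCC", 3; rng=MersenneTwister(7))
println(child1, "  ", child2)
```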

Bias Selection Files

StochasticUniversalSampling.jl

SUS parent selection with evenly spaced pointers (more even than roulette).

stochastic_selection(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)

Score with RCScore.calculate_score, convert to fitness (max_score - score), select k via SUS; guard for all-zero fitness.
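
A sketch of stochastic universal sampling over precomputed scores. To stay self-contained it takes a vector of scores rather than calling RCScore.calculate_score, and returns indices into the population. The function name is hypothetical, and falling back to a uniform pick when all fitness values are zero is an assumed form of the guard.

```julia
using Random

function sus_sketch(scores::Vector{Float64}, k::Int; rng=Random.default_rng())
    fitness = maximum(scores) .- scores     # lower score -> higher fitness
    total = sum(fitness)
    # Guard: if every candidate scores the same, all fitness is zero.
    total == 0 && return rand(rng, 1:length(scores), k)
    spacing = total / k                     # distance between the k pointers
    start = rand(rng) * spacing             # one random offset for all pointers
    picks = Int[]
    i = 1
    cumulative = fitness[1]
    for j in 0:k-1
        pointer = start + j * spacing
        # Advance to the candidate whose fitness interval contains the pointer.
        while cumulative <= pointer && i < length(fitness)
            i += 1
            cumulative += fitness[i]
        end
        push!(picks, i)
    end
    return picks
end

println(sus_sketch([0.0, 5.0, 10.0], 4; rng=MersenneTwister(1)))
```

A single random offset places all k pointers, which is why SUS spreads picks more evenly than k independent roulette spins.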

RouletteWheelSelection.jl

Roulette selection without replacement.

roulette_wheel_selection(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)

Weighted spins pick k; remove each pick; guard for all-zero fitness.
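
A sketch of roulette-wheel selection without replacement over precomputed scores, returning indices. The function name is hypothetical; the fitness conversion (max score minus score) is borrowed from the SUS description and is an assumption for this module, as is the uniform fallback when the remaining fitness is all zero.

```julia
using Random

function roulette_sketch(scores::Vector{Float64}, k::Int; rng=Random.default_rng())
    @assert k <= length(scores)
    fitness = maximum(scores) .- scores         # lower score -> larger slice of the wheel
    candidates = collect(1:length(scores))
    picks = Int[]
    for _ in 1:k
        weights = fitness[candidates]
        total = sum(weights)
        if total == 0
            pos = rand(rng, 1:length(candidates))   # guard: uniform pick
        else
            spin = rand(rng) * total
            # First candidate whose cumulative weight exceeds the spin.
            pos = something(findfirst(>(spin), cumsum(weights)), length(candidates))
        end
        push!(picks, candidates[pos])
        deleteat!(candidates, pos)              # without replacement
    end
    return picks
end

println(roulette_sketch([1.0, 2.0, 3.0, 4.0], 3; rng=MersenneTwister(1)))
```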

TournamentBias.jl

Tournament selection: sample k, take best (lowest score), repeat for two distinct parents.

tournament_bias(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)

Run small tournaments scored by RCScore.calculate_score; ensure different parents.
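
A sketch of tournament selection over precomputed scores, returning the indices of two parents. The function name is hypothetical, and excluding the first winner from the second tournament is just one assumed way to "ensure different parents".

```julia
using Random

function tournament_sketch(scores::Vector{Float64}, k::Int; rng=Random.default_rng())
    @assert k <= length(scores) - 1
    # Draw k distinct contestants (skipping `exclude`) and return the best one.
    function one_winner(exclude)
        pool = [i for i in randperm(rng, length(scores)) if i != exclude][1:k]
        return pool[argmin(scores[pool])]       # lowest score wins
    end
    first_parent = one_winner(0)                # 0 excludes nobody
    second_parent = one_winner(first_parent)    # assumption: winner sits out round two
    return first_parent, second_parent
end

println(tournament_sketch([5.0, 1.0, 3.0, 4.0, 2.0], 2; rng=MersenneTwister(1)))
```

Larger k makes selection greedier: with k close to the population size, the two best candidates are chosen almost every time.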

Bleeding Junctions Explanation

One problem we faced was that spacers can still hybridize at the junctions with neighboring parts (left/right flanks or aptamers). NOODL’s standard checks, which scan the spacer on its own, could miss k-mers that span these boundaries. To fix this, we built a module that generates boundary (“bleed”) k-mers straddling the left–spacer and spacer–right junctions, counts their reverse-complement matches in the targets, and penalizes offending spacers.

We compare spacer kmers against merged flank/target kmers, count reverse complement hits, and then pass/fail or penalize any spacer that shows cross-binding risk so the genetic algorithm steers away from those designs.

Validation with Geneious[1]

What we need to check/validate:

  • Spacer remains largely unstructured (doesn’t form stable hairpins).
  • No long stems across the spacer–aptamer junctions.
  • P6 and P4G03 aptamer cores are preserved and remain separate.
  • Probability peaks on aptamer stems; spacer/junctions stay low-probability.

To confirm that NOODL-generated spacers do not form unintended secondary structures or hybridize with the aptamers, we validated some sequences using Geneious’ DNA Folding tool. The exported spacers were concatenated with the aptamers, folded, and the resulting plots were visually inspected for problematic structures.

The folding prediction allows us to assess whether the spacer maintains proper structural separation between the two aptamers. Ideally, minimal hybridization should occur between the aptamers, confirming that the spacer does not introduce unintended base-pairing or interfere with the aptamers’ secondary structures. This modeling step provides a visual verification that the NOODL-generated spacer achieves its design goal of preventing cross-structure formation while maintaining stable, independent folding of both aptamer domains.

[Figure panels: P6; P4G03]

We ran NOODL with the following parameters:

julia noodl.jl --spacer_len 40 --population 300 --mutation_rate 0.2 \
  --left_flank p6aptamer.fa --right_flank p4aptamer.fa \
  --targets p6aptamer.fa p4aptamer.fa --kmers 3,5,7 --verbose true

This run asked NOODL to return the single best 40-nt spacer while accounting for both the P6 and P4G03 aptamers (--targets) and the provided flanks (--left_flank/--right_flank). With a population of 300 and a mutation rate of 0.2, the GA searched for a spacer that minimizes self-folding and avoids hybridization to the aptamers and junctions. We ran this NOODL command several times to obtain a few spacers to plot in Geneious.

Then, in Geneious, we did the following:

  1. Concatenate P6 + spacer and fold. Use Geneious DNA Folding on the P6+spacer construct. Confirm the spacer region is mostly unpaired and does not form stable hairpins. Scan the P6–spacer junction for long, well-paired stems that could create cross-hybridization.
  2. Open the probability plot. Review base-pairing probability. Expect high confidence along P6 helices and low probability across the spacer and junctions. Look for localized hotspots that might indicate emerging off-target pairing.
  3. Concatenate P6 + spacer + P4G03. Fold the full construct and verify that both aptamer cores retain their structures. Confirm the spacer maintains separation between P6 and P4G03, with no chimeric stems interacting with both aptamers.
  4. Compare temperatures and k-mers. Toggle 20/37/55 °C and k=3/5/7. Expect structures to relax with increasing temperature, while larger k-mer filters suppress incidental stems. Record whether spacer structure and aptamer preservation are consistent across conditions; highlight any temperature- or k-specific vulnerabilities.

Below are three examples of NOODL-generated spacers folded at different temperatures (20 °C, 37 °C, and 55 °C). Each card shows the spacer’s folding prediction (top) and the probability plot (bottom) indicating the likelihood of each base being paired. The k-mer buttons toggle the k-mer size used in NOODL’s scoring (3, 5, or 7). These examples illustrate how NOODL designs spacers that minimize secondary structure formation and avoid hybridization with the aptamers across a range of conditions.

20 °C

[Figure: strand plot]
[Figure: probability plot]

37 °C

[Figure: strand plot]
[Figure: probability plot]

55 °C

[Figure: strand plot]
[Figure: probability plot]

Color by strand: red = P6 concatenated with spacer; green = P4G03.
Color by probability: red = high confidence of base pair formation; blue = low confidence of base pair formation.

  1. Geneious Prime 2025.0. https://www.geneious.com
  2. “Genetic Algorithms Tutorial.” Tutorialspoint. https://www.tutorialspoint.com/genetic_algorithms/index.htm