In synthetic biology, modular and structural design of DNA requires careful placement of non-functional spacer sequences to separate active regions. Poorly designed linkers can fold or hybridize with nearby sequences and compromise function—an issue for our design, where aptamer behavior is sensitive to unwanted domain interactions.
NOODL (Novel Optimization Of DNA Linkers) is a Julia-based genetic algorithm that generates single-stranded DNA (ssDNA) spacers that minimize self-folding and cross-hybridization with adjacent regions. Spacer sequences are non-functional DNA used to separate coding blocks; if a spacer folds or binds to flanks, it can alter downstream behavior.
We built NOODL to produce stable, non-interfering linkers by enforcing user parameters such as spacer length (e.g., 20 nt), target base composition (e.g., 40–60% AT), count of spacers to generate, and flanking/target sequences to avoid.
What is a genetic algorithm? A genetic algorithm emulates a “survival of the fittest” system. It starts from a population of candidates, each assigned a score that reflects how well it meets a fitness criterion. Parent candidates are selected from the population and undergo recombination and mutation to generate a pool of offspring. This cycle is repeated over several generations until a desired fitness score is reached.
In NOODL, the candidates are DNA spacer sequences. They have a “successful” fitness score if they avoid reverse-complement matches and if they match a target AT%, meaning that they are less likely to fold or hybridize. Through the selection of parent sequences, recombination, and minor mutations of sequences, the algorithm evolves sequences that are increasingly unlikely to misfold or bind to flanking regions.
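The select, recombine, mutate cycle described above can be sketched in a few lines of Python. This is only an illustration, not NOODL's Julia code: the toy score uses AT content alone, and the function names, the truncation selection, and the default values are assumptions made for the example.

```python
import random

BASES = "ACGT"

def at_fraction(seq):
    """Fraction of A/T bases in seq (0..1)."""
    return sum(b in "AT" for b in seq) / len(seq)

def toy_score(seq, target_at):
    """Toy lower-is-better score: distance from the target AT fraction only."""
    return abs(at_fraction(seq) - target_at)

def evolve(length=20, target_at=0.55, population=50, generations=40, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(BASES) for _ in range(length))
           for _ in range(population)]
    for _ in range(generations):
        # Selection (assumed here: keep the better-scoring half as parents).
        pop.sort(key=lambda s: toy_score(s, target_at))
        parents = pop[: population // 2]
        children = []
        while len(parents) + len(children) < population:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)       # recombination: one cut point
            child = list(p1[:cut] + p2[cut:])
            if rng.random() < 0.2:                 # mutation: redraw one base
                child[rng.randrange(length)] = rng.choice(BASES)
            children.append("".join(child))
        pop = parents + children
    return min(pop, key=lambda s: toy_score(s, target_at))

best = evolve()
```

Because the parents are carried over unchanged, the best score in this sketch never gets worse from one generation to the next.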
Why use a genetic algorithm? When the number of possible solutions is too vast to explore exhaustively and the rules are too soft to encode directly, evolutionary algorithms offer a way forward. They don't guarantee a perfect solution, but given time, variation, and selection they tend to improve on the starting population. NOODL doesn't just generate sequences - it combines biological principles with algorithmic logic. In a sense it mimics how life solves problems: not by knowing the answer in advance but by trial and error.
We chose a genetic algorithm not just for its computational efficiency, but because it reflects design by iteration, constraints, and diversity. Nature doesn't create perfect solutions at once - it searches, mutates, recombines, and selects. NOODL follows the same concept: it uses an evolutionary process to discover linkers that meet multiple constraints, evolving its way toward better solutions. When writing a genetic algorithm we are creating the universe in microcosm. We establish a set of fundamental rules for the nature of inheritance and survival. Within this digital version of our world, we generate something that manifests in a successful lineage, honed by several generations of selection.
NOODL is a command-line interface (CLI) tool for spacer design and a small set of importable Julia modules that implement scoring, selection, and sequence variation. The NOODL project runs with Julia. Make sure you have Julia 1.11.6+ installed, and download all NOODL files from our gitlab repository. While in the same directory as the downloaded NOODL files, run julia noodl.jl in Terminal.
Requirements: Julia 1.11.6+.
Get the code:
git clone https://gitlab.igem.org/2025/software-tools/ucsc.git
cd noodl
Run (CLI):
julia noodl.jl --help
Folder layout (key files):
.
├─ noodl.jl               # CLI entry
├─ RCScore.jl             # scoring (RC & AT%)
├─ Flanks.jl              # flank RC checks
├─ BleedingFlanks.jl      # junction/target bleed checks
├─ crossover/             # crossover strategies
│  ├─ SinglePointCrossover.jl
│  ├─ UniformCrossover.jl
│  └─ MultiPointCrossover.jl
└─ bias/                  # selection/bias strategies
   ├─ StochasticUniversalSampling.jl
   ├─ RouletteWheelSelection.jl
   └─ TournamentBias.jl
Find the best 20-nt spacer at 55% AT while avoiding provided flanks; evolve 1000 candidates for 250 generations:
julia noodl.jl --spacer_len 20 --target_at 0.55 --population 1000 --generations 250 \
  --left_flank aptamer1.fa --right_flank aptamer2.fa --verbose true
All options:
noodl.jl [-l SPACER_LEN] [-a TARGET_AT] [-p POPULATION] [-g GENERATIONS]
         [-m MUTATION_RATE] [-u MUTATIONS_PER_OFFSPRING] [-x CROSSOVER] [-b BIAS]
         [--left_flank LEFT_FLANK] [--right_flank RIGHT_FLANK] [-t [TARGETS...]]
         [--kmers KMERS] [-s SEED] [-v VERBOSE] [-h]

-l, --spacer_len INT                          length of each spacer (default: 50)
-a, --target_at FLOAT                         target AT content (0–1) (default: 0.55)
-p, --population INT                          population size (default: 200)
-g, --generations INT                         number of generations (default: 300)
-m, --mutation_rate FLOAT                     per-offspring mutation chance (default: 0.2)
-u, --mutations_per_offspring INT             number of base mutations when mutating (default: 2)
-x, --crossover {single|uniform|multi}        (default: multi)
-b, --bias {stochastic|tournament|roulette}   (default: tournament)
--left_flank FILE                             optional FASTA file for left flank
--right_flank FILE                            optional FASTA file for right flank
-t, --targets [FILES...]                      FASTA files treated as independent targets (added to flanks)
--kmers "6,7,8"                               comma-separated k values for internal + bleed scoring (default: "7,8")
-s, --seed INT                                random seed (optional)
-v, --verbose {true|false}                    print progress logs (default: false)
-h, --help                                    show help and exit
See the full NOODL Pipeline (Command Line Interface) section below for more information.
The NOODL team is open to receiving feedback, suggestions, and contributions from the community.
Contributors
NOODL was developed by the NOODL team, dry lab members of UCSC iGEM 2025 (safeTEA):
Special thanks:
David Bernick, our PI. An amazing mentor and collaborator who provided guidance on GA design and validation workflows.
NOODL provides the following options:
noodl.jl [-l SPACER_LEN] [-a TARGET_AT] [-p POPULATION] [-g GENERATIONS]
         [-m MUTATION_RATE] [-u MUTATIONS_PER_OFFSPRING] [-x CROSSOVER] [-b BIAS]
         [--left_flank LEFT_FLANK] [--right_flank RIGHT_FLANK] [-t [TARGETS...]]
         [--kmers KMERS] [-s SEED] [-v VERBOSE] [-h]

optional arguments:
  -l, --spacer_len SPACER_LEN
        length of each spacer (type: Int64, default: 50)
  -a, --target_at TARGET_AT
        target AT content (0–1) (type: Float64, default: 0.55)
  -p, --population POPULATION
        population size (type: Int64, default: 200)
  -g, --generations GENERATIONS
        number of generations (type: Int64, default: 300)
  -m, --mutation_rate MUTATION_RATE
        per-offspring mutation chance (0–1) (type: Float64, default: 0.2)
  -u, --mutations_per_offspring MUTATIONS_PER_OFFSPRING
        number of base mutations when mutating (type: Int64, default: 2)
  -x, --crossover CROSSOVER
        crossover method: single | uniform | multi (default: "multi")
  -b, --bias BIAS
        bias method: stochastic | tournament | roulette (default: "tournament")
  --left_flank LEFT_FLANK
        optional FASTA file for left flank (default: "")
  --right_flank RIGHT_FLANK
        optional FASTA file for right flank (default: "")
  -t, --targets [TARGETS...]
        FASTA files treated as independent targets (added to any flanks)
  --kmers KMERS
        comma-separated k values used for internal + bleed scoring (e.g. 6,7,8) (default: "7,8")
  -s, --seed SEED
        random seed (optional) (type: Int64)
  -v, --verbose VERBOSE
        print progress logs (type: Bool, default: false)
  -h, --help
        show this help message and exit
Usage Example
Find the best 20-nt spacer at 55% AT while avoiding provided flanks; evolve 1000 candidates for 250 generations:
julia noodl.jl --spacer_len 20 --target_at 0.55 --population 1000 --generations 250 \
  --left_flank aptamer1.fa --right_flank aptamer2.fa --verbose true
noodl.jl
noodl.jl runs NOODL, a genetic-algorithm tool that evolves a DNA spacer and picks the single best one that’s least likely to fold on itself or stick to your flanking/target sequences. It provides both a command-line app and library functions: you give it spacer length, target A/T, and optional flanks/targets, and it returns the optimized spacer with a short score report.
mutate_sequence
Randomly flips a few letters (A/T/C/G) in the DNA sequence you give it. It edits the original sequence directly.
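As an illustration of in-place mutation, here is a small Python sketch (not the Julia source). The parameter names and the choice to always substitute a different base are assumptions; the original may allow a position to be redrawn as the same letter.

```python
import random

def mutate_sequence(seq, n_mutations=2, rng=random):
    """Change n_mutations randomly chosen positions of the list seq in place."""
    for pos in rng.sample(range(len(seq)), n_mutations):
        # Assumption: pick a different base so every chosen position really changes.
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq

spacer = list("AAAAAAAAAA")
mutate_sequence(spacer, 2, random.Random(1))  # spacer itself is modified
```

Returning the same list makes it easy to chain calls, but callers that need the parent unchanged should pass a copy.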
random_spacer
Makes a brand-new random DNA spacer of the length you want, with roughly the A+T percentage you asked for.
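One way to hit a target A+T percentage is to fix the number of A/T positions and shuffle, as in the Python sketch below. This is an assumption about the approach, not the Julia implementation, which may instead draw each base with a weighted probability (hence "roughly").

```python
import random

def random_spacer(length, target_at, rng=random):
    """Random list of bases with round(length * target_at) A/T positions."""
    n_at = round(length * target_at)
    bases = [rng.choice("AT") for _ in range(n_at)]
    bases += [rng.choice("CG") for _ in range(length - n_at)]
    rng.shuffle(bases)  # spread the A/T and C/G positions randomly
    return bases

spacer = random_spacer(20, 0.55, random.Random(0))
```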
read_fasta_as_vec
Opens a FASTA file and grabs the first DNA sequence in it, returning the letters as a list like ["A","T","G",…].
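A minimal Python sketch of reading the first FASTA record into a list of letters is shown below. It is not the Julia code; the upper-casing and the handling of blank lines are assumptions.

```python
def read_fasta_as_vec(path):
    """Return the first record's sequence in a FASTA file as a list of letters."""
    bases = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:   # a second record begins: stop reading
                    break
                seen_header = True
            elif line:
                bases.extend(line.upper())  # assumption: normalise to upper case
    return bases
```

Sequence lines that are wrapped across several lines are concatenated, so the result is the same whether the file stores the record on one line or many.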
_parse_int_list
Turns a string like "7,8" into the numbers [7, 8]. If it’s empty, you get an empty list.
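The same behavior can be written as a one-line Python helper (a sketch; the name without the leading underscore and the tolerance for stray spaces are assumptions):

```python
def parse_int_list(text):
    """Split a comma-separated string into integers; empty input gives []."""
    return [int(part) for part in text.split(",") if part.strip()]

k_values = parse_int_list("7,8")
```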
_flanks_vec
Packs the optional left and right flanking sequences into one list, skipping any that weren’t provided.
run_noodl
Runs the whole “evolution” process to search for a good spacer and returns the single best one it finds. If you gave flanking/target sequences, it also avoids spacers that could stick to those.
parse_args
Reads the command-line options a user typed (like length, A/T, files, etc.) and turns them into values the program can use.
main
The full command-line workflow: read options and input files, build the config, run NOODL, and print the final best spacer and a short score report.
RCScore.jl
RCScore.jl is the scoring component NOODL uses to judge DNA spacers. It checks how likely a sequence is to stick to itself by counting reverse-complement kmer (groups of bases) matches and how close its A/T% is to a target. It returns a single lower-is-better score and includes simple helpers (reverse complement, AT% calculator).
reverse_complement(seq)
Returns the DNA reverse complement of seq (A↔T, C↔G, reversed order). Handy for spotting potential self-pairing.
count_reverse_complement_kmers(sequence, k)
Counts how many k-length words in the sequence have their reverse complement also present somewhere in the sequence (palindromic kmers aren’t double-counted). More matches = more chances to form stems/hairpins.
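The two helpers above can be sketched in Python as follows. This is not the Julia source, and the counting rule is one possible reading of the description: each unordered pair of a k-mer and its reverse complement counts once, and a palindromic k-mer counts once.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Complement every base and reverse the order."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def count_reverse_complement_kmers(seq, k):
    """Count distinct k-mers (as unordered pairs) whose reverse complement is also in seq."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # A frozenset collapses (kmer, rc) and (rc, kmer) into one entry, and a
    # palindromic k-mer (equal to its own reverse complement) into a single item.
    pairs = {frozenset((km, reverse_complement(km)))
             for km in kmers if reverse_complement(km) in kmers}
    return len(pairs)

hits = count_reverse_complement_kmers("AACGTT", 3)
```

In "AACGTT" with k = 3, AAC pairs with GTT and ACG pairs with CGT, so two potential stem-forming pairs are found.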
at_content(sequence)
Computes the fraction of A and T bases in the sequence (a number between 0 and 1).
calculate_score(sequence, target_at_content; kmer_lengths=[7,8], verbose=false)
Gives the sequence a single score = (number of reverse-complement k-mer matches across the given k values) + 10×(difference between actual AT% and the target). Lower is better (fewer RC matches and AT% closer to target); verbose=true prints a short breakdown.
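A hedged Python sketch of this scoring formula follows. It assumes the "difference" is the absolute difference between AT fractions on a 0 to 1 scale and reuses the pair-counting interpretation of reverse-complement matches; neither detail is confirmed by the text above, and the Julia code may differ.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def count_rc_kmers(seq, k):
    """Distinct k-mer / reverse-complement pairs present in seq (assumed counting rule)."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len({frozenset((km, reverse_complement(km)))
                for km in kmers if reverse_complement(km) in kmers})

def at_content(seq):
    """Fraction of A and T bases, between 0 and 1."""
    return sum(b in "AT" for b in seq) / len(seq)

def calculate_score(seq, target_at, kmer_lengths=(7, 8)):
    """Lower is better: RC matches over all k plus 10 x |AT fraction - target|."""
    rc_hits = sum(count_rc_kmers(seq, k) for k in kmer_lengths)
    return rc_hits + 10 * abs(at_content(seq) - target_at)

score = calculate_score("ACACACAC", 0.5)
```

With this weighting, one reverse-complement match costs as much as being 10 percentage points away from the target AT content.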
Flanks.jl
Flanks.jl checks whether a candidate spacer could stick to its neighboring sequences (flanks) by looking for reverse-complement word matches, and provides a quick way to penalize such clashes in a score.
_kmer_multiset(seq, k)
Breaks a sequence into all overlapping k-length “words” (kmers) and counts how many times each one appears. Used internally to compare spacer words to flank words.
cross_rc_hits(spacer, flanks; kmer_lengths=[7,8]) → (total_hits, detail)
Counts how many kmers in the spacer have their reverse complement present in any flank (i.e., potential binding seeds). Returns the total number of hits and a per-k list of matching (kmer, rc) pairs.
passes_flank_check(spacer, flanks; kmer_lengths=[7,8], max_hits_allowed=0)
Quick yes/no gate: returns true if the spacer has at most max_hits_allowed reverse-complement matches into the flanks (default 0), otherwise false.
flank_penalized_score(spacer, target_at, flanks; kmer_lengths=[7,8], flank_penalty=1e6, verbose=false)
Convenience scorer: computes the spacer’s base score (internal RC + AT penalty from RCScore) and then adds a big penalty if any spacer ⇔ flank reverse-complement hits are found. Returns (total score, base score, number of cross hits).
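The flank checks described above could look like the following Python sketch. It is illustrative only: sequences are plain strings here, the base score is passed in rather than computed, the helper name penalize_for_flanks is hypothetical, and the flat (not per-hit) penalty is an assumption.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def kmers(seq, k):
    """All overlapping k-length words of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def cross_rc_hits(spacer, flanks, kmer_lengths=(7, 8)):
    """Return (total_hits, detail); detail maps k to a list of (kmer, rc) pairs."""
    total, detail = 0, {}
    for k in kmer_lengths:
        flank_kmers = {km for flank in flanks for km in kmers(flank, k)}
        hits = [(km, reverse_complement(km)) for km in kmers(spacer, k)
                if reverse_complement(km) in flank_kmers]
        detail[k] = hits
        total += len(hits)
    return total, detail

def passes_flank_check(spacer, flanks, kmer_lengths=(7, 8), max_hits_allowed=0):
    return cross_rc_hits(spacer, flanks, kmer_lengths)[0] <= max_hits_allowed

def penalize_for_flanks(base_score, spacer, flanks, kmer_lengths=(7, 8),
                        flank_penalty=1e6):
    """Add a flat penalty when any spacer/flank hit exists (assumed behavior)."""
    hits, _ = cross_rc_hits(spacer, flanks, kmer_lengths)
    total = base_score + (flank_penalty if hits > 0 else 0.0)
    return total, base_score, hits

total_hits, detail = cross_rc_hits("AAACCC", ["GGGTTT"], kmer_lengths=(3,))
```

A penalty as large as 1e6 effectively turns the flank check into a hard constraint, since no realistic base score can compensate for it.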
BleedingFlanks.jl
BleedingFlanks.jl checks whether a spacer might stick to other sequences (targets like aptamers/flanks) not only by direct matches, but also across the junctions where the spacer meets its left/right neighbors (“bleed”). It can count these risky matches, gate them with simple pass/fail rules, fold them into a score, and verify fixed start/end bases.
_kmer_multiset(seq, k)
Breaks a sequence into all overlapping kmers and counts how often each shows up.
_bleed_kmers(spacer, k; prefix_km1=nothing, suffix_km1=nothing)
Builds all kmers that straddle the junctions (left: end of left + start of spacer; right: end of spacer + start of right).
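To make the junction ("bleed") k-mers concrete, here is a Python sketch under the assumption that prefix_km1 holds the last k-1 bases of the left neighbor and suffix_km1 the first k-1 bases of the right neighbor. The function name and string-based sequences are illustrative, not the Julia code.

```python
def bleed_kmers(spacer, k, prefix_km1=None, suffix_km1=None):
    """k-mers that contain at least one flank base and at least one spacer base."""
    if k < 2:
        return []  # a 1-mer cannot straddle a junction
    out = []
    if prefix_km1:
        # Last k-1 flank bases + first k-1 spacer bases: every window straddles.
        left = prefix_km1[-(k - 1):] + spacer[:k - 1]
        out += [left[i:i + k] for i in range(len(left) - k + 1)]
    if suffix_km1:
        right = spacer[-(k - 1):] + suffix_km1[:k - 1]
        out += [right[i:i + k] for i in range(len(right) - k + 1)]
    return out

junction_words = bleed_kmers("CCCC", 3, prefix_km1="AA", suffix_km1="GG")
```

For the spacer CCCC between flanks ending in AA and starting with GG, the junction words are AAC, ACC on the left and CCG, CGG on the right; none of them would be seen by a check that looks at the spacer alone.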
precompute_target_kmers(targets, kmer_lengths)
Pre-counts k-mers per target for speed.
independent_hits_with_bleed(spacer, targets; kmer_lengths=[7,8], prefix_km1=nothing, suffix_km1=nothing, precomputed_targets=nothing)
Counts spacer and bleed-junction k-mers whose RCs appear in each target; returns totals & per-target breakdown.
passes_target_set_check_with_bleed(spacer, targets; mode=:union, …, union_max_allowed=0, per_target_max_allowed=0)
True if hits are within your limits by total, per-target, or both.
independent_penalized_score_with_bleed(spacer, target_at, targets; …, mode=:union, hard_penalty=1e6, use_soft=false, …)
Combines internal score (RC/AT) with bleed penalties (hard or soft).
check_fixed_ends(spacer; fixed_prefix=nothing, fixed_suffix=nothing)
Verifies exact required start/end bases; reports mismatches.
Crossover
Files
SinglePointCrossover.jl
Implements single-point crossover: one random cut, swap tails to create two children.
single_point_crossover(parent1::String, parent2::String) → (child1::String, child2::String)
Random cut 1..L-1; split and swap right halves.
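A Python sketch of single-point crossover as described (the optional rng argument is an addition for reproducibility, and equal-length parents are assumed):

```python
import random

def single_point_crossover(parent1, parent2, rng=random):
    """Cut both parents at one random point in 1..L-1 and swap the right halves."""
    cut = rng.randint(1, len(parent1) - 1)  # randint is inclusive on both ends
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

child1, child2 = single_point_crossover("AAAAAA", "TTTTTT", random.Random(3))
```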
UniformCrossover.jl
Uniform per-position coin flip (no fixed cut site).
uniform_crossover(parent1::String, parent2::String) → (child1::String, child2::String)
Per position, heads: child1←p1(…); tails: swap; repeat across length.
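Uniform crossover can be sketched in Python as below. The 50/50 coin and the rng argument are assumptions for illustration; equal-length parents are assumed.

```python
import random

def uniform_crossover(parent1, parent2, rng=random):
    """Flip a coin per position; on tails the two parents' bases are swapped."""
    child1, child2 = [], []
    for a, b in zip(parent1, parent2):
        if rng.random() < 0.5:  # tails: swap which child receives which base
            a, b = b, a
        child1.append(a)
        child2.append(b)
    return "".join(child1), "".join(child2)

child1, child2 = uniform_crossover("AAAAAAAA", "TTTTTTTT", random.Random(5))
```

At every position the two children together still carry exactly the two parental bases, so no base is created or lost by recombination.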
MultiPointCrossover.jl
Multiple cuts, alternate segments to produce two children.
custom_multi_crossover(parent1::String, parent2::String)
Pick k cuts; alternate segment assignment between parents to form children.
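A Python sketch of multi-point crossover follows. The number of cuts used by the original is not stated, so n_cuts is a hypothetical parameter, as is the function name.

```python
import random

def multi_point_crossover(parent1, parent2, n_cuts=2, rng=random):
    """Choose n_cuts distinct cut points and alternate segments between parents."""
    cuts = sorted(rng.sample(range(1, len(parent1)), n_cuts))
    child1, child2 = [], []
    start, swap = 0, False
    for end in cuts + [len(parent1)]:
        a, b = parent1[start:end], parent2[start:end]
        if swap:  # every other segment comes from the opposite parent
            a, b = b, a
        child1.append(a)
        child2.append(b)
        start, swap = end, not swap
    return "".join(child1), "".join(child2)

child1, child2 = multi_point_crossover("A" * 10, "T" * 10, 3, random.Random(2))
```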
Bias Selection
Files
StochasticUniversalSampling.jl
SUS parent selection with evenly spaced pointers (more even than roulette).
stochastic_selection(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)
Score with RCScore.calculate_score, convert to fitness (max_score - score), select k via SUS; guard for all-zero fitness.
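Stochastic universal sampling can be sketched in Python as follows. Scores are passed in directly instead of being computed from RCScore, and the fallback to uniform random choice when all fitness values are zero is an assumption about what the guard does.

```python
import random

def sus_select(population, scores, k, rng=random):
    """Pick k individuals with k evenly spaced pointers over cumulative fitness."""
    worst = max(scores)
    fitness = [worst - s for s in scores]  # lower score -> higher fitness
    total = sum(fitness)
    if total == 0:
        # Guard (assumed behavior): all candidates equal, so choose uniformly.
        return [rng.choice(population) for _ in range(k)]
    step = total / k
    start = rng.random() * step            # a single random offset for all pointers
    chosen, i, cumulative = [], 0, fitness[0]
    for j in range(k):
        pointer = start + j * step
        while cumulative < pointer and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        chosen.append(population[i])
    return chosen

picked = sus_select(["a", "b", "c", "d"], [0.0, 0.0, 10.0, 10.0], 2, random.Random(0))
```

Because all pointers share one random offset, an individual holding a fraction f of the total fitness is picked close to f × k times, which is what makes SUS more even than repeated roulette spins.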
RouletteWheelSelection.jl
Roulette selection without replacement.
roulette_wheel_selection(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)
Weighted spins pick k; remove each pick; guard for all-zero fitness.
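A Python sketch of roulette selection without replacement is given below. It takes precomputed scores, converts them to fitness as max_score - score (borrowed from the SUS description, so an assumption here), and falls back to a uniform pick when the remaining fitness is all zero.

```python
import random

def roulette_select(population, scores, k, rng=random):
    """Spin a fitness-weighted wheel k times, removing each winner from the pool."""
    pool = list(population)
    worst = max(scores)
    fitness = [worst - s for s in scores]
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(fitness)
        if total == 0:
            idx = rng.randrange(len(pool))  # guard: nothing to weight by
        else:
            spin = rng.random() * total
            cumulative, idx = 0.0, len(pool) - 1
            for i, f in enumerate(fitness):
                cumulative += f
                if spin < cumulative:       # the spin landed in slice i
                    idx = i
                    break
        chosen.append(pool.pop(idx))
        fitness.pop(idx)
    return chosen

picked = roulette_select(["a", "b", "c"], [0.0, 10.0, 10.0], 2, random.Random(0))
```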
TournamentBias.jl
Tournament selection: sample k, take best (lowest score), repeat for two distinct parents.
tournament_bias(population::Vector{Vector{String}}, k::Int, target_at_content::Float64)
Run small tournaments scored by RCScore.calculate_score; ensure different parents.
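Tournament selection of two distinct parents can be sketched in Python as below. Scores are passed in directly, and excluding the first winner from the second tournament is an assumption about how distinctness is enforced; the original may instead repeat tournaments until the parents differ.

```python
import random

def tournament_parents(population, scores, k, rng=random):
    """Run two size-k tournaments (lowest score wins) and return distinct parents."""
    indices = list(range(len(population)))
    first = min(rng.sample(indices, k), key=lambda i: scores[i])
    indices.remove(first)  # the first winner cannot enter the second tournament
    second = min(rng.sample(indices, min(k, len(indices))),
                 key=lambda i: scores[i])
    return population[first], population[second]

parents = tournament_parents(["a", "b", "c", "d"], [3.0, 1.0, 2.0, 0.0], 4)
```

A larger k raises selection pressure: with k equal to the population size, the two best candidates are always chosen.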
A problem we faced was that spacers can still hybridize at the junctions with neighboring parts (left/right flanks or aptamers). NOODL’s standard checks could miss k-mers that span these boundaries. To fix this, we built a module that checks boundary (“bleed”) k-mers straddling the left–spacer and spacer–right junctions, counts their reverse-complement matches in the targets, and penalizes offending spacers.
We compare spacer kmers against merged flank/target kmers, count reverse complement hits, and then pass/fail or penalize any spacer that shows cross-binding risk so the genetic algorithm steers away from those designs.
To confirm that NOODL-generated spacers do not form unintended secondary structures or hybridize with the aptamers, we validated some sequences using Geneious’ DNA Folding tool. The exported spacers were concatenated with the aptamers, folded, and the resulting plots were visually inspected for problematic structures.
The folding prediction allows us to assess whether the spacer maintains proper structural separation between the two aptamers. Ideally, minimal hybridization should occur between the aptamers, confirming that the spacer does not introduce unintended base-pairing or interfere with the aptamers’ secondary structures. This modeling step provides a visual verification that the NOODL-generated spacer achieves its design goal of preventing cross-structure formation while maintaining stable, independent folding of both aptamer domains.
julia noodl.jl --spacer_len 40 --population 300 --mutation_rate 0.2 \
  --left_flank p6aptamer.fa --right_flank p4aptamer.fa \
  --targets p6aptamer.fa p4aptamer.fa --kmers 3,5,7 --verbose true
This run asked NOODL to return the single best 40-nt spacer while accounting for both the P6 and P4GG03 aptamers (--targets) and the provided flanks (--left_flank/--right_flank). With a population of 300 and a mutation rate of 0.2, the GA searched for a spacer that minimizes self-folding and avoids hybridization to the aptamers and junctions. We ran this NOODL command a few times to obtain a few spacers to plot on Geneious.
Below are three example spacers generated by NOODL, with folding predicted at different temperatures (20°C, 37°C, and 50°C). Each card shows the spacer’s folding prediction (top) and the probability plot (bottom) indicating the likelihood of each base being paired. The k-mer buttons allow you to toggle the k-mer size used in NOODL’s scoring (3, 5, or 7). These examples show how NOODL designs spacers that minimize secondary structure formation and avoid hybridization with the aptamers across a range of conditions.