Search
The search step determines how TADPOLE explores the RNA sequence space to find sequences that satisfy the desired folding and switching criteria. There are two main strategies: Brute Force (BF) and Genetic Algorithm (GA).
Transparency Note
Current Implementation
As currently implemented, the search runs sequentially, although both search strategies could in principle be parallelized. In all cases the individual evaluations are independent of each other, so the brute force approach could be parallelized except for the final step of selecting the best candidate.
Similarly, in the genetic algorithm, the fitness evaluations within a generation can be carried out independently.
Therefore, while the process may appear slow, in theory a speedup of roughly 5× is achievable without re-implementing the software in another language, assuming ~8 threads on a modern CPU. For the genetic algorithm, the population size could instead be scaled by the same factor, which would be expected to give better results in the same runtime.
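As an illustration of the independence argument above, the following minimal sketch scores a batch of candidate linkers with a thread pool. It is not TADPOLE's code: `evaluate` is a hypothetical stand-in (it returns the GC fraction) for the real fold-and-score call, and `evaluate_all` and the worker count are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(linker: str) -> float:
    """Hypothetical stand-in for one fold-and-score call: the GC fraction."""
    return sum(base in "GC" for base in linker) / len(linker)


def evaluate_all(linkers, workers=8):
    """Score every linker independently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, linkers))


candidates = ["AUGC", "GGCC", "AAUU", "GCAU"]
scores = evaluate_all(candidates)

# The only sequential step: pick the best candidate once all scores are in.
best = max(zip(scores, candidates))[1]
print(best)  # GGCC
```

Whether threads give a real speedup depends on whether the underlying folding call releases Python's global interpreter lock; `ProcessPoolExecutor` offers the same `map` interface if it does not.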
Brute Force (BF)
The brute force method guarantees that all possible solutions are explored, but it has important limitations regarding time and scalability.
Workflow of Brute Force
1. Linker generation: All specified linker lengths are iterated, and all possible nucleotide combinations (A, U, G, C) are generated.
2. Evaluation: The linker is concatenated with RNA1 and RNA3 to form the complete sequence.
3. Prediction: The structure and Minimum Free Energy (MFE) are predicted.
4. Filtering: Designs are discarded if they fail the established criteria.
Workflow of the Brute Force algorithm.
Linker generation
All specified linker lengths are iterated, and all possible nucleotide combinations (A, U, G, C) are generated.
Evaluation of each linker
- The linker is concatenated with RNA1 and RNA3 to form the complete sequence
- The structure and Minimum Free Energy (MFE) are predicted
- Designs are discarded if they fail the following criteria:
OFF-State (without ligand)
• Number of base pairs between RNA1 and RNA3 ≤ Maximum number of FRE-CRE pairings
• Watched positions in FRE must change state to construct the OFF conformation
ON-State (with ligand)
• The FRE substructure must remain functional, allowing only a limited number of changes
• The number of changes must not exceed Maximum changes on the FRE structure
Delta MFE
• The difference between the ON and OFF MFE must exceed the Minimum Energy difference (kcal/mol)
• This is the fundamental energetic criterion for switching
Optional FRE mutations
If "allow mutations in FRE" is checked, mutations in the FRE are tried for linkers that fail the delta MFE criterion, and all criteria are then re-evaluated.
Limitations and When to Use BF
Usage Recommendations
Recommended for short linkers (up to 6–7 nucleotides), where the search space is manageable.
Complexity grows exponentially with linker length and with the number of FRE mutations:
- Length 6 → 4^6 = 4,096 combinations
- Length 6 with two FRE mutations → 4^6 × 3^2 = 36,864 combinations
For longer linkers or systems requiring mutations, BF becomes computationally infeasible.
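The growth of the search space can be checked with a short sketch. The function names `generate_linkers` and `search_space_size` are hypothetical, and the count follows the convention used in the figures above (4 choices per linker position, 3 alternative bases per mutated FRE position).

```python
from itertools import product

NUCLEOTIDES = "AUGC"


def generate_linkers(length):
    """Yield every linker of the given length, one at a time."""
    for combo in product(NUCLEOTIDES, repeat=length):
        yield "".join(combo)


def search_space_size(linker_lengths, n_fre_mutations=0):
    """Count designs: 4^L linkers per length, times 3 options per mutated FRE position."""
    return sum(4 ** length for length in linker_lengths) * 3 ** n_fre_mutations


print(search_space_size([6]))           # 4096
print(search_space_size([6], 2))        # 36864
print(search_space_size(range(5, 10)))  # 349184
```

Using a generator keeps memory use flat, but the number of evaluations still grows by a factor of 4 with every extra linker nucleotide.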
Genetic Algorithm (GA)
The GA is a more efficient method for exploring large, complex search spaces without evaluating every possible combination. It is inspired by biological evolution[7] and optimizes solutions over multiple generations.
Key Concepts
- Population: a set of potential solutions (individuals or chromosomes)
- Genes: features of each solution (e.g., nucleotides in the linker)
- Fitness function: scores how close each individual is to the desired solution
- Evolutionary cycle: evaluation → selection → recombination (crossover) → mutation → new generation
Workflow of GA for Linker Search
Workflow of the Genetic Algorithm.
1. Initialization: A population of random linkers of fixed length is generated.
2. Evaluation: Each linker is evaluated using the same criteria as BF: OFF-state, ON-state, delta MFE, and optional FRE mutations. Linkers failing the criteria receive a very low fitness score.
3. Selection: The best linkers are chosen as parents of the next generation.
4. Crossover: Parts of two parent linkers are combined to produce new child linkers.
5. Mutation: Nucleotides are randomly altered in the linker and in the allowed positions of the FRE, to maintain diversity and avoid local optima.
6. Iteration: Steps 2–5 are repeated until a stopping criterion is met (e.g., target fitness, maximum number of generations, or convergence).
7. Output: Linker sequences, ON/OFF structures, MFE, mutations, fitness scores, and clustering of structures are saved.
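The evolutionary cycle described above can be sketched in a few lines. This is a toy under stated assumptions, not TADPOLE's implementation: `toy_fitness` (a GC count) is a hypothetical placeholder for the structure-based fitness, and truncation selection, single-point crossover and elitism are choices made for the example.

```python
import random

NUCLEOTIDES = "AUGC"


def toy_fitness(linker):
    """Hypothetical placeholder score: number of G/C bases (not TADPOLE's fitness)."""
    return sum(base in "GC" for base in linker)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(linker, rate, rng):
    """Replace each nucleotide by a random one with probability `rate`."""
    return "".join(rng.choice(NUCLEOTIDES) if rng.random() < rate else base
                   for base in linker)


def run_ga(length=8, pop_size=20, generations=30, rate=0.1, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(NUCLEOTIDES) for _ in range(length))
                  for _ in range(pop_size)]
    history = []  # best fitness of each evaluated generation
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)  # evaluation
        history.append(toy_fitness(ranked[0]))
        if history[-1] == length:  # stopping criterion: target fitness reached
            break
        parents = ranked[:pop_size // 2]  # selection (truncation)
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rate, rng)
                    for _ in range(pop_size - 1)]
        population = [ranked[0]] + children  # elitism: best individual survives
    return max(population, key=toy_fitness), history


best, history = run_ga()
print(best, history)
```

The `history` list is the kind of per-generation record that can be plotted to see whether the search is still improving or has converged.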
When to Use GA
Recommended Usage
Recommended for long linkers (>7 nucleotides) or when FRE mutations are needed.
Efficiently explores large search spaces, finding diverse solutions without exhaustive enumeration.
The cost of the GA can be controlled through parameters such as population size and number of cycles, and in principle grows linearly with them. A useful way to visualize its behavior is to plot how the fitness (average or minimum) evolves across cycles; this information is already tracked in the code.
While GA cannot guarantee the absolute best solution in very large search spaces, it is expected to consistently provide a "good enough" solution.
Comparison BF vs GA
| Feature | Brute Force (BF) | Genetic Algorithm (GA) |
|---|---|---|
| Solution coverage | All possible solutions | Subset of promising solutions |
| Execution time | Exponential with length | Efficient in large search spaces |
| Recommended linker length | Short (≤6–7 nt) | Long (>7 nt) |
| FRE mutations | Feasible only in small combinations | Easily handled |
| Solution diversity | High (all solutions) | High if diversity maintained |
| Implementation complexity | Simple | Requires fitness function and GA operators |
Conclusion
- Brute Force (BF): Use for exhaustive searches in small spaces.
- Genetic Algorithm (GA): Use for large problems, long linkers, and/or designs with mutations, where it efficiently explores diverse solutions.
Visualization and Download of Results
Once the software has completed the search and clustering of designs, the results are presented clearly and interactively through the Streamlit interface. In addition to visualization on the platform, TADPOLE provides several options for downloading and exporting data, ensuring reproducibility and compatibility with other tools.
Compressed File (.zip)
For quick and complete access to all search data, TADPOLE generates a compressed file (.zip) that organizes all relevant files. This download package is useful for archiving your results or sharing them with other researchers.
Within the .zip, files are organized in separate folders, one for each cluster of structurally similar designs. This facilitates navigation and analysis of solution families.
For each design, you will find:
- Structure Images: High-quality visualizations of the predicted secondary structures for ON and OFF states.
- .ps Files: PostScript format files that allow high-resolution visualization and are compatible with bioinformatics tools like R-scape.
- Text Files (.txt): Text documents with the RNA sequence, folding energies (MFE and ΔMFE), and structures in dot-bracket notation for each design. This allows you to easily import the data into other scripts or databases.
Detailed HTML Report
The HTML report can be viewed immediately or downloaded as an independent file, allowing users to quickly access a summary of the results. This document provides a comprehensive analysis of the search, designed to help you make an informed decision about which designs to take to the laboratory.
Metrics Analysis and Design Ranking
Rather than including graphs, the report presents the most promising designs in two ordered lists, so that users who do not want to perform an exhaustive analysis can quickly identify the best options. Designs are ordered as follows:
By Energy Difference (ΔMFE)
Designs are listed from closest to farthest according to how close their ΔMFE is to the input value provided by the user. The closest design is considered the best in energetic terms.
By FRE-CRE Pairings
Designs are listed from closest to farthest according to how close their number of FRE-CRE base pairs is to the input value. A good design will have a number of pairings close to the optimal value for the desired state.
Success Criteria
These two criteria are crucial for determining the success of an RNA switch. The best design will be one that combines adequate folding energy with the correct number of pairings.
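The two orderings amount to sorting by distance from a target value. A minimal sketch follows; the design names, field names (`delta_mfe`, `fre_cre_pairs`) and numbers are made up for illustration and do not come from a real TADPOLE run.

```python
# Illustrative, made-up designs; field names are hypothetical.
designs = [
    {"name": "design_A", "delta_mfe": 4.8, "fre_cre_pairs": 7},
    {"name": "design_B", "delta_mfe": 3.1, "fre_cre_pairs": 10},
    {"name": "design_C", "delta_mfe": 6.5, "fre_cre_pairs": 9},
]


def rank_by_closeness(designs, key, target):
    """Return designs sorted so that the one closest to `target` comes first."""
    return sorted(designs, key=lambda design: abs(design[key] - target))


by_energy = rank_by_closeness(designs, "delta_mfe", target=3.0)
by_pairs = rank_by_closeness(designs, "fre_cre_pairs", target=10)

print([d["name"] for d in by_energy])  # ['design_B', 'design_A', 'design_C']
print([d["name"] for d in by_pairs])   # ['design_B', 'design_C', 'design_A']
```

A design that appears near the top of both lists, like `design_B` here, is the kind of candidate the success criteria favor.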
Cluster Analysis
For each cluster of designs, the report shows only the representative. This representative is the design that has the minimum free energy (MFE) closest to the MFE of the structure you provided as input. This helps you understand the common structural solutions found and identify the most robust design families, choosing the best from each group.
Example Report
SBOL3 Format Export
TADPOLE offers the option to export designs in SBOL3 (Synthetic Biology Open Language), a crucial standard for interoperability in synthetic biology. This format allows standardized representation of designs, ensuring they can be shared and used in other tools.
Design Description
The SBOL3 file (.jsonld) contains a complete description of your RNA switch. This includes the complete RNA sequence as well as additional metadata in JSON format.
RNA Component (sbol3:Component)
Each design is exported as an RNA component. This component has a unique identifier and defined role, making it compatible with other systems that handle the SBOL standard.
Key Metadata
Within the JSON description, the following parameters are included for each design:
- structure_unconstrained: The dot-bracket structure of the OFF state.
- structure_constrained: The dot-bracket structure of the ON state.
- mfe_unconstrained: The minimum free energy of the OFF state.
- mfe_constrained: The minimum free energy of the ON state.
- mutations: Information about any mutations performed in the FRE to optimize the design.
Interoperability and Collaboration
The ability to export in this format is essential for collaboration and reproducibility, allowing your designs to integrate seamlessly into the broader synthetic biology ecosystem.
This standardized format ensures that your RNA switch designs can be easily shared, validated, and reused across different research groups and computational platforms.
Example: How to Read the Exported Results in Your Own Project
>>> import sbol3
>>> doc = sbol3.Document()
>>> doc.read('/path/to/all_linkers.jsonld')
>>> # Once the document is loaded, you can access both the sequences and the associated metadata
>>> for obj in doc.objects:
...     print(obj.identity, obj.name)
...     # A Component stores references (URIs) to its Sequence objects, so resolve each one first
...     for seq_ref in getattr(obj, "sequences", []):
...         seq = doc.find(seq_ref)
...         if seq is not None:
...             print("  Sequence:", seq.elements)
This way, you can not only reuse the generated constructs but also inspect their contextual information (names, descriptions, annotations, etc.), making it easier to integrate them into other projects or tools.
Principles for Selecting a Functional RNA Switch
Quick Start Guide
Get started with TADPOLE in 4 simple steps:
1. Add FRE Sequence: Input your functional RNA element sequence
2. Add CRE Sequence: Input your conformational RNA element
3. Add CRE Structure: The theoretical structure of your CRE element
4. Set Target Structure: Define the functional structure in dot-bracket notation
To construct a functional regulatory RNA switch, it is not enough to simply join an FRE and a CRE. A linker of defined length (typically 6–10 nucleotides) is essential to connect the two modules and control whether they influence each other.
The activation mechanism follows a ligand-dependent energy shift:
OFF State (unbound)
Without the ligand, the RNA adopts the most stable configuration, where the structural element is disrupted, preventing readthrough.
ON State (bound)
Ligand binding stabilizes the aptamer, restoring the functional structure of the structural element, thus allowing readthrough.
Main Design Criteria
The main design criteria for selecting a suitable linker are (See more information on the Model Page):
Energy difference
The OFF-state must initially be more stable than the ON-state by roughly half the ligand binding energy. Ligand binding then lowers the energy of the ON-state, making it the most stable configuration.
Limited inter-module pairings
Pairings between the aptamer and structural element in the unbound state must be limited, so the switch can properly transition between OFF and ON.
Accessible binding sites
Nucleotides forming the aptamer's entry and binding sites should be mostly free to allow ligand interaction.
Structural element disruption in OFF-state
Key nucleotides responsible for structural element function should be disrupted in the OFF state to ensure the switch is inactive.
Structural element preservation in ON-state
The same functional nucleotides should form their correct structure upon ligand binding.
Important Notes
All of these criteria are evaluated from the user's inputs. The user therefore needs to know the sequence and structure (even if only predicted) of the FRE, as well as the functional parts of that structure (the key substructures required for function).
If the structure of the FRE you aim to study is not well characterised, follow these steps:
- Use RNAfold to predict the structure of your element.
- To identify the functional parts of the structure, use the evolutionary analysis included in this software, which relies on a multiple sequence alignment (MSA) to help characterise conserved structural features.
Advanced Parameters
Finally, you must choose how TADPOLE explores the design space. Two options are available:
Brute Force
Systematically evaluates all possible combinations.
Best for small search spaces; guarantees that the optimal solution will be found.
Computationally expensive: becomes impractical for long linkers or when mutations are allowed.
Suitable only for simple systems where the search space is limited.
Genetic Algorithm (GA)
Uses evolutionary strategies to iteratively improve designs.
Scales well to larger, more complex problems (e.g., long linkers, designs with mutations).
Does not guarantee the absolute global optimum, but reliably finds high-quality solutions within reasonable time.
In short, Brute Force prioritizes completeness, while the Genetic Algorithm prioritizes efficiency. This flexibility allows TADPOLE to support both quick tests and ambitious, large-scale explorations.
In our experience, Brute Force works well for linkers shorter than 7 nucleotides. For longer linkers or when mutations are enabled, the computational cost becomes too high, making the Genetic Algorithm the more practical choice.
Mutable Positions in the FRE
What it is
This parameter allows you to define which nucleotides in the FRE can mutate. The goal is to explore sequence variants without breaking the essential biological function of the RNA switch.
Mutations are allowed only in positions that do not compromise the functional RNA element. If a nucleotide is paired in a stem, its complementary partner is automatically adjusted to maintain the pairing — preserving the secondary structure and avoiding functional disruption.
How to use
- Define a list of mutable positions, numbered from the first nucleotide of the FRE sequence.
- Only the listed positions can change during the search; all other nucleotides remain fixed.
- If you specify a nucleotide that is part of a pair, its paired base will mutate accordingly to maintain complementarity and preserve the fold.
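The co-mutation rule in the list above can be sketched as follows. This is an illustration under assumptions, not TADPOLE's code: the function names are hypothetical, positions are 1-based as in the list, and only Watson–Crick complements are considered (no G–U wobble pairs).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pair_table(structure):
    """Map each paired index (0-based) to its partner, from dot-bracket notation."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def mutate_with_partner(sequence, structure, position, new_base):
    """Mutate a 1-based position; if it is paired, set its partner to the complement."""
    seq = list(sequence)
    index = position - 1
    seq[index] = new_base
    partner = pair_table(structure).get(index)
    if partner is not None:
        seq[partner] = COMPLEMENT[new_base]
    return "".join(seq)


# Position 1 is paired with position 9, so both change.
print(mutate_with_partner("GGGAAACCC", "(((...)))", 1, "A"))  # AGGAAACCU
# Position 5 sits in the loop, so only that base changes.
print(mutate_with_partner("GGGAAACCC", "(((...)))", 5, "U"))  # GGGAUACCC
```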
📋 Example: SECIS FRE
The functional core lies between nucleotides 10–63. Therefore, only nucleotides outside this region (1–9 and those after 63) should be mutable.
Paired bases in stems are automatically co-mutated to preserve Watson–Crick pairing.
Functional Region: 10–63
Mutable Positions: 1–9
This setup ensures that the critical region remains intact, while the flanking nucleotides can introduce diversity without compromising function. The illustrative setup highlights:
- Watched (FRE): select a few apical-loop residues and both sides of key stem pairs within 10–63.
- Mutable (FRE): 1–9 (and >63 if present).
- Result: the core keeps its functional architecture; flanks supply sequence diversity.
Practical tips
- Start with a small set of mutable positions to limit the search space and reduce the risk of generating non-functional variants.
- Remember that mutability expands the search space exponentially; Brute Force is impractical when mutations are included, so use the Genetic Algorithm.
- Best practice: avoid putting watched indices themselves into the mutable list; watch them to preserve structure and functionality, mutate around them to explore diversity.
Linker Length
What it is
The linker is the sequence of nucleotides placed between the FRE and the CRE. Its length directly influences how the two elements can interact.
- Short linker: Restricts flexibility, often forcing the CRE and FRE into rigid orientations.
- Long linker: Introduces more conformational freedom, which expands the number of possible folds but also enlarges the search space.
- Overly long linker: Can cause the FRE to interact only with the linker itself, reducing effective FRE–CRE interactions.
TADPOLE treats linker length as a design parameter. In the Brute Force search, you can define a range to systematically explore. In the Genetic Algorithm (GA) search, linker length is optimized dynamically along with sequence variation.
How to use
Provide either:
- A single value (e.g., linker length = 7), or
- A range (e.g., 5–12), if using Brute Force.
When a range is given, Brute Force will systematically test each linker length in that interval.
The default range in TADPOLE is chosen based on literature as the most biologically plausible window.
📋 Example: SECIS + aptamer system
Tested Linker Lengths: 5–9 nucleotides
Brute Force Results: No valid linker found for <7 nt
Genetic Algorithm (with mutations): Functional linkers found in the 5–9 nt range
Practical tips
- Be aware that search complexity grows rapidly with longer linkers, especially if mutations are enabled.
- For long linkers, Brute Force is not recommended — the GA is more efficient and practical.
- If the FRE is highly structured, avoid overly long linkers, as they may divert interactions away from the CRE.
Maximum Changes on the FRE Structure
What it is
This parameter defines how much your functional RNA (FRE) structure can vary during the design search. In other words, it limits TADPOLE's tolerance for changes relative to the target FRE structure. This is important to preserve biological function while exploring new sequences or conformations.
How it works
- The parameter is usually expressed as the maximum number of nucleotides whose state (paired vs unpaired) can differ from the target.
- If a design exceeds this number, it is automatically discarded.
- This allows exploration of variants with some flexibility without compromising functionality.
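A minimal sketch of this check, assuming the predicted and target FRE structures are dot-bracket strings of equal length and that a "change" is a position whose paired/unpaired state differs. The function names are hypothetical.

```python
def count_state_changes(predicted, target):
    """Count positions whose paired/unpaired state differs between two structures."""
    if len(predicted) != len(target):
        raise ValueError("structures must have the same length")
    return sum((p == ".") != (t == ".") for p, t in zip(predicted, target))


def passes_fre_tolerance(predicted, target, max_changes):
    """True if the design stays within the allowed number of state changes."""
    return count_state_changes(predicted, target) <= max_changes


target = "((((...))))"
predicted = ".(((...)))."  # the outermost base pair has opened

print(count_state_changes(predicted, target))      # 2
print(passes_fre_tolerance(predicted, target, 2))  # True
print(passes_fre_tolerance(predicted, target, 1))  # False
```

Note that this comparison looks only at whether each nucleotide is paired, not at which partner it pairs with, which matches the "paired vs unpaired" wording above.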
📋 Example: SECIS Tolerances
Not all nucleotides in the SECIS are critical for its function. This means that some parts of the structure can change without losing functionality. For example, theoretically, the nucleotides at the base of the SECIS must be paired. However, in our constructs, we used designs where the first few bases were paired with nearby flanking sequences rather than strictly within the SECIS itself. This demonstrates that certain regions can tolerate structural changes while preserving function.
Why it is important
Provides a balance between exploration and control:
- Too restrictive: Very few possible solutions; the search is over-constrained.
- Too permissive: High risk of losing the FRE's biological function.
- Balanced approach: Helps the genetic or brute-force search focus on plausible solutions.
How to use it
- Define the reference FRE structure (dot-bracket).
- Decide how many changes to allow, depending on how strict the structure needs to be for function.
- Enter that number in TADPOLE as the Maximum changes on the FRE structure.
Practical tips
- For very sensitive FREs: start with a low value (1–2) to ensure functionality.
- For FREs whose function depends only on certain parts of the structure, or for exploratory designs: allow up to 10 changes to increase diversity.
Maximum Number of FRE-CRE Pairings
What it is
This parameter sets the maximum number of base-pair interactions that can form between the FRE and the CRE in the OFF state. It controls how strongly the FRE and CRE can bind to each other, which directly affects folding, stability, and the ability to switch back to the ON state.
How it works
- Expressed as a number.
- If a design produces more pairings than the limit, it is automatically discarded.
- The idea is that the OFF state should be stable, but not so stable that it prevents transition back to the ON state.
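Counting inter-module pairings can be sketched as below, assuming the full construct is laid out as FRE, linker, CRE and that each module is given as a 0-based, half-open index range. The function name and the toy structure are hypothetical.

```python
def count_cross_pairs(structure, fre_range, cre_range):
    """Count base pairs with one partner in the FRE and the other in the CRE.

    Ranges are 0-based, half-open (start, end) tuples over the full construct.
    """
    def inside(index, bounds):
        return bounds[0] <= index < bounds[1]

    stack, count = [], 0
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            i = stack.pop()  # i < j are the two ends of one base pair
            if (inside(i, fre_range) and inside(j, cre_range)) or \
               (inside(i, cre_range) and inside(j, fre_range)):
                count += 1
    return count


# Toy OFF-state structure: 10-nt FRE, 3-nt linker, 10-nt CRE.
structure = "((((...))." + "..." + "........))"
n_pairs = count_cross_pairs(structure, fre_range=(0, 10), cre_range=(13, 23))

print(n_pairs)        # 2 (the two outer pairs span FRE and CRE)
print(n_pairs <= 10)  # True: within a limit of 10 pairings
```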
📋 Example: SECIS Pairing
Allowed Pairings: Up to 10
Functionality: Enough to disrupt the ON structure while still allowing recovery
Why it is important
- Avoids excessive binding: too many FRE–CRE pairings in the OFF state can "lock" the system, making it too hard to switch back ON.
- Keeps the search realistic: limiting the maximum interactions prevents an explosion of unfeasible designs dominated by over-stabilized OFF states.
How to use it
- Look at the length of your FRE and CRE.
- Decide a maximum number of pairings based on how strong you want the OFF state to be.
- A good rule of thumb is around 10% of the nucleotides in your FRE.
- Enter that number in TADPOLE as Maximum number of FRE-CRE pairings.
Practical tips
- If your ON state is fragile or hard to stabilize → set a low limit (few pairings).
- If your OFF state is too weak or unstable → allow more pairings.
- Adjust the value depending on whether you want the system to favor easier activation (ON) or stronger repression (OFF).
Minimum Energy Difference
What it is
This parameter defines the minimum energy gap that must exist between the OFF and ON states of the FRE, ensuring a clear functional distinction between the two conformations. It is expressed in kcal/mol and provides a thermodynamic threshold that designs must satisfy to be considered valid.
There are three common cases:
- Ligand-aptamer designs: the ON state is stabilized by ligand binding to the CRE.
- miniRNA designs: the OFF state is stabilized by interactions with a small complementary RNA sequence.
- Intrinsic folding switches: the RNA shifts between ON and OFF conformations without an external binding partner.
The energy difference is calculated as the difference between the OFF and ON states. The system must satisfy the minimum energy threshold to ensure proper switching behavior.
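A minimal sketch of the threshold check. It assumes a sign convention that the text does not state explicitly, namely ΔMFE = MFE(ON) − MFE(OFF), so that a more stable (more negative) OFF state gives a positive gap; it also assumes an inclusive comparison. The function names are hypothetical.

```python
def delta_mfe(mfe_on, mfe_off):
    """Energy gap in kcal/mol, assuming the convention MFE(ON) - MFE(OFF)."""
    return mfe_on - mfe_off


def passes_energy_threshold(mfe_on, mfe_off, min_difference):
    """True if the gap reaches the minimum energy difference (inclusive, by assumption)."""
    return delta_mfe(mfe_on, mfe_off) >= min_difference


# Illustrative values: the OFF state is 4 kcal/mol more stable than the ON state.
print(delta_mfe(-21.0, -25.0))                     # 4.0
print(passes_energy_threshold(-21.0, -25.0, 3.0))  # True
print(passes_energy_threshold(-21.0, -25.0, 5.0))  # False
```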
Why it is important
- Ensures that the OFF state dominates in the absence of the ligand (or other binding partner).
- Guarantees that binding of the ligand (or other binding partner) is sufficient to flip the system into the ON state.
- Prevents ambiguous folding outcomes where neither conformation is clearly preferred.
How to use it
Choose the minimum energy difference based on the type of switch:
- Ligand/aptamer systems: typically 3–5 kcal/mol.
- miniRNA-based switches: smaller thresholds (1–2 kcal/mol) may be sufficient.
- Intrinsic folding switches: may require stronger differences (5+ kcal/mol).
Enter the chosen value into TADPOLE as the Minimum Energy Difference.
Practical tips
- Too low a value → OFF and ON states may coexist, leading to leaky function.
- Too high a value → ligand binding might not be sufficient to flip the switch.
- Moderate values (3–5 kcal/mol) often give the best balance between robustness and responsiveness.
The minimum energy difference ensures OFF state dominates without ligand binding, and ON state dominates once binding adds stabilizing energy.
References
- Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6, 26. https://doi.org/10.1186/1748-7188-6-26
- Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & De Hoon, M. J. L. (2009). Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163
- Rodnina, M. V., Korniy, N., Klimova, M., Karki, P., Peng, B. Z., Senyushkina, T., Belardinelli, R., Maracci, C., Wohlgemuth, I., & Samatova, E. (2019). Translational recoding: Canonical translation mechanisms reinterpreted. Nucleic Acids Research, 48(3), 1056–1067. https://doi.org/10.1093/nar/gkz783
- Jenison, R. D., Gill, S. C., Pardi, A., & Polisky, B. (1994). High-resolution molecular discrimination by RNA. Science, 263(5152), 1425–1429. https://doi.org/10.1126/science.7510417
- Berry, M. J., Banu, L., Harney, J. W., & Larsen, P. R. (1993). Functional characterization of the eukaryotic SECIS elements that direct selenocysteine insertion at UGA codons. EMBO Journal, 12(8), 3315–3322. https://doi.org/10.1002/j.1460-2075.1993.tb05983.x
- Mathews, D. H., Sabina, J., Zuker, M., & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology, 288(5), 911–940. https://doi.org/10.1006/jmbi.1999.2700
- Holland, J. H. (1992). Adaptation in Natural and Artificial Systems. MIT Press.