Software

Overview

While designing our TRI-LYTAC constructs, we found tri-domain fusion design to be time-consuming and inefficient: linkers had to be chosen by comparing tools manually, format conversions took about 60% of the time, and structure prediction for each fusion design was slow. To simplify our design/build process and increase efficiency, we built a solution that combines a linker finder powered by Biopython, simple n8n automation, and automatic ColabFold prediction and download. Our software takes the sequence (as text) of each domain (A, B, and C) and outputs a combined FASTA (from LinkerScout, one of our software's modules) as well as ColabFold-predicted structures. We hope that this software will help future iGEM teams and researchers optimize their multi-domain fusion designs, making LYTACs, PROTACs, and other bispecific chimeras more accessible.

Diagram-linkerflow

Synoptic representation of the steps involved in the LinkerFlow pipeline

      • Reduces manual input/output operations by 70–90% compared to traditional workflows
      • Allows batch testing of multiple linker variants in parallel (on capable hardware)
      • Local-first: no cloud lock-in; all modules can run offline if needed
      • Supports both Google Colab and LocalColabFold instances for flexibility

How It Works

The pipeline starts with LinkerScout, a lightweight Streamlit application for designing fusion linkers. The user pastes their three domain sequences (A, B, and C) directly from Benchling or another sequence editor. They can then modify key parameters such as:
                 o pH of the environment
                 o Preferred linker type for each junction and length range
                 o Charset/motif exclusion tick-boxes (can avoid N-glycosylation motifs (NXS/T), furin-like sites (RXXR, RXKR), and cysteines in linkers)
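
The exclusion tick-boxes above can be pictured as a small set of pattern checks applied to each candidate linker. The following is a minimal sketch, not LinkerScout's actual code: the names `EXCLUSION_PATTERNS`, `find_excluded_motifs`, and `filter_linkers` are hypothetical, and the N-glycosylation pattern assumes the usual sequon definition (N-X-S/T where X is not proline).

```python
import re

# Patterns mirror the exclusion options listed above (hypothetical names).
# The N-glycosylation sequon is taken as N-X-S/T with X != proline.
EXCLUSION_PATTERNS = {
    "n_glycosylation": re.compile(r"N[^P][ST]"),
    "furin_like": re.compile(r"R..R"),  # RXXR, which also covers RXKR
    "cysteine": re.compile(r"C"),
}


def find_excluded_motifs(linker: str, enabled=None) -> list[str]:
    """Return the names of the enabled exclusion rules that the linker violates."""
    seq = linker.upper()
    rules = enabled if enabled is not None else EXCLUSION_PATTERNS.keys()
    return [name for name in rules if EXCLUSION_PATTERNS[name].search(seq)]


def filter_linkers(candidates, enabled=None):
    """Keep only candidate linkers with no hit for any enabled rule."""
    return [c for c in candidates if not find_excluded_motifs(c, enabled)]
```

This sketch only scans the linker itself. A motif can also form across a domain-linker junction, so a stricter check would scan the linker together with a few flanking residues from each neighbouring domain.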

After clicking Design TRI-TAC Fusion/Find Linkers, LinkerScout outputs a FASTA file with a complete construct (A-linker-B-linker-C).
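
As an illustration of the A-linker-B-linker-C output, the sketch below joins three domains and two linkers and formats the result as a FASTA record. It uses only the standard library; the function names and the 60-character line width are assumptions for this example rather than LinkerScout's actual behaviour.

```python
import textwrap


def assemble_construct(domain_a, linker_ab, domain_b, linker_bc, domain_c):
    """Concatenate A-linker-B-linker-C, dropping whitespace and upper-casing."""
    parts = [domain_a, linker_ab, domain_b, linker_bc, domain_c]
    return "".join("".join(p.split()).upper() for p in parts)


def to_fasta(record_id, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    body = "\n".join(textwrap.wrap(sequence, width))
    return f">{record_id}\n{body}\n"
```

Stripping whitespace before joining makes the function tolerant of sequences pasted from a sequence editor with line breaks or spaces.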

• Screenshots:
• Input area

• Parameter controls

• Screenshot of output

Next, the FASTA sequence is processed through the LinkerFlow automation pipeline built in n8n. This workflow connects all necessary steps in sequence:

• Node chain:

o Run LinkerScout → Paste endpoints → Design TRI-TAC Fusion → Extract text from FASTA → Submit to ColabFold (Microsoft Playwright MCP AI Agent) → Wait → Download zip

The workflow includes separate calls for online ColabFold (Google Colab) and LocalColabFold (for offline GPU-based prediction on higher end systems).
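
For the LocalColabFold branch, the command that an n8n command node runs might be assembled along these lines. This is a sketch under assumptions: it presumes the `colabfold_batch` executable installed by LocalColabFold is on the PATH of the active Conda environment, and `build_colabfold_command` and the file paths are hypothetical. The function only builds the argument list and does not execute anything.

```python
import shlex


def build_colabfold_command(fasta_path, output_dir, extra_args=()):
    """Return the argument list for a LocalColabFold run (nothing is executed)."""
    # colabfold_batch takes the input file and the result directory as
    # positional arguments; extra_args is an optional pass-through.
    return ["colabfold_batch", *extra_args, str(fasta_path), str(output_dir)]


cmd = build_colabfold_command("construct.fasta", "outputs/construct")
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting problems if a path contains spaces.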

• Screenshot: Workflow

The ColabFold run produces standard AlphaFold2/MMseqs2 outputs, including .pdb models, ranking JSONs, pLDDT confidence plots, and multiple-sequence alignments (.a3m). Each output set includes a thumbnail preview of the top model and the highest pLDDT score inside the .zip file, as seen in the screenshot below.
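
AlphaFold2-style .pdb models store the per-residue pLDDT in the B-factor column, so a confidence summary can be recovered from the model file alone. The sketch below is one way to do this with the standard library; the function names are hypothetical and it assumes standard fixed-width PDB ATOM records.

```python
def plddt_per_residue(pdb_text):
    """Read per-residue pLDDT from the B-factor column of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, columns 61-66 the B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def mean_plddt(pdb_text):
    """Average pLDDT over all residues; raises if no CA atoms are present."""
    scores = plddt_per_residue(pdb_text)
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)
```

Reading one value per CA atom gives one score per residue, which is convenient for checking whether low-confidence stretches coincide with the linkers.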

Note: This example uses a TRI-TAC construct.

• Screenshot: output

• Screenshot: example pLDDT

Features

      • End-to-end automation (design → prediction → output)
      • Linker parameterization (length/class/motif filters)
      • Variant management (IDs, logs, reproducible seeds)
      • Local-first; supports larger proteins through the use of LocalColabFold (requires a powerful computer)
      • Modular (swap in other predictors or scoring nodes)
      • Output standardization (.pdb, .zip, report stub)

Getting Started

• Prerequisites:

                 o Python ≥3.10, pip/conda
                 o Streamlit, Biopython, NumPy, pandas
                 o n8n (local install recommended, running on Docker)
                 o ColabFold (local or Colab notebook)

❗ Note: The n8n workflow will automatically install Streamlit, Biopython, NumPy, and pandas.

      1. Run n8n locally (via Docker or server).
      2. Import linkerflow.json into n8n and set environment variables.
      3. Configure the command node to the location of linkerscout_streamlit.py; it will auto-install dependencies.
      4. Specify your ColabFold endpoint (Google Colab URL or local script path).
      5. For LocalColabFold, WSL2 or Linux is required; install Conda and dependencies following the official GitHub guide. Adjust the “Run Command” node to fit your ColabFold setup, ensuring the proper Conda environment is active.

Use any TRI-LYTAC A | B | C domains as a quick test. You’ll know the setup works when:

      • LinkerScout exports a valid combined FASTA
       • n8n logs show job submission and successful zip download
      • The /outputs folder contains .zip and .pdb files
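
The last check in this list can be scripted. The sketch below looks for at least one .zip and one .pdb file under an output folder; `check_outputs` is a hypothetical helper, and the folder path is whatever your workflow writes to.

```python
from pathlib import Path


def check_outputs(output_dir):
    """Report whether the folder holds at least one .zip and one .pdb file."""
    root = Path(output_dir)
    # Search recursively, since unzipped results may sit in subfolders.
    found = {
        ext: sorted(p.name for p in root.rglob(f"*{ext}"))
        for ext in (".zip", ".pdb")
    }
    return {"ok": all(found.values()), "files": found}
```

Calling `check_outputs("outputs")` after a test run returns a dictionary whose "ok" entry is True only when both file types are present.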

⚠️ Note: n8n and Playwright MCP are new platforms; minor errors may occur. Always monitor the n8n workflow logs during use, to catch errors as soon as they occur.

Usage (common tasks)

      • Single design:
                o Paste A/B/C → choose linkers → export → run n8n → open .pdb
       • Batch designs (untested, because sufficiently powerful computers were not available):
                 o Enable parallel run in n8n (concurrency parameter)
                 o Duplicate the workflow as many times as needed
                 o (For LocalColabFold) Run multiple Conda environments to have more ColabFold instances
                 o Configure the n8n nodes again and add delay between workflows to allow LinkerScout to process all inputs
       • Swap to local ColabFold:
                 o Toggle the endpoint in n8n → set the path to the local script; LocalColabFold detects the GPU
       • Export:
                 o FASTA file from LinkerScout
                 o Zip file with pLDDT and top variants from ColabFold
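
For batch designs, each linker combination needs a stable identifier so that FASTA files, logs, and ColabFold outputs can be matched up later. The sketch below enumerates every pairing of candidate linkers for the two junctions; the function name, the "TRI" prefix, and the ID format are assumptions for illustration.

```python
from itertools import product


def enumerate_variants(linkers_ab, linkers_bc, prefix="TRI"):
    """Yield (variant_id, linker_ab, linker_bc) for every linker pairing."""
    pairs = product(linkers_ab, linkers_bc)
    for index, (ab, bc) in enumerate(pairs, start=1):
        yield f"{prefix}-{index:03d}", ab, bc


variants = list(enumerate_variants(["GGGGS", "EAAAK"], ["GGGGSGGGGS"]))
```

Because `product` iterates in a fixed order, the same input lists always yield the same IDs, which helps keep runs reproducible.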

Reproducibility & Transparency

      • Easy to update all dependencies (Streamlit, n8n, ColabFold)
      • The Streamlit app is lightweight and lives in a single Python file, so it can be modified like any other Streamlit app
      • Includes a lightweight example dataset from Biopython for linkers
      • The n8n .json file is easy to import and modify, so the workflow can be extended beyond LinkerFlow

Validation & Benchmarks

We benchmarked LinkerFlow using the TRI-TAC A|B|C domains from our main project.

      • Average time-to-result: ~20 minutes (versus ~40 minutes manually)
       • Success rate: 75% (three out of four test runs completed without errors)
       • Average RAM usage: 11 GB (Colab); 22.5 GB (LocalColabFold)
       • Hardware used:
                 o Google Colab setup: Intel i7-1065G7, 16 GB RAM, 512 GB SSD (iGPU)
                 o LocalColabFold (inconclusive tests): AMD Ryzen 5 5600G, 24 GB RAM, RX 580 GPU (no CUDA support)

Although our hardware limitations prevented large-scale benchmarking, our results showed a substantial time saving even on modest systems.
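
The headline figures follow directly from the numbers reported above; the short calculation below makes the arithmetic explicit. The variable names are illustrative, and with only four test runs the success rate should be read as a rough indication.

```python
manual_minutes = 40      # approximate manual time-to-result
pipeline_minutes = 20    # approximate LinkerFlow time-to-result
completed, total_runs = 3, 4

# Relative time saved compared with the manual workflow.
time_saved_pct = 100 * (manual_minutes - pipeline_minutes) / manual_minutes
# Share of test runs that completed without errors.
success_rate_pct = 100 * completed / total_runs

print(f"time saved: {time_saved_pct:.0f}%, success rate: {success_rate_pct:.0f}%")
```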

Impact

LinkerFlow accelerates the DBTL (design-build-test-learn) cycle for synthetic fusion proteins.

It lowers the entry barrier for non-experts who want to evaluate the structural feasibility of their constructs. This was our main goal.

      • Reusable by other iGEM teams and researchers (bispecifics, sensors, logic-fusions)
      • Education value: transparent pipeline shows each stage
      • Promotes the use of new technologies such as n8n, Cursor, and MCP servers

      • Parallel batch dashboard:
                 o Variant table, status chips, ETA, quick open buttons
                 o Asset: batch-dashboard-mockup.webp
      • AI interpretation agent (cautious use):
                 o MCP connector to AlphaFold DB for similarity hints
                 o Guardrails: confidence thresholds, provenance links
      • Design space exploration:
                 o Linker enumeration strategies (library presets for linkers)
                 o Multi-objective scoring (length vs. predicted confidence)
                 o Linker visualization in LinkerScout
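
One possible reading of multi-objective scoring (length vs. predicted confidence) is a Pareto filter that keeps only variants for which no other variant is both shorter and more confident. The sketch below is hypothetical: it assumes each variant is a dictionary with "length" and "plddt" keys, and treating shorter linkers as preferable is an assumption made for this example.

```python
def pareto_front(variants):
    """Keep variants not dominated on (shorter linker, higher mean pLDDT)."""
    front = []
    for v in variants:
        # v is dominated if another variant is at least as good on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            o["length"] <= v["length"]
            and o["plddt"] >= v["plddt"]
            and (o["length"] < v["length"] or o["plddt"] > v["plddt"])
            for o in variants
        )
        if not dominated:
            front.append(v)
    return front
```

A Pareto filter avoids choosing arbitrary weights between the two objectives; the remaining variants can then be inspected by hand.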

Other Technical Info

Architecture

      • Components:
                 o Streamlit + Biopython (LinkerScout)
                 o n8n (pipeline; file IO; HTTP/API)
                 o ColabFold (AF2/MMseqs2 support; local/Colab)
      • Data flow: A|B|C Domains (input) → FASTA (assembled) → job payload → .zip → .pdb/JSON
       • Config: Streamlit python file + n8n JSON file
       • Extensibility:
                 o Replace ColabFold node with alternative predictors
                 o Add scoring node for model ranking beyond pLDDT

Interoperability

• Benchling/FASTA round-trip (copy/paste + file), with a planned extension to Benchling MCP
• Outputs consumable in PyMOL, ChimeraX, VMD

Security / Privacy

      • Local-first design: sequences stay on your machine
       • If using Colab: disclose remote processing; suggest anonymized IDs
       • No telemetry
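
The anonymized IDs suggested for Colab runs could be derived from a hash of the sequence, so that FASTA headers carry no project or gene names. This is a sketch with hypothetical names (`anonymized_id`, the "seq_" prefix, the optional salt). It hides only the label: the sequence itself is still sent to the remote service.

```python
import hashlib


def anonymized_id(sequence, salt="", length=12):
    """Derive a short, stable label from a sequence without using its name."""
    # Normalize so that whitespace and case do not change the ID.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
    return f"seq_{digest[:length]}"
```

Keeping a local table that maps each ID back to the real construct name lets the downloaded results be matched up again after prediction.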

Limitations / Red-Flags

      • Colab quotas → HIGH variability in runtimes
      • Model quality ≠ experimental truth; requires wet-lab validation (usual with AlphaFold)
       • Linker heuristics are general; project-specific constraints may be needed
      • Batch mode requires strong hardware for reasonable throughput

Documentation & Availability

      • Software is available by direct request to the iGEM repository
                 o LinkerScout: /software/linkerscout_streamlit.py (GitHub)
                 o n8n Workflow JSON: /software/linkerflow.json
                 o License: MIT
      • Citation block (template):
                 o VIS-Romania (2025). LinkerFlow: Automated linker & structure pipeline for TRI-LYTACs. iGEM Software. (iGEM registry link).

LinkerFlow: Automated Biopython Linker Discovery
& Structural Prediction for TRI-LYTACs

From 3 domains to 3D structure in minutes

Generate linker variants, assemble TRI-LYTACs, run ColabFold, and collect outputs automatically.

Open-Source (MIT License) • Streamlit • Biopython / Python • n8n • ColabFold / AlphaFold / MMseqs2