CAPE Software

CAPE

Computational Assistant for Pathway Engineering

A user-friendly bioinformatics tool designed to discover and engineer metabolic pathways for biodegradation in Rhodococcus opacus PD630

What does CAPE do?


CAPE (Computational Assistant for Pathway Engineering) is a bioinformatics tool built to integrate with HERO (High-performance Engineered Rhodococcus Opacus). It operates on metabolic graphs created from the KEGG database and integrates predicted metabolic pathways from RetroPath2.0.

CAPE is designed to find biologically plausible enzymatic pathways between a source compound (e.g., a pollutant) and a desired product, allowing users to obtain host-optimized gene sequences needed to perform the reactions.

Automated Pathway Discovery

CAPE automates the traditionally manual process of searching metabolic databases and integrating reaction data, making pathway engineering accessible to teams without extensive bioinformatics resources.

Integration of Known & Predicted Reactions

Combines curated KEGG data with RetroPath2.0 predictions to identify complete degradation pathways, even when gaps exist in current databases.

Codon Optimization

Automatically optimizes retrieved enzyme sequences for expression in Rhodococcus opacus PD630, ensuring efficient protein production.

Assembly-Ready Output

Provides FASTA files, characterized iGEM parts, and restriction site screening compatible with standard cloning methods.

How to install CAPE


  1. Open a Unix shell

    Ensure you have terminal access on Linux, macOS, or WSL on Windows.

  2. Install conda and add to PATH

    Verify conda is installed and accessible from your terminal.

  3. Clone the GitLab repository
    git clone https://gitlab.igem.org/2025/software-tools/bologna
  4. Create the conda environment
    cd bologna
    conda env create -f environment.yml
  5. Unzip necessary files
    gunzip -k cape_app/algorithms/pathways/data/retrorules_rr02_rp2_flat_forward.csv.gz

    tar -xzf cape_app/algorithms/pathways/data/Pathways.tar.gz -C cape_app/algorithms/pathways/data/
  6. Install RetroPath2.0 in the environment

    See the RetroPath repository for more details.

    conda install -c conda-forge -n cape retropath2_wrapper
  7. Activate the environment
    conda activate cape
  8. Run the server
    python manage.py runserver

    If the server reports unapplied migrations (for example: "You have 18 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions. Run 'python manage.py migrate' to apply them."), stop it with Ctrl+C, run python manage.py migrate as suggested, and then start the server again.

    Access CAPE on a browser at: http://127.0.0.1:8000/

How does CAPE work?


[Figure: Workflow of the CAPE tool]

Step 1: KEGG-based Metabolic Graph

CAPE constructs a metabolic graph of all reactions in the metabolism of Rhodococcus opacus PD630 using data parsed from the KEGG database:

  • Nodes represent compounds
  • Edges represent reactions catalyzed by enzymes
  • Edge weights encode biological plausibility: heavier edges correspond to reactions that are less likely to occur naturally

When a user inputs a source compound and desired product, CAPE searches for the shortest path (lowest total weight) through this graph, automating manual KEGG searches.
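The search described above can be sketched with Dijkstra's algorithm. This is a minimal illustration and not CAPE's actual code: the adjacency-list layout, the function name shortest_path, the reaction labels R1 to R4 and all weights are assumptions made for the example.

```python
import heapq

def shortest_path(graph, source, target):
    """Return (total_weight, [compounds]) for the lowest-weight path, or None.

    graph maps a compound to a list of (neighbour, weight, reaction_id) tuples.
    This layout is an assumption for the sketch, not CAPE's internal format.
    """
    queue = [(0.0, source, [source])]
    best = {source: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight, _reaction in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None

# Toy graph with hypothetical weights: the route through water is heavily penalized.
toy_graph = {
    "phenol": [("catechol", 1.0, "R1"), ("water", 50.0, "R2")],
    "catechol": [("cis,cis-muconate", 1.0, "R3")],
    "water": [("cis,cis-muconate", 50.0, "R4")],
}
result = shortest_path(toy_graph, "phenol", "cis,cis-muconate")
print(result)
```

With these toy weights the search prefers the two-step route through catechol over the shortcut through water, which is the behaviour the edge weights are meant to produce.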

Step 2: Integration with RetroPath2.0

If KEGG alone cannot connect the source to the product, CAPE integrates predictions from RetroPath2.0:

  • RetroPath uses generalized chemical rules to infer possible reaction steps from source compounds to core metabolites of R. opacus PD630
  • Predicted reactions are added as edges in the metabolic graph
  • CAPE searches again for the shortest path

This enables CAPE to propose novel, plausible pathways beyond those annotated in KEGG.
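The fallback logic can be sketched as follows, assuming the graph is an adjacency list and the RetroPath2.0 output has already been reduced to (substrate, product) pairs. The names is_reachable and with_predicted_edges, the "retropath" origin tag, the placeholder compound pollutant_X and the weight 5.0 are all hypothetical.

```python
from collections import deque

def is_reachable(graph, source, target):
    """Breadth-first check of whether any path connects source to target."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour, _weight, _origin in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False

def with_predicted_edges(graph, predicted, predicted_weight=5.0):
    """Return a copy of graph with predicted reactions added as heavier edges.

    predicted_weight is a made-up value; it only needs to exceed the weights
    of annotated reactions so that curated routes are still preferred.
    """
    merged = {node: list(edges) for node, edges in graph.items()}
    for substrate, product in predicted:
        merged.setdefault(substrate, []).append((product, predicted_weight, "retropath"))
    return merged

# Toy data: the pollutant is absent from the KEGG-derived graph.
kegg_graph = {"catechol": [("cis,cis-muconate", 1.0, "kegg")]}
predicted = [("pollutant_X", "catechol")]

before = is_reachable(kegg_graph, "pollutant_X", "cis,cis-muconate")
merged = with_predicted_edges(kegg_graph, predicted)
after = is_reachable(merged, "pollutant_X", "cis,cis-muconate")
print(before, after)
```

The merged graph can then be handed to any weighted shortest-path routine. Tagging each edge with its origin keeps predicted steps distinguishable from annotated ones in later stages.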

Step 3: Path Selection

CAPE returns candidate pathways, each showing:

  • Sequential compound reactions (steps)
  • Multiple enzyme options for each step, color-coded by reliability:
      • Green: annotated in R. opacus PD630
      • Yellow: annotated in another organism
      • Red: predicted by RetroPath, with an EC number
      • Purple: predicted by RetroPath, without an EC number
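As a small illustration of the color scheme, the tiers can be expressed as a classification function and used to sort enzyme options by reliability. The function name, its boolean parameters and the dictionary layout are assumptions for this sketch, not CAPE's data model.

```python
# Colour tiers from most to least reliable, following the list above.
TIER_ORDER = ["green", "yellow", "red", "purple"]

def reliability_colour(annotated_in_host, annotated_elsewhere, has_ec):
    """Map annotation status to a colour tier (parameter names are hypothetical)."""
    if annotated_in_host:
        return "green"
    if annotated_elsewhere:
        return "yellow"
    # Anything else is treated as a RetroPath prediction in this sketch.
    return "red" if has_ec else "purple"

options = [
    {"name": "predicted_step", "colour": reliability_colour(False, False, True)},
    {"name": "host_enzyme", "colour": reliability_colour(True, False, True)},
]
# Show the most reliable option first.
options.sort(key=lambda option: TIER_ORDER.index(option["colour"]))
print([option["name"] for option in options])
```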

Step 4: EC Selection and NOEC Problem Resolution

Users can select an existing EC number for each reaction or define a custom one. When RetroPath-predicted reactions lack an EC annotation, CAPE uses SelenzymeRF to infer the missing EC:

  • Input: SMARTS representation of the chemical reaction
  • Output: Predictions of candidate enzymes and their EC numbers

Step 5: Ortholog Retrieval

CAPE queries the NCBI Protein database via the Entrez API to retrieve ortholog sequences:

  • Prioritizes Swiss-Prot reviewed entries from organisms closely related to R. opacus PD630
  • Expands searches to non-reviewed proteins if needed
  • Broadens taxonomic scope step-by-step up to the Bacteria level
  • Displays filterable tables of protein sequences for each EC
  • Supports custom .faa file uploads for experimentally characterized enzymes
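The escalation strategy can be sketched without any network access by passing the search step in as a callable. The order shown (reviewed entries first, then non-reviewed, before moving to a broader taxon) is one plausible reading of the list above and is an assumption, as are the function names, the taxon list and the canned results; a real implementation would wrap an Entrez query in place of fake_search.

```python
def retrieve_orthologs(ec_number, taxa, search, min_hits=1):
    """Try each taxon from narrowest to broadest, reviewed entries first.

    search is any callable (ec_number, taxon, reviewed_only) -> list of hits.
    Returns (taxon, reviewed_only, hits) for the first tier with enough hits.
    """
    for taxon in taxa:
        for reviewed_only in (True, False):
            hits = search(ec_number, taxon, reviewed_only)
            if len(hits) >= min_hits:
                return taxon, reviewed_only, hits
    return None, None, []

# Illustrative lineage, narrowest first; the levels CAPE actually uses may differ.
TAXA = ["Rhodococcus opacus", "Rhodococcus", "Nocardiaceae", "Bacteria"]
calls = []

def fake_search(ec_number, taxon, reviewed_only):
    """Stand-in for a database query, returning canned placeholder results."""
    calls.append((taxon, reviewed_only))
    canned = {("Rhodococcus", False): ["protein_A", "protein_B"]}
    return canned.get((taxon, reviewed_only), [])

result = retrieve_orthologs("1.13.11.1", TAXA, fake_search)
print(result)
```

Keeping the query function separate from the escalation loop makes the loop easy to test offline and lets the database backend be swapped.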

Step 6: Codon Optimization

Sequences undergo codon optimization based on the Kazusa codon usage table for Rhodococcus opacus:

  • Max mode: deterministic; always selects the most frequently used codon for each amino acid
  • Weighted mode: probabilistic; samples codons according to their usage frequency (accepts a seed parameter for reproducibility)
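The two modes can be illustrated with a short sketch. The usage frequencies below are invented toy values covering only four symbols, not the Kazusa table for Rhodococcus opacus, and the function name optimize and its parameters are assumptions.

```python
import random

# Toy codon usage frequencies (NOT the Kazusa values), keyed by amino acid.
CODON_USAGE = {
    "M": {"ATG": 1.0},
    "A": {"GCC": 0.5, "GCG": 0.4, "GCA": 0.1},
    "K": {"AAG": 0.8, "AAA": 0.2},
    "*": {"TGA": 0.7, "TAG": 0.2, "TAA": 0.1},
}

def optimize(protein, mode="max", seed=None):
    """Back-translate a protein using the usage table above."""
    rng = random.Random(seed)  # a private generator keeps seeded runs reproducible
    codons = []
    for amino_acid in protein:
        table = CODON_USAGE[amino_acid]
        if mode == "max":
            # Deterministic: always take the most frequent codon.
            codons.append(max(table, key=table.get))
        elif mode == "weighted":
            # Probabilistic: sample in proportion to usage frequency.
            codons.append(rng.choices(list(table), weights=list(table.values()))[0])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return "".join(codons)

max_result = optimize("MAK*", mode="max")
weighted_a = optimize("MAK*", mode="weighted", seed=42)
weighted_b = optimize("MAK*", mode="weighted", seed=42)
print(max_result, weighted_a == weighted_b)
```

Max mode yields the same sequence on every run, while weighted mode preserves some codon diversity and is only repeatable when a seed is supplied.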

Restriction site screening detects and removes illegal sites incompatible with cloning methods:

  • Type IIS RFC1000 (removes BsaI and SapI)
  • BioBrick RFC10 (removes EcoRI, XbaI, SpeI, PstI, and NotI)
  • Custom restriction site lists
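The detection half of this screening can be sketched as a scan for each recognition sequence on both strands. The recognition sequences are the standard ones for these enzymes; the dictionary layout and function names are assumptions, and the removal step (for example by swapping in a synonymous codon) is not shown.

```python
# Recognition sequences of the enzymes named in each standard.
SITES = {
    "RFC10": {
        "EcoRI": "GAATTC",
        "XbaI": "TCTAGA",
        "SpeI": "ACTAGT",
        "PstI": "CTGCAG",
        "NotI": "GCGGCCGC",
    },
    "RFC1000": {"BsaI": "GGTCTC", "SapI": "GCTCTTC"},
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_illegal_sites(seq, standard):
    """Return sorted (enzyme, position) pairs for sites found on either strand."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES[standard].items():
        # A set avoids scanning palindromic sites twice.
        for pattern in {site, reverse_complement(site)}:
            start = seq.find(pattern)
            while start != -1:
                hits.append((enzyme, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])

# Toy sequence with an EcoRI site and a BsaI site on the reverse strand.
sequence = "ATGGAATTCAAAGAGACCTAA"
rfc10_hits = find_illegal_sites(sequence, "RFC10")
rfc1000_hits = find_illegal_sites(sequence, "RFC1000")
print(rfc10_hits, rfc1000_hits)
```

Checking the reverse complement matters for the non-palindromic Type IIS sites, which would otherwise be missed when they lie on the opposite strand.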

Step 7: Output

CAPE returns:

  • .fna file of all enzyme sequences in the selected pathway
  • Characterized HERO Parts such as promoters and RBSs
  • pLoxship backbone sequence compatible with HERO for insertion into R. opacus PD630
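For readers unfamiliar with the format, a nucleotide FASTA (.fna) file is plain text with one header line per sequence followed by the wrapped sequence. The sketch below shows the general shape; the function name, the header text and the line width are illustrative and do not describe CAPE's exact output.

```python
def to_fasta(records, width=70):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        # Split the sequence into fixed-width chunks.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Hypothetical record; the header layout is an assumption.
fasta_text = to_fasta([("step1_enzyme", "ATGGCCAAGTGA")], width=6)
print(fasta_text)
```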

Using the Web Interface


1. Input Compounds

Enter the InChI of the source compound and of the desired product. Optionally, add their common names and the maximum number of pathways to compute.

2. Select a Pathway

Review the returned pathways and select the most promising one based on the number of steps and reaction reliability.

3. Choose EC Numbers

For each enzymatic step, select an EC number from the color-coded options indicating reliability.

4. Select Enzyme Sequences

Choose from the retrieved sequences or add custom sequences from your own experimental data.

5. Codon Optimization

Optimize the selected sequences for Rhodococcus opacus codon usage, optionally excluding illegal restriction sites.

6. Download Results

Download FASTA files and iGEM parts characterized by the HERO wet-lab team, including the pLoxship backbone, promoters, and RBSs.

How were the edge weights chosen?


In the metabolic graph, each directed edge connects two compounds participating as reactant and product within the same reaction. The algorithm aims to identify biologically plausible degradation pathways following the transformation of a main compound through successive reactions.

If all edges had equal weights, the shortest paths would frequently include biologically implausible "shortcuts," such as traversing ubiquitous metabolites like water, yielding nonsensical routes (e.g., source → water → product). To address this, a biologically aware edge-weighting approach was implemented:

Annotation-Based Weighting

Reactions annotated in R. opacus PD630 receive the lowest weights (highest confidence). Reactions from other organisms get slightly higher weights, while RetroPath-predicted reactions receive progressively heavier weights.

Mass-Based Adjustment

Sharp drops in molecular weight are penalized, as such steps lead toward small fragments rather than the main degradation route. Abrupt increases are also discouraged.

Cofactor Blacklist

Ubiquitous cofactors (ATP, NADH, CoA) are assigned high weights to prevent overrepresentation in pathways.
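The three ingredients can be combined into a single weight function, sketched below. Every number here is invented for illustration, as are the origin labels, the linear form of the mass penalty and the function signature; the source does not state the actual values or formula.

```python
# Hypothetical base weights: lower means higher confidence.
BASE_WEIGHT = {
    "host": 1.0,             # annotated in R. opacus PD630
    "other_organism": 2.0,   # annotated in another organism
    "retropath_ec": 4.0,     # predicted, with an EC number
    "retropath_no_ec": 6.0,  # predicted, without an EC number
}
COFACTORS = {"ATP", "NADH", "CoA"}
COFACTOR_PENALTY = 100.0

def edge_weight(origin, substrate, product, substrate_mass, product_mass,
                drop_factor=5.0, gain_factor=2.0):
    """Combine annotation, cofactor and mass terms into one edge weight."""
    weight = BASE_WEIGHT[origin]
    if substrate in COFACTORS or product in COFACTORS:
        weight += COFACTOR_PENALTY
    # Relative mass change; drops are penalized more strongly than gains here.
    change = (product_mass - substrate_mass) / substrate_mass
    if change < 0:
        weight += drop_factor * -change
    else:
        weight += gain_factor * change
    return weight

# Molar masses in g/mol: phenol 94.11, catechol 110.11, water 18.02.
main_route = edge_weight("host", "phenol", "catechol", 94.11, 110.11)
to_water = edge_weight("host", "phenol", "water", 94.11, 18.02)
via_cofactor = edge_weight("host", "phenol", "NADH", 94.11, 665.4)
print(round(main_route, 2), round(to_water, 2), round(via_cofactor, 2))
```

Under these toy settings the step from phenol to water costs several times more than the step to catechol, and any edge touching a blacklisted cofactor is effectively excluded from shortest paths.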

Improved Results

This weighting strategy increased the average length of returned paths from ~3 steps (dominated by shortcuts) to ~8 steps, which correspond to coherent, enzyme-mediated degradation sequences.

How can CAPE be improved in the future?


🧬

Strain Generalization

CAPE is currently tailored to R. opacus PD630, but future versions could support additional bacterial hosts, expanding its applicability across synthetic biology projects.

📊

Database Generalization

Integration with metabolic databases beyond KEGG could provide more comprehensive pathway coverage and alternative reaction routes.

🤖

Machine Learning Integration

Incorporating ML-driven pathway prediction could improve accuracy of predicted reactions, suggest novel biodegradation routes, and more accurately prioritize pathways based on metabolic feasibility.

📈

Enhanced Visualization

Interactive pathway diagrams with clickable nodes and edges could help users explore enzyme details and reaction steps more intuitively.

⚠️

Toxicity Awareness

Future versions could warn users about potential harmful compounds generated along predicted pathways, helping to design safer biodegradation strategies.

🔄

RetroRules Update

Using updated RetroRules might yield improved results, facilitating the prediction of lesser-known degradation pathways.

Ready to engineer your biodegradation pathway? Visit our GitLab repository to get started with CAPE!

Innovation never stops! This page describes the original version of the code. For further developments, visit the CAPE GitHub repository.

References

  • Delépine, B., Duigou, T., Carbonell, P., & Faulon, J. L. (2018). RetroPath2.0: a retrosynthesis workflow for metabolic engineers. Metabolic engineering, 45, 158-170.
  • Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1), 27-30.
  • Koch, M., Duigou, T., & Faulon, J. L. (2020). Reinforcement learning for bioretrosynthesis. ACS synthetic biology, 9(1), 157-168.
  • Nakamura, Y., Gojobori, T., & Ikemura, T. (2000). Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic acids research, 28(1), 292.
  • Roell, G. W., Schenk, C., Anthony, W. E., Carr, R. R., Ponukumati, A., Kim, J., ... & García Martín, H. (2023). A high-quality genome-scale model for Rhodococcus opacus metabolism. ACS synthetic biology, 12(6), 1632-1644.